Closed trinker closed 9 years ago
I think this is because `rm_between` by default does not include the left/right bounds. It uses the regex `"(?<=\").*?(?=\")"` (`S("@rm_between2", '"')`). Because the lookarounds do not consume the left/right bounds, the quotation marks stay available for the next match, which produces `" and Danube salmon "`. This is (IMO) a bug that I will address, but I am unsure how yet.
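To make the non-consuming behaviour concrete, here is a minimal base R sketch (it does not call qdapRegex, and the variable name `lookaround` is only illustrative). It applies the same lookaround regex with `gregexpr(..., perl = TRUE)` and shows that the closing quote of one span gets reused as the opening quote of the next match.

```r
# Base R only; perl = TRUE is required for lookbehind/lookahead support.
x <- 'Fresh or chilled Atlantic salmon "Salmo salar" and Danube salmon "Hucho hucho"'

# The lookarounds assert that a quote is present but do not consume it,
# so the quote that closes one span can also open the next match.
lookaround <- regmatches(x, gregexpr('(?<=").*?(?=")', x, perl = TRUE))[[1]]

print(lookaround)
# The text between the two quoted names is returned as a match as well.
print(length(lookaround))
```

The middle element is the unwanted "in between" text, which is the behaviour reported here.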
@hwnd you suggested:
```r
x <- 'Fresh or chilled Atlantic salmon "Salmo salar" and Danube salmon "Hucho hucho"'
rm_default(
  x,
  pattern = '(?<=")[^"]*',
  extract = TRUE
)
```
But this gives:
```r
## [[1]]
## [1] "Salmo salar" " and Danube salmon " "Hucho hucho" ""
```
Not:
```r
## [[1]]
## [1] "Salmo salar" "Hucho hucho"
```
In the case of quotes, lookarounds should be avoided because the text "in between" two quoted spans is itself surrounded by quotation marks and therefore also matches.
One possible workaround would be:
```r
x <- 'Fresh or chilled Atlantic salmon "Salmo salar" and Danube salmon "Hucho hucho"'
gsub(
  '^"|"$', '',
  rm_default(
    x,
    pattern = '"[^"]*"',
    extract = TRUE
  )[[1]]
)
```
Output:
```r
## [1] "Salmo salar" "Hucho hucho"
```
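The same consume-then-strip idea can be sketched in base R without qdapRegex. The helper name `extract_quoted` is hypothetical and is not part of any package; it is only meant to show the technique of matching each span together with its quotes and then trimming them.

```r
# Hypothetical helper (not part of qdapRegex): extract text between double quotes.
extract_quoted <- function(x) {
  # Match each span including both quotes, so the quotes are consumed
  # and cannot be reused by a later match.
  spans <- regmatches(x, gregexpr('"[^"]*"', x))
  # Strip the leading and trailing quote from every match.
  lapply(spans, function(s) gsub('^"|"$', '', s))
}

x <- 'Fresh or chilled Atlantic salmon "Salmo salar" and Danube salmon "Hucho hucho"'
result <- extract_quoted(x)
print(result)
```

Because both quotes are part of each match, the text between the two quoted names is never a candidate, and an input without quotes simply yields an empty character vector for that element.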
@hwndx I incorporated your idea into `rm_between`. Thanks for the help.
Determine if the following is a bug and if so how to fix:
When we expect: