When a literal ] appears in square brackets in a regular expression, base R functions find nothing within the range unless perl=TRUE (R for Data Science could mention this) #1629
Section 15.4.3 in R for Data Science (https://r4ds.hadley.nz/regexps.html#character-classes) says this about regular expressions:
\ escapes special characters, so [\^\-\]] matches ^, -, or ].
But this specific example does not seem to be true when using base R, unless perl=TRUE is chosen (I am using R 4.2.1).
The general issue of slight differences between base R and stringr is noted in section 15.7.2, but perhaps this particular quirk is worth mentioning in 15.4.3 as the example contains one of these differences.
For example:
grepl("[\\^\\-\\]]", "]")
returns FALSE.
And:
grepl("[\\^\\-\\]]", "^-]")
also returns FALSE, indicating that nothing in the range is found in the string.
But only the ] symbol appears to cause this. So:
grepl("[\\^\\-\\[]", "^-]")
returns TRUE, seemingly because the ] is not there (in this example it has been replaced by [ but it could just as well be replaced by nothing).
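One possible reading of these results is below. It assumes that TRE (base R's default engine) follows POSIX and treats a backslash inside [...] as an ordinary character, which is not confirmed above. Under that reading, the class in "[\\^\\-\\]]" closes at the first ]. The pattern then means "one of \ or ^, followed by a literal ]". A lone "]" can never match, but the two-character string "^]" should. The variable name pattern is only for illustration.

```r
# Assumption: TRE treats "\" as a literal character inside [...],
# so "[\\^\\-\\]]" parses as the class [\^\-\] followed by a literal "]".
pattern <- "[\\^\\-\\]]"

# A lone "]" has no class member in front of it, so this should fail.
grepl(pattern, "]")

# "^" is a class member and is followed by "]", so this should succeed.
grepl(pattern, "^]")

# If the reading is right, the matched text is two characters long ("^]"), not one.
regmatches("x^]", regexpr(pattern, "x^]"))
```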
This issue seems to go away entirely when perl=TRUE is used, so:
grepl("[\\^\\-\\]]", "]", perl=TRUE)
and
grepl("[\\^\\-\\]]", "-", perl=TRUE)
both return TRUE.
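A sketch of a class that should behave the same with and without perl=TRUE is below. It follows the POSIX placement rules described in R's ?regex help page: a literal ] goes first in the list, ^ goes anywhere but first, and - goes first or last. No backslashes are needed. The names portable and x are illustrative only. I have not checked how stringr's ICU engine handles this form.

```r
# POSIX placement rules: "]" first, "^" not first, "-" last.
portable <- "[]^-]"
x <- c("]", "^", "-", "a")

grepl(portable, x)               # default (TRE) engine: TRUE TRUE TRUE FALSE
grepl(portable, x, perl = TRUE)  # PCRE reads the class the same way: TRUE TRUE TRUE FALSE
```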
Perhaps there could be a note in the book to reflect this, or perhaps it is an issue with base R or the TRE engine.