Bug in R update with regular expression

wincowgerDEV commented 2 years ago

It turns out that your packages, during their tests, pass invalid characters to regular expression operations. An invalid character here means a sequence of bytes that doesn't match any character in the encoding the string should have. Therefore, the regular expression operations may not (and in some cases cannot) proceed correctly.
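As a rough illustration of what "invalid" means here, the sketch below builds a string from raw bytes and checks it with base R's `validUTF8()`. The byte values and variable names are made up for the example; they are not taken from the failing package tests.

```r
# Hypothetical example data: "facile" in plain ASCII, and a second string whose
# third byte is 0xE7 (c-cedilla in Latin-1). A lone 0xE7 followed by "i" is not
# a legal UTF-8 sequence.
ok  <- "facile"
bad <- rawToChar(as.raw(c(0x66, 0x61, 0xe7, 0x69, 0x6c, 0x65)))

x <- c(ok, bad)

# validUTF8() inspects the bytes themselves, regardless of the declared encoding
valid <- validUTF8(x)
print(valid)  # the first element is valid, the second is not
```

A check like this, run on the inputs before they reach `grepl()` or `gsub()`, should point to the offending strings.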

Until now, R silently escaped such invalid characters as "<NN>", where NN is a hexadecimal number, but the results of such operations could then differ from what was intended. R-devel has been improved to detect these cases and report an error or warning. This triggers during the checks of your packages, so their tests will now start failing to signal the error.
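For readers unfamiliar with the hexadecimal escape notation, base R's `iconv()` produces a similar angle-bracket form when called with `sub = "byte"`. This is only a sketch of the notation, not a reproduction of what the regular expression functions did internally; the input bytes are hypothetical, and the exact behavior of `iconv()` can vary between platforms.

```r
# Hypothetical input: the byte 0xE7 is not valid on its own in UTF-8
bad <- rawToChar(as.raw(c(0x66, 0x61, 0xe7, 0x69, 0x6c, 0x65)))

# sub = "byte" replaces each unconvertible byte with its hex code in angle brackets
escaped <- iconv(bad, from = "UTF-8", to = "UTF-8", sub = "byte")
print(escaped)
```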

More information is available in a blog post:

https://developer.r-project.org/Blog/public/2022/06/27/why-to-avoid-%5Cx-in-regular-expressions/index.html

Note, though, that after a quick check it seems to me that in almost all of your cases the invalid string is not the regular expression itself, but one of the inputs.

Please fix your packages to ensure that the strings are valid. Most likely, the data you process are not read into R properly: they are not converted to the current encoding (or to UTF-8), so R expects them to be in the current encoding when they are not. (The current encoding may differ between systems, though UTF-8 is the most common with recent R on recent systems.) I have also seen this happen when the data were assumed to be ASCII but in fact contained some extended ASCII characters.
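One possible way to make such strings valid is to convert them explicitly once the real source encoding is known. The sketch below assumes the data are Latin-1, which is purely an assumption for illustration; the actual encoding of the package's data has to be determined separately.

```r
# Hypothetical input: "fa<c-cedilla>ile" stored as Latin-1 bytes and read
# into R without any conversion
x <- rawToChar(as.raw(c(0x66, 0x61, 0xe7, 0x69, 0x6c, 0x65)))
before <- validUTF8(x)

# Assumption: the source encoding is known to be Latin-1
y <- iconv(x, from = "latin1", to = "UTF-8")
after <- validUTF8(y)

# The converted string can now be passed to string operations safely;
# "\u00e7" is c-cedilla written as a Unicode escape
cleaned <- gsub("\u00e7", "c", y, fixed = TRUE)
print(c(before = before, after = after))
```

If the encoding cannot be determined, `iconv()` returns `NA` for strings it cannot convert (unless `sub` is set), which at least makes the problem visible instead of passing bad bytes along.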

zsteinmetz commented 2 years ago

Thanks, Win!

This may be related to https://github.com/r-hyperspec/hyperSpec/issues/80 and https://github.com/r-hyperspec/hyperSpec/pull/81. I'll try to fix it by specifying the encoding passed to hyperSpec.
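How hyperSpec accepts an encoding is not shown in this thread, so the sketch below uses base R only. It writes a small Latin-1 file (a hypothetical stand-in for a spectra file), declares the encoding on read, and converts to UTF-8; the file contents and the choice of Latin-1 are assumptions.

```r
# Write a small Latin-1 encoded file (hypothetical stand-in for real input)
path <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x66, 0x61, 0xe7, 0x69, 0x6c, 0x65, 0x0a)), path)

# Assumption: the file is known to be Latin-1. In readLines(), the encoding
# argument only marks the strings; enc2utf8() performs the actual conversion.
lines <- readLines(path, encoding = "latin1")
lines_utf8 <- enc2utf8(lines)

print(validUTF8(lines_utf8))
unlink(path)
```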

wincowgerDEV commented 2 years ago

Nice find on that! You're a wizard.

wincowgerDEV / OpenSpecy-package

Bug in R update with regular expression #114