wincowgerDEV closed this issue 2 years ago
Thanks, Win!
This may be related to https://github.com/r-hyperspec/hyperSpec/issues/80 and https://github.com/r-hyperspec/hyperSpec/pull/81. I'll try to fix it by specifying the encoding passed to hyperSpec.
Nice find on that! You're a wizard.
It turns out that your packages, during their tests, pass invalid characters to regular expression operations. An invalid character here means a sequence of bytes that doesn't match any character in the encoding the string should have. Therefore, the regular expression operations may not (and in some cases cannot) proceed correctly.
Until now, R used to silently escape such invalid characters using "<NN>", where NN is a hexadecimal number, but then the results of such operations might not be quite as intended. R-devel has been improved to detect these cases and report an error or warning. This triggers during the checks of your packages, so they will now start failing their tests to signal the error.
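To make the notion of an "invalid character" concrete, here is a minimal sketch in base R. The string `x` is a made-up example, not data from the package: it holds a latin1 byte (0xE9) but is declared as UTF-8, which is the kind of mismatch described above.

```r
# Hypothetical example input: "caf" followed by the latin1 byte 0xE9 ("é").
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
Encoding(x) <- "UTF-8"   # wrongly declare the bytes as UTF-8

# Both checks report that the bytes are not valid in the declared encoding.
validUTF8(x)   # FALSE
validEnc(x)    # FALSE

# Byte-wise matching skips encoding interpretation altogether, so it still
# works, but it only makes sense for patterns that are plain ASCII.
hit <- grepl("caf", x, useBytes = TRUE)
```

How a regular expression call without `useBytes = TRUE` reacts to such an input (silent escaping, a warning, or an error) depends on the R version, so checking inputs with `validUTF8()` before matching is probably the more portable approach.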
More information is available in a blog post:
https://developer.r-project.org/Blog/public/2022/06/27/why-to-avoid-%5Cx-in-regular-expressions/index.html
Note, however, that in (almost all of) your cases, a quick check suggests that the invalid string is not the regular expression itself but one of the inputs.
Please fix your packages to ensure that the strings are valid. Most likely, the problem is that the data you process are not read into R properly: they are not converted to the current encoding (or to UTF-8), so R expects them to be in the current encoding when they are not. (The current encoding may differ between systems, though UTF-8 is the most common with recent R on recent systems.) I have also seen this happen when the data were assumed to be ASCII but in fact contained some extended ASCII characters.
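One way to follow this advice, sketched here in base R under the assumption that the source encoding is known to be latin1, is to convert the text to UTF-8 right after reading it. The input bytes below are a made-up example ("µm" with µ stored as the single byte 0xB5), not data from the package.

```r
# Hypothetical bytes as they might arrive from a latin1-encoded file.
raw_txt <- rawToChar(as.raw(c(0xb5, 0x6d)))
validUTF8(raw_txt)   # FALSE: 0xB5 alone is not a valid UTF-8 sequence

# Assumption: the source encoding is latin1. Convert explicitly to UTF-8.
fixed <- iconv(raw_txt, from = "latin1", to = "UTF-8")
validUTF8(fixed)     # TRUE

# If the source encoding is unknown, sub = "byte" replaces each byte that
# cannot be converted with its hex code in the form "<xx>", so that
# downstream regular expressions receive a valid string.
safe <- iconv(raw_txt, from = "UTF-8", to = "UTF-8", sub = "byte")
```

When reading from a file, the same conversion can usually be requested up front, for example with the `fileEncoding` argument of `read.table()` or the `encoding` argument of `file()`. Which of these applies depends on how the data are imported.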