r-lib / rcmdcheck

Run R CMD check from R and collect the results
https://rcmdcheck.r-lib.org

read_char() chokes on input with an invalid encoding #152

Closed cpsievert closed 3 years ago

cpsievert commented 3 years ago

I came across this while running revdepcheck::cloud_report() on shiny. It seems @hadley was seeing the same thing in https://github.com/r-lib/revdepcheck/issues/288 and attempted a fix in https://github.com/r-lib/rcmdcheck/pull/133, but that was never merged. In my case, I was seeing:

Processing package results  42% (ipc)
Error in readChar(path, nchars = file.info(path)$size, ...) : 
  invalid UTF-8 input in readChar()

It turns out the error was coming from this call to rcmdcheck:::get_test_fail(), which in turn calls rcmdcheck:::read_char().

I can reproduce the error locally by trying to read in {ipc}'s testthat.Rout.fail file the same way as rcmdcheck:::read_char():

path <- "ipc-testthat.Rout.fail"
readChar(path, nchars = file.info(path)$size)
#> Error in readChar(path, nchars = file.info(path)$size) : 
#>  invalid UTF-8 input in readChar()

With useBytes = TRUE, I can read the file successfully, but another failure then happens downstream in rcmdcheck:::get_test_fail()'s call to nchar() (I'm guessing this is why @hadley said https://github.com/r-lib/rcmdcheck/pull/133#issuecomment-743306383).

txt <- readChar(path, nchars = file.info(path)$size, useBytes = TRUE)
nchar(txt)
#> Error in nchar(txt) : invalid multibyte string, element 1
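The nchar() failure can be reproduced without the {ipc} file. The following is a minimal sketch, assuming a file that contains a single byte (0xff) that is never valid in UTF-8; the temporary file and the variable names are made up for illustration and are not from rcmdcheck.

```r
# Build a small file containing one byte (0xff) that can never occur in UTF-8.
path <- tempfile(fileext = ".Rout.fail")
writeBin(as.raw(c(0x61, 0xff, 0x62)), path)

# useBytes = TRUE skips the validation that makes plain readChar() fail.
txt <- readChar(path, nchars = file.size(path), useBytes = TRUE)

is_valid <- validUTF8(txt)             # FALSE: the string is not valid UTF-8
n_bytes <- nchar(txt, type = "bytes")  # 3: counting bytes needs no decoding

# Counting characters requires decoding, which fails in a UTF-8 locale.
# Other locales may decode the byte instead, so no result is assumed here.
n_chars <- tryCatch(nchar(txt), error = function(e) conditionMessage(e))

# Substitute each undecodable byte with its hex code to make it visible.
shown <- iconv(txt, from = "UTF-8", to = "UTF-8", sub = "byte")
```

Printing `shown` should reveal where the bad bytes sit in the output, which can help to tell a corrupted file from one that is simply in a different encoding.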

However, if I change the encoding to UTF-8, then it works:

Encoding(txt)
#> [1] "unknown"
nchar(enc2utf8(txt))
#> [1] 12252

gaborcsardi commented 3 years ago

@jimhester Can you please point me to the R code that runs on the cloud check container? If that uses rcmdcheck as well then maybe this is a processx bug.

Some more discussion: https://github.com/r-lib/rcmdcheck/pull/151

jimhester commented 3 years ago

The cloud code does not use rcmdcheck. The code is in a private repo; I can send you a link on Slack.

gaborcsardi commented 3 years ago

OK, in that case it is hard to say how the output is corrupted. Maybe it is a base R bug, maybe it is something else. We can change rcmdcheck to give a warning if the input is not in the native encoding, instead of erroring.

revdepcheck should still convert the cloud check output from UTF-8 to the native encoding, if it is not doing that already (we can assume UTF-8 for the cloud check output, right?). This would make cloud checks work on Windows, for example. Alternatively, rcmdcheck could automatically try UTF-8 if the native encoding fails.

So in the end this can be worked around in rcmdcheck, I'll transfer this issue back there. :(
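A workaround along these lines might look roughly like the sketch below. It is only an illustration, assuming the check output is meant to be UTF-8: `read_char_lenient()` is a hypothetical helper, not a function in rcmdcheck or revdepcheck, and it makes no claim about how the fix was actually implemented.

```r
# Hypothetical helper, not part of rcmdcheck: read a file that is assumed to be
# UTF-8 and warn, rather than fail, when it contains invalid bytes.
read_char_lenient <- function(path) {
  # Read the raw bytes so that no encoding validation happens on input.
  bytes <- readBin(path, what = "raw", n = file.size(path))
  txt <- rawToChar(bytes)  # note: this errors on embedded nul bytes

  if (!validUTF8(txt)) {
    warning("Invalid UTF-8 in '", path, "'; replacing the offending bytes",
            call. = FALSE)
    # Each undecodable byte becomes "<xx>", so the result is valid UTF-8.
    txt <- iconv(txt, from = "UTF-8", to = "UTF-8", sub = "byte")
  }

  # Declare the encoding so that nchar() and friends decode the text as UTF-8,
  # whatever the native encoding of the session is.
  Encoding(txt) <- "UTF-8"
  txt
}
```

Calling `enc2native()` on the result would be one way to convert it to the native encoding afterwards. Whether to substitute the bad bytes, drop them, or keep them as raw bytes is a design choice; substitution is used here only because it keeps the problem visible in the report.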

jimhester commented 3 years ago

Yeah, I think we can assume the tests were run in UTF-8.