Running spelling::spell_check_test() fails on the crosstable package with the following error:
spelling::spell_check_package()
#> Error in read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html, :
#>   Input is not proper UTF-8, indicate encoding !
#> Bytes: 0x93 0x63 0x79 0x94 [9]
I have no clue where this error comes from, and the error message is unfortunately not very informative.
Would it be possible to fail early in spelling rather than in xml2, so that the file path is included in the error message?
Of course, if we could also have the line number and the specific bad character, it would be even better!
Note that in this case, UTF-8 is the default encoding both in the package's DESCRIPTION and in the RStudio settings. R CMD check completes without error, so I guess the encoding problem is not that severe, don't you think?
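As a side note on the bytes reported in the error: 0x93 and 0x94 are the curly double quotes in Windows-1252, so the offending text is probably a quoted "cy" saved in that encoding. A minimal sketch, assuming Windows-1252 really is the source encoding (a guess; the error message does not confirm it):

```r
# Bytes copied from the error message.
bad <- rawToChar(as.raw(c(0x93, 0x63, 0x79, 0x94)))

# Base R can confirm that these bytes are not valid UTF-8.
validUTF8(bad)

# Assumption: the text was saved as Windows-1252.
# If so, the conversion yields "cy" wrapped in curly double quotes.
fixed <- iconv(bad, from = "windows-1252", to = "UTF-8")
validUTF8(fixed)
```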
EDIT
In my case, the error pointed to my README.md file, which indeed contained special characters. I have no idea how they ended up there, though, and there are far too many of them to correct manually (a knitting problem from README.Rmd, I guess).
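Since fixing these characters by hand is impractical, something like the following could re-encode only the offending lines. This is a sketch, not a tested fix: `reencode_invalid_lines()` is a hypothetical helper, and the `from = "windows-1252"` default is an assumption about how the file was saved.

```r
# Hypothetical helper: re-encode only the lines that are not valid UTF-8.
# Assumption: those lines were written in `from` (Windows-1252 by default).
reencode_invalid_lines <- function(path, from = "windows-1252") {
  # Read without declaring an encoding, so the bytes are kept as they are.
  text <- readLines(path, warn = FALSE)
  bad <- !validUTF8(text)
  text[bad] <- iconv(text[bad], from = from, to = "UTF-8")
  # useBytes = TRUE writes the strings byte for byte, without re-encoding them.
  writeLines(text, path, useBytes = TRUE)
  invisible(which(bad))  # line numbers that were re-encoded
}
```

Running it on a copy of the file first and checking the diff seems safer than overwriting README.md directly.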
EDIT2
Since this confusing problem is not that rare (#52, #58, #62), a fix would be useful.
Here are some proposals:
1) simply wrap xml2::xml_ns_strip() in a tryCatch() so that we can add the path to the error message
2) add a warning in the specific case of non-UTF-8 characters:
text <- readLines(path, warn = FALSE, encoding = "UTF-8")
# Flag the lines that are not valid UTF-8
invalid <- !validUTF8(text)
if (any(invalid)) {
  warning("The file ", path, " has non-UTF-8 characters on lines: ",
          paste(which(invalid), collapse = ", "))
}
3) use this trick from xfun::read_utf8() to ignore the problem (spell_check_package() will have no error):
We can do all three at the same time. I can make a PR if needed.
For the record, I used devtools::spell_check() for the reprex, and after more debugging, the error seems to pertain to this line: https://github.com/ropensci/spelling/blob/008417f4e77a5e86c8d85e85701c59da4010e11b/R/parse-markdown.R#L24
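As a rough illustration of proposal 1, the failing call could be wrapped so that the file path ends up in the error message. `with_path_in_error()` and its `parse_fun` argument are hypothetical names, not part of spelling or xml2; `parse_fun` stands in for whichever call actually fails.

```r
# Hypothetical wrapper: run parse_fun(path) and, if it fails, re-throw the
# error with the file path prepended to the original message.
with_path_in_error <- function(path, parse_fun) {
  tryCatch(
    parse_fun(path),
    error = function(e) {
      stop("Failed to parse '", path, "': ", conditionMessage(e), call. = FALSE)
    }
  )
}

# Usage with a stand-in parser that always fails, only to show the message.
res <- tryCatch(
  with_path_in_error("README.md", function(p) stop("Input is not proper UTF-8")),
  error = conditionMessage
)
res
```

Re-throwing with stop() keeps the original message intact, so nothing from the underlying error is lost; only the path is added in front of it.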