rundel / parsermd

https://rundel.github.io/parsermd/

Error with `parse_rmd_cpp()` and letters with accent in YAML #25

Closed (statnmap closed this issue 3 years ago)

statnmap commented 3 years ago

Hi,

I am facing an issue that only arises with rhub::check(platform = "debian-clang-devel"). It does not arise on other platforms, which suggests a locale/encoding effect.

Error in yaml.load(string, error.label = error.label, ...) : 
  Reader error: control characters are not allowed: #83 at 12

I dug into your package: the error is triggered by `parse_rmd_cpp()`, which turns the é in the YAML header into a badly encoded value.

rmd <- "---\nauthor: \"Sébastien Rochette\"\n---\n"
ast <- parsermd:::parse_rmd_cpp(rmd, allow_incomplete = FALSE)
ast[[1]]

[1] "author: \"S�\203©bastien Rochette\""
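The `\203` in the printed value is a useful hint: it is the byte 0x83. That byte appears when the UTF-8 bytes of é (`c3 a9`) are reread as Latin-1 and converted to UTF-8 a second time ("double encoding"). The base-R sketch below reproduces that failure mode by hand. It assumes double encoding is what happens on the affected platform; it is an illustration, not a trace of what `parse_rmd_cpp()` does internally.

```r
# "é" is written as the escape \u00e9 so the snippet does not depend on the
# encoding of the file it is saved in.
name <- "S\u00e9bastien"

# Correct UTF-8 bytes: é is the two-byte sequence c3 a9.
utf8_bytes <- charToRaw(enc2utf8(name))
print(utf8_bytes)

# Simulate the suspected bug: treat those UTF-8 bytes as Latin-1 and convert
# them to UTF-8 again. Each byte becomes its own character, so c3 a9 turns
# into c3 83 ("Ã") followed by c2 a9 ("©").
double_encoded <- iconv(list(utf8_bytes), from = "latin1", to = "UTF-8",
                        toRaw = TRUE)[[1]]
print(double_encoded)

# Position of the stray 0x83 byte, which R prints as "\203".
which(double_encoded == as.raw(0x83))
```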

Do you have an idea how to work around the problem? Could you add accented letters like é to the YAML of your test Rmd files to see whether your tests catch it? Thank you.

rundel commented 3 years ago

Interesting, it looks like a Unicode/locale issue. I'm not sure whether the problem is in what parsermd's parser returns or in what the yaml package does, but I will dig into it.

rundel commented 3 years ago

It seems the underlying issue may be with the yaml package, but it is possible to work around it. Let me know if this works on your end.
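The thread does not say what the workaround is. One common pattern for this class of problem, shown here only as a hedged sketch and not necessarily what parsermd does, is to declare strings coming back from C++ as UTF-8, so R does not reinterpret their bytes in the session's native encoding before they reach yaml. `mark_utf8()` is a hypothetical helper introduced for illustration.

```r
# Hypothetical helper: declare that a character vector already holds UTF-8
# bytes. It changes only the encoding mark, not the bytes themselves.
mark_utf8 <- function(x) {
  stopifnot(is.character(x), all(validUTF8(x)))
  Encoding(x) <- "UTF-8"
  x
}

# Bytes as C++ code might hand them back: valid UTF-8 for "Sé", but with no
# declared encoding ("unknown"), so R assumes the native locale encoding.
returned <- rawToChar(as.raw(c(0x53, 0xc3, 0xa9)))
print(Encoding(returned))

# After marking, R knows the bytes are UTF-8 in any locale.
fixed <- mark_utf8(returned)
print(Encoding(fixed))
```

Marking, rather than converting with `iconv()`, is the relevant choice here: the bytes are already correct, and converting them again is exactly what produces the double encoding.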

statnmap commented 3 years ago

Yes, this seems to work. Thank you.

It is nice to see that you added this as a unit test right away. You put a lot of effort into the robustness of this package, and that's reassuring!