skranz / RTutor

Creating interactive R Problem Sets. Automatic hints and solution checks. (Shiny or RStudio)
https://skranz.github.io/RTutor/
Question regarding mark_utf8 #35

Open MartinKies opened 3 years ago

MartinKies commented 3 years ago

In write.output.solution (create_ps.r) you have

out.txt = mark_utf8(out.txt)

I am unsure about its purpose. This line sometimes causes errors when it runs before my function fix.parser.inconsistencies, because the string it returns is incompatible with the stringr package, e.g. with stringr::str_length().

Commenting out the line fixes the error, and the resulting solution looks fine to me (in particular regarding umlauts). Perhaps the following code makes my point clearer:

    fix.parser.inconsistencies("Test ü")
    [1] "Test ü"
    mark_utf8("Test ü")
    [1] "Test \xfc"
    str_length("Test ü")
    [1] 6
    str_length("Test \xfc")
    [1] 6
    str_length(mark_utf8("Test ü"))
    [1] NA
    Warning message:
    In stri_length(string) :
      invalid UTF-8 byte sequence detected; try calling stri_enc_toutf8()
    fix.parser.inconsistencies(mark_utf8("Test ü"))
    [1] "Test �"

I am a bit wary about whether commenting out the line is the way to go, because I do not fully understand its purpose. Maybe I found an error in mark_utf8 itself, as str_length("Test \xfc") does work?
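
One possible reading of the output above, sketched in base R. It rests on an assumption that is not confirmed in this thread: that mark_utf8 only sets the declared encoding of a string to UTF-8 without converting its bytes. The helper mark_only below is a hypothetical stand-in for that behaviour, not RTutor's actual code. If the input holds Latin-1 bytes (ü stored as the single byte 0xFC), relabelling it yields an invalid UTF-8 string, which would explain the NA from str_length, whereas converting the bytes yields a valid one.

```r
# Build "Test ü" as Latin-1 bytes (ü is the single byte 0xFC), as it would
# be stored on a system whose native encoding is Latin-1 or similar.
x <- rawToChar(as.raw(c(0x54, 0x65, 0x73, 0x74, 0x20, 0xfc)))
Encoding(x) <- "latin1"

# Hypothetical stand-in for mark_utf8 (assumption: it only relabels the
# string as UTF-8 and leaves the bytes untouched).
mark_only <- function(s) {
  Encoding(s) <- "UTF-8"
  s
}

marked <- mark_only(x)      # same bytes, now declared as UTF-8
converted <- enc2utf8(x)    # bytes re-encoded: 0xFC becomes 0xC3 0xBC

same_bytes <- identical(charToRaw(marked), charToRaw(x))  # TRUE
marked_ok <- validUTF8(marked)                            # FALSE
converted_ok <- validUTF8(converted)                      # TRUE
n_chars <- nchar(converted)                               # 6
```

Under that assumption, str_length("Test \xfc") works because the unmarked string is still interpreted in the native encoding, while the marked copy claims to be UTF-8 and is not.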

skranz commented 3 years ago

Hmm, honestly all this UTF-8 code was mainly trial and error. Perhaps there was a problem that caused me to add the line, but I don't remember. If all problem sets (with special UTF-8 characters like ü) convert well, you can comment it out.

Note that I usually expect all _sol.Rmd files to be saved with UTF-8 encoding. If the problem arises because you saved the _sol.Rmd files in a different encoding, such as the Windows default encoding, you should probably first try changing the encoding.
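
A minimal sketch of re-encoding a file to UTF-8 in base R, in case changing the encoding in the editor is not convenient. The function name convert_file_to_utf8 and the default source encoding "latin1" are assumptions for illustration; the actual source encoding of a given _sol.Rmd file has to be known, and a file that is already UTF-8 must not be run through this, since its bytes would be encoded a second time.

```r
# Hypothetical helper: read a file's raw bytes, convert them from a known
# source encoding to UTF-8, and write the converted bytes to a new file.
convert_file_to_utf8 <- function(infile, outfile, from = "latin1") {
  bytes <- readBin(infile, what = "raw", n = file.size(infile))
  txt <- rawToChar(bytes)
  txt_utf8 <- iconv(txt, from = from, to = "UTF-8")
  if (is.na(txt_utf8)) {
    stop("Conversion failed; is '", from, "' the right source encoding?")
  }
  writeBin(charToRaw(txt_utf8), outfile)
  invisible(outfile)
}

# Small demonstration on temporary files: "Test ü" stored as Latin-1 bytes.
src <- tempfile(fileext = ".Rmd")
dst <- tempfile(fileext = ".Rmd")
writeBin(as.raw(c(0x54, 0x65, 0x73, 0x74, 0x20, 0xfc)), src)

convert_file_to_utf8(src, dst)

# The single byte 0xFC is now the two-byte UTF-8 sequence 0xC3 0xBC.
result <- readBin(dst, what = "raw", n = file.size(dst))
```

Working on raw bytes keeps the conversion independent of the session's native encoding.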