Closed cderv closed 3 years ago
Have you tried running the UCRT build of R, described here: https://svn.r-project.org/R-dev-web/trunk/WindowsBuilds/winutf8/ucrt3/howto.html ? The general problem on Windows has been that some R functions convert strings to the native encoding; if chars aren't representable there (or the function thinks they aren't) you get the escapes instead. The new build is an attempt to make UTF-8 the native encoding, so this problem will go away.
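As a rough illustration of the escapes described above, here is a minimal sketch using base R's `iconv()`. It is not the code path R or downlit actually takes internally; it only shows what happens when a character cannot be represented in the target encoding. The exact substitution can vary with the platform's iconv implementation.

```r
# "é" as a single latin1 byte, built from raw so that the encoding of this
# script itself does not matter.
x <- rawToChar(as.raw(0xe9))
Encoding(x) <- "latin1"

# ASCII cannot represent the character. sub = "byte" asks iconv() to replace
# each unconvertible byte with its hex code in angle brackets.
escaped <- iconv(x, from = "latin1", to = "ASCII", sub = "byte")
print(escaped)
```

The escaped form has the same shape as the `<e9>` reported in this issue.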
@cderv can you check that #112 fixes the problem? I think it should; I just need to work out the right hack to get this working on 3.6 and lower.
@dmurdoch fwiw, my expectation is that we can get this working on non-UCRT builds of R on Windows, just with a little more work.
@hadley After installing from #112, I no longer get any error.
downlit::highlight("# é\n1:5")
#> [1] "<span class='c'># é</span>\n<span class='m'>1</span><span class='o'>:</span><span class='m'>5</span>"
packageVersion("downlit")
#> [1] '0.2.9000.9001'
I'll see if this is fixed for bookdown too, or if that is something else.
This was first reported in https://github.com/rstudio/bookdown/issues/1260 by a user who had Chinese characters in code chunks.
I am opening this issue to track it in the right place and help solve it. A PR, #112, already tries to fix this.
I can also reproduce it on a French Windows computer (also non-UTF-8 by default) when using an accented character:
I believe the issue is that the text passed to `token_escape()` is `# <e9>` and no longer the original `# é`. So I think the string is marked with an incorrect encoding somewhere. This could be because any code that will be parsed is assumed to be UTF-8: https://github.com/r-lib/downlit/blob/e22f072cfeaf91fbe7c38aeabd351aaa184d36fd/R/utils.R#L49
`encoding = UTF-8` here means that the `text` passed to `parse()` is assumed to be UTF-8, so it won't do any conversion. In my case the text is `latin-1`, the default on my system. Forcing UTF-8 in downlit (https://github.com/r-lib/downlit/commit/9a0d670c1b317fbd77ee62d4b5589beee034fcc3) may require a conversion to UTF-8. It solves it at least on my side.
I believe the above is true if downlit is used directly with non-UTF-8 content. In the context of R Markdown, I don't really understand why the error would happen in bookdown: R Markdown assumes UTF-8 for the file and works in UTF-8, so content passed to downlit should be UTF-8.
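To make the conversion step concrete, here is a minimal sketch of re-encoding latin1 text to UTF-8 before parsing. `parse_as_utf8()` is a hypothetical helper written for this illustration, not a downlit function, and this is not necessarily what #112 does. It relies only on base R's `enc2utf8()` and `parse()`.

```r
# Bytes of "# é\n1:5" in latin1 (0xe9 is "é"), built from raw so that the
# encoding of this script itself does not matter.
text <- rawToChar(as.raw(c(0x23, 0x20, 0xe9, 0x0a, 0x31, 0x3a, 0x35)))
Encoding(text) <- "latin1"

# Hypothetical helper: convert to UTF-8 first, so that telling parse() the
# input is UTF-8 is actually true.
parse_as_utf8 <- function(text) {
  text <- enc2utf8(text)
  parse(text = text, keep.source = TRUE, encoding = "UTF-8")
}

utf8 <- enc2utf8(text)
print(Encoding(utf8))   # encoding mark after conversion
print(charToRaw(utf8))  # "é" is now the two bytes c3 a9

expr <- parse_as_utf8(text)
print(eval(expr[[1]]))  # the comment is skipped; the expression is 1:5
```

Without the `enc2utf8()` call, the single byte `e9` would be handed to `parse()` while being declared as UTF-8, which is not valid UTF-8 and is consistent with the `<e9>` escape seen above.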
Also, I did not look specifically at the test on Windows that #112 tries to solve. It is possibly different from this one. If I can help as a Windows user, tell me.