Closed bwiernik closed 3 years ago
Thanks, I like this solution. Perhaps we don't need enc2native() at all if we can substitute the non-ASCII characters with escape sequences, so that we don't get the approximate ASCII analogue character?
Maybe a safer route is to do the conversion in JavaScript, e.g. using https://github.com/mathiasbynens/he
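To make the escape-sequence idea concrete, here is a minimal sketch in R that skips enc2native() and replaces each non-ASCII code point with an HTML numeric character reference. The helper name `escape_non_ascii` is hypothetical and not part of any package discussed here; it assumes a single, valid UTF-8 string as input.

```r
# Hypothetical helper (illustration only, not an existing API):
# replace every non-ASCII code point with an HTML numeric entity.
escape_non_ascii <- function(x) {
  stopifnot(is.character(x), length(x) == 1L, !is.na(x))
  cp <- utf8ToInt(enc2utf8(x))              # Unicode code points as integers
  chars <- intToUtf8(cp, multiple = TRUE)   # one string per code point
  non_ascii <- cp > 127L
  # Only the non-ASCII positions are rewritten; ASCII (including backslashes
  # and braces used by Rd/LaTeX markup) passes through unchanged.
  chars[non_ascii] <- sprintf("&#x%X;", cp[non_ascii])
  paste(chars, collapse = "")
}

escape_non_ascii("a\\frac{1}{3}\u03c4")
```

Because the result is pure ASCII, it should not depend on the native encoding of the platform. Whether numeric entities are acceptable inside the generated HTML is an assumption that would need checking against the actual output.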
Handles all of the characters that come to mind and does not interfere with real HTML output, as far as I can tell:
Simple escape example
``` r
rd <- "a\\frac{1}{3}τρσ😀发短信"
rd
#> [1] "a\\frac{1}{3}Ï„Ï\201σðŸ\230\200å\217‘çŸä¿¡ð\235„ž"
enc2native(rd)
#> [1] "a\\frac{1}{3}Ï„Ï\201σðŸ\230\200å\217‘çŸä¿¡ð\235„ž"
gsub(
  pattern = "<(U\\+[0-9A-Fa-f]{4,8})>",
  replacement = "<\\1>",
  x = enc2native(rd)
)
#> [1] "a\\frac{1}{3}Ï„Ï\201σðŸ\230\200å\217‘çŸä¿¡ð\235„ž"
```

Created on 2021-07-15 by the [reprex package](https://reprex.tidyverse.org) (v2.0.0)

katex example
``` r
rd <- katex::math_to_rd(katex::example_math())
rd
#> [1] "\\if{html}{\\out{\n\nf(x)=s2p1e-21(sx-µ)2\n}}\n\\if{latex,text}{\n\\deqn{\nf(x)= {\\frac{1}{\\sigma\\sqrt{2\\pi}}}e^{- {\\frac {1}{2}} (\\frac {x-\\mu}{\\sigma})^2}\n}{\nf(x)= {\\frac{1}{\\sigma\\sqrt{2\\pi}}}e^{- {\\frac {1}{2}} (\\frac {x-\\mu}{\\sigma})^2}\n}}"
#> attr(,"class")
#> [1] "Rdtext"
gsub(
  pattern = "<(U\\+[0-9A-Fa-f]{4,8})>",
  replacement = "<\\1>",
  x = enc2native(rd)
)
#> [1] "\\if{html}{\\out{\n\nf(x)=s2p<U+200B>1<U+200B>e-21<U+200B>(sx-µ<U+200B>)2\n}}\n\\if{latex,text}{\n\\deqn{\nf(x)= {\\frac{1}{\\sigma\\sqrt{2\\pi}}}e^{- {\\frac {1}{2}} (\\frac {x-\\mu}{\\sigma})^2}\n}{\nf(x)= {\\frac{1}{\\sigma\\sqrt{2\\pi}}}e^{- {\\frac {1}{2}} (\\frac {x-\\mu}{\\sigma})^2}\n}}"
#> attr(,"class")
#> [1] "Rdtext"
```

Created on 2021-07-15 by the [reprex package](https://reprex.tidyverse.org) (v2.0.0)

Does not handle this bit from the R documentation:
Those escapes do not have a distinctive pattern and would generally indicate a mistake in the string anyway.
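Since such escapes cannot be matched by a pattern, one possible alternative is to reject malformed input up front. The sketch below uses base R's `validUTF8()` for that; the wrapper name `check_utf8` is hypothetical, and treating invalid input as an error (rather than a warning or a silent repair) is an assumption, not something decided in this thread.

```r
# Hypothetical guard (illustration only): fail early on strings that are
# not valid UTF-8 instead of pattern-matching their escapes afterwards.
check_utf8 <- function(x) {
  stopifnot(is.character(x))
  bad <- !validUTF8(x)   # TRUE for elements that are not valid UTF-8
  if (any(bad)) {
    stop("Invalid UTF-8 in element(s): ", paste(which(bad), collapse = ", "))
  }
  invisible(x)
}

# A well-formed string passes silently.
check_utf8("a\\frac{1}{3}\u03c4\u03c1\u03c3")

# A string with a stray 0xFF byte is reported; the message is captured here
# so the example does not abort.
res <- tryCatch(check_utf8("abc\xff"), error = function(e) conditionMessage(e))
```

This only covers byte sequences that are invalid as UTF-8; it is not a claim about every malformed case that can occur.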
Non-handled malformed UTF-16 characters
``` r
enc2native("abc
```

Closes #2