ropensci / tinkr

Convert (R)Markdown files to XML, edit them, write them back as (R)Markdown
https://docs.ropensci.org/tinkr
GNU General Public License v3.0

filter out / replace special control characters #96

Closed zkamvar closed 6 months ago

zkamvar commented 1 year ago

I'm running into a fun situation where someone added a \v control character (vertical tab, code point 11) to the end of a list, and I'm getting a PCDATA invalid Char value 11 error.

There is an answer on Stack Overflow (SO) that describes how to detect the bad character and replace it with an empty string, so we can take that and add it to the pre-processing step.

NOTE: Since I can hear Nero fiddling on the outskirts of SO, here's the solution:

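    # Matches anything outside the ranges listed here: tab, LF, CR,
    # U+0020-U+D7FF and U+E000-U+FFFD.
    # Note: this also matches code points above U+FFFF, which XML 1.0 permits.
    illegal <- "[^\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]"

    # Remove every character matched by `illegal` from a character vector
    utf8_for_xml <- function(x) {
      gsub(illegal, "", x)
    }

    # `string` is the text to clean
    string_formatted <- utf8_for_xml(string)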
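As a rough illustration of the snippet above, here is a minimal sketch that runs it on a made-up input resembling the list that triggered the error. The `lines` vector and the `has_illegal`, `offending`, and `code_points` names are hypothetical and only there for the example; this is not how tinkr itself does the pre-processing.

```r
# Same pattern and helper as in the snippet above
illegal <- "[^\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]"

utf8_for_xml <- function(x) {
  gsub(illegal, "", x)
}

# Hypothetical input: the last list item ends in a vertical tab (\v)
lines <- c("- first item", "- second item\v")

# Which lines contain a character that the pattern rejects?
has_illegal <- grepl(illegal, lines)

# Extract the offending characters and look up their code points
offending <- unlist(regmatches(lines, gregexpr(illegal, lines)))
code_points <- vapply(offending, utf8ToInt, integer(1), USE.NAMES = FALSE)

# Strip them before the text is handed to the XML parser
cleaned <- utf8_for_xml(lines)
```

Here `code_points` should come out as 11, the same value the parser reports, which gives a quick way to check which character is causing the failure before it is removed.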

I believe this is the correct way to do it, since these characters shouldn't exist in the document in the first place. That being said, it's a bit frustrating to find out what these characters actually do, but I've found these resources:

maelle commented 1 year ago

:open_mouth: and :clap: