dwaite closed this issue 2 years ago
[ Quoting @.***> in "[mmarkdown/mmark] Issues with UTF-8..." ]
- When encountering a file containing UTF-8 (e.g. non-US7ASCII bytes) it appears the characters are output to XML in a <u> element, which is not part of the xml2rfc specification
sadly it is: https://xml2rfc.tools.ietf.org/xml2rfc-doc.html#name-u-new-2
I don't have a good answer. I've raised this on the rfc-interest list, and I think at some point UTF-8 will just be the new normal; for now we seem to need these crazy workarounds.
Note that when it detects Unicode, xml2rfc generates an HTML entity, i.e. the entity for –.
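For readers unfamiliar with the entity form: the character above is U+2013 (EN DASH), and a numeric character reference for it can be produced as in the sketch below. This only illustrates what such an entity looks like. The helper name `to_char_refs` is hypothetical, and this thread does not say whether xml2rfc emits the decimal, hexadecimal, or named form.

```python
def to_char_refs(text):
    """Replace every non-ASCII character with a decimal numeric character reference."""
    # "xmlcharrefreplace" is a standard Python codec error handler.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_char_refs("pages 1\u20132"))  # pages 1&#8211;2
```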
/Miek
-- Miek Gieben
[ Quoting @.***> in "Re: [mmarkdown/mmark] Issues with U..." ]
Oh I thought RFC7991 was the definitive document.
yeah... if only
RFC 7991 is a guide; the current spec isn't documented, it's whatever xml2rfc currently implements... I fully expect a new RFC when the dust settles, but for now this is the current state.
I would like <u> to not insert all that extra text and just render the Unicode code point, but then again, why not go full UTF-8?
Also see: https://github.com/rfc-format/draft-iab-xml2rfc-v3-bis/issues/205
/Miek
-- Miek Gieben
The issue I encountered in this case was explanatory text that contains smart quotes.
The xml2rfc language unfortunately seems to require "num" in <u>'s format attribute even when an ASCII alternative is given, so in this particular case the best option for me would be to add a post-commit check that rejects non-Latin-1 code points.
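A check like that could be sketched roughly as follows. This is only an illustration in Python, not part of mmark or xml2rfc. The function name `find_non_latin1` is hypothetical, and "non-Latin-1" is assumed to mean any code point above U+00FF, which covers smart quotes such as U+201C and U+201D.

```python
def find_non_latin1(text):
    """Return (line, column, char) for every code point above U+00FF.

    Line and column numbers are 1-based. The U+00FF cutoff is an
    assumption about what "non-Latin-1" means here.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0xFF:
                hits.append((line_no, col, ch))
    return hits


# Smart quotes are reported; Latin-1 characters such as U+00E9 are not.
for line_no, col, ch in find_non_latin1("a \u201cquoted\u201d caf\u00e9"):
    print(f"{line_no}:{col}: U+{ord(ch):04X} is outside Latin-1")
```

A hook or CI step could run this over each changed source file and fail when the returned list is non-empty.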
That said, I don't see any great options in terms of markdown tooling for Unicode: the xml2rfc language seems to treat all non-US-ASCII text as normatively important, and automatically adding U+xxxx syntax to individual characters within a keyword or the like is sub-optimal. It seems one needs a grouping construct for text data that is not already overridden for some specific formatting.
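To make the concern concrete, here is a rough sketch comparing per-character U+xxxx annotation with annotating a whole run once. It does not reproduce xml2rfc's or mmark's actual output. The function names and the parenthesized layout are assumptions chosen purely for illustration.

```python
def code_points(text):
    """Return the U+XXXX notation for every character in text."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def annotate_per_char(text):
    """Annotate each non-ASCII character individually, in place."""
    out = []
    for ch in text:
        if ord(ch) > 0x7F:
            out.append(f"{ch} ({code_points(ch)})")
        else:
            out.append(ch)
    return "".join(out)


def annotate_grouped(text):
    """Annotate the whole string once, leaving the word itself intact."""
    if all(ord(ch) <= 0x7F for ch in text):
        return text
    return f"{text} ({code_points(text)})"


word = "na\u00efve"
print(annotate_per_char(word))  # annotation lands in the middle of the word
print(annotate_grouped(word))   # word stays readable, annotation follows it
```

The per-character form splits the word around the annotation, which is the problem with tooling that wraps single characters automatically. A grouping construct would let the author choose the span.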
[ Quoting @.***> in "Re: [mmarkdown/mmark] Issues with U..." ]
The issue I encountered in this case was explanatory text that contains smart quotes.
I think everything is pointing into the direction of "just allow utf-8 everywhere".
We'll have to wait until we get there though.