Open dkg opened 1 year ago
Font issue, I'd say
$ pbpaste | echars
*** Miscellaneous Symbols (Common)
☃: U+2603 1 SNOWMAN
⛄: U+26C4 1 SNOWMAN WITHOUT SNOW
They are in the same group in Unicode, but of course fonts don't pick up whole groups. (And my browser is broken and shows both the same.)
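For anyone without `echars` installed, here is a rough Python equivalent of the lookup above. It is only a sketch: `describe` is a name made up for this example, and it relies on whatever Unicode version the local Python's `unicodedata` module ships with.

```python
import unicodedata


def describe(text):
    """Return (char, 'U+XXXX', name) for each non-whitespace character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if not ch.isspace()
    ]


# Print one line per character, similar in spirit to the echars output.
for ch, cp, name in describe("☃ ⛄"):
    print(f"{ch}: {cp} {name}")
```

This only confirms which code points are in the text. It says nothing about whether a given font has glyphs for them.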
Is this an issue with the pdf that comes out of xml2rfc, or with the pdfized rendering-of-the-htmlized-text that comes out of the datatracker? If the former, let's move this to the xml2rfc repo?
The PDF generated by xml2rfc shows both snowmen, but from two different font groups.
We probably need to include an extra font in xml2rfc, but I think we can tackle that if this gets to RFC-to-be stage.
in case it wasn't clear, i don't intend draft-dkg-rfcediting-non-ascii-ietf-tooling to ever become an RFC! that's just a test harness so i can push back on some of the FUD i was hearing about how non-ASCII text might be broken.
I'm unaware of any RFC use case that would need either SNOWMAN character, but the demonstration is intended to highlight problems and identify structural issues in unicode coverage and transmission before some RFC really does try to use a symbol that isn't well-supported in one of the output formats.
The problem pdf i found was generated by the datatracker -- i don't know what toolchain was used. When generating the file locally with xml2rfc, i do actually see both glyphs. It's possible that this is due to my having certain fonts available locally that are not available on the VM hosting the datatracker, but i don't know.
thanks for looking into it, i really appreciate all the work that has been done on making the RFC series capable of including robust, modern documents with a stable and expansive character set.
Thanks @dkg - I understand what you're doing - and what you provide above is enough for me to know which invocation of weasyprint to study. It's the one in the xml2rfc environment used by the datatracker when it generates formats from xml submissions, which may well not have the right font set installed - we'll go look.
(for the record, this I-D has been much more useful than just identifying the SNOWMAN weirdness -- it demonstrated that use cases i heard active concerns about during IETF 117 (cyrillic text, mathematical symbols) do work fine. what you see in my reports are the corner cases where things remain broken -- but the real takeaway from this for me is that the use cases people actually care about are not broken. thanks for all the work that has gone into this!)
In the web view, on my machine, the "snowman without snow" comes from the "Apple Color Emoji" font, and the "snowman with snow" comes from the "Menlo" font.
I guess that's because the CSS says font-family: "Noto Sans Mono", SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace. (But I don't understand where "Apple Color Emoji" comes from...)
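A possible explanation, sketched as a toy model: font selection happens per character, walking the `font-family` list in order, and if nothing in the list has the glyph the renderer can fall back to fonts the system offers. The coverage sets below are illustrative assumptions that only encode what was observed on this one machine. `pick_font`, `COVERAGE`, and `SYSTEM_FALLBACK` are names invented for this sketch, not anything from the browser or Weasyprint.

```python
# Illustrative data only: real fonts cover far more code points than this.
# Families absent from COVERAGE are treated as not installed or lacking the glyph.
COVERAGE = {
    "Menlo": {0x2603},              # observed: SNOWMAN rendered from Menlo
    "Apple Color Emoji": {0x26C4},  # observed: SNOWMAN WITHOUT SNOW from here
}

# The stack from the CSS quoted above (generic "monospace" left out).
CSS_STACK = [
    "Noto Sans Mono", "SFMono-Regular", "Menlo", "Monaco",
    "Consolas", "Liberation Mono", "Courier New",
]

# Assumption: what the OS offers once the CSS stack is exhausted.
SYSTEM_FALLBACK = ["Apple Color Emoji"]


def pick_font(ch, stack=CSS_STACK, fallback=SYSTEM_FALLBACK, coverage=COVERAGE):
    """Return the first family (stack first, then fallback) covering ch, else None."""
    for family in list(stack) + list(fallback):
        if ord(ch) in coverage.get(family, set()):
            return family
    return None


for ch in "☃⛄":
    print(f"U+{ord(ch):04X} -> {pick_font(ch)}")
```

Under these assumptions the two snowmen resolve to different fonts even though only one stack is declared, which would match what the web view shows.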
That same CSS is passed into Weasyprint when making the PDF, and these are the fonts that end up in the PDF:
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
IVBPZU+Noto-Sans-Mono                CID TrueType      Identity-H       yes yes yes     64  0
JUCHNC+Noto-Sans-Mono-Bold           CID TrueType      Identity-H       yes yes yes     68  0
TNCRMY+DejaVu-Sans-Mono              CID TrueType      Identity-H       yes yes yes     72  0
IZYCUH+DejaVu-Sans                   CID TrueType      Identity-H       yes yes yes     76  0
Not sure where/why "DejaVu" is picked up from, but I guess it doesn't have the character.
Since we want to use Noto, should we add https://fonts.google.com/noto/specimen/Noto+Emoji?
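To compare font sets between the datatracker's PDF and a locally generated one, a small parser for the listing above might help. This is a sketch that assumes the listing is in the layout printed by poppler's `pdffonts` (the thread does not say which tool produced it). `parse_pdffonts` and `SAMPLE` are names made up for this example.

```python
# The font listing from the comment above, used here as sample input.
SAMPLE = """\
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
IVBPZU+Noto-Sans-Mono CID TrueType Identity-H yes yes yes 64 0
JUCHNC+Noto-Sans-Mono-Bold CID TrueType Identity-H yes yes yes 68 0
TNCRMY+DejaVu-Sans-Mono CID TrueType Identity-H yes yes yes 72 0
IZYCUH+DejaVu-Sans CID TrueType Identity-H yes yes yes 76 0
"""


def parse_pdffonts(output):
    """Parse a pdffonts-style listing into a list of dicts, one per font."""
    fonts = []
    for line in output.splitlines():
        parts = line.split()
        # Skip the header, the dashed rule, and anything too short to be a row.
        if len(parts) < 8 or parts[0] == "name" or set(parts[0]) == {"-"}:
            continue
        name = parts[0]
        fonts.append({
            "name": name,
            # Drop the subset tag (e.g. "IVBPZU+") to get the family name.
            "family": name.split("+", 1)[-1],
            # The type may contain spaces, so take everything between the
            # name and the six trailing fixed columns.
            "type": " ".join(parts[1:-6]),
            "encoding": parts[-6],
            "embedded": parts[-5] == "yes",
            "subset": parts[-4] == "yes",
            "unicode": parts[-3] == "yes",
            "object_id": " ".join(parts[-2:]),
        })
    return fonts


for font in parse_pdffonts(SAMPLE):
    print(font["family"], font["type"], font["embedded"])
```

Diffing the `family` values from the two PDFs should show quickly whether the datatracker environment embeds a different set of fonts than a local xml2rfc run does.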
Describe the issue
draft-dkg-rfcediting-non-ascii-ietf-tooling is a test draft that contains multiple non-ascii characters. they all render just fine in the text and html variants, but the pdf variant fails to include ⛄ (U+26C4 SNOWMAN WITHOUT SNOW). it renders ☃ (U+2603 SNOWMAN) with no problem, though. Maybe this has something to do with codepoint coverage of the default fonts.
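One way to test the codepoint-coverage guess is to read a font's character map directly. The sketch below assumes the third-party fontTools package is installed. `load_cmap` and `missing_codepoints` are hypothetical helper names, and the font path in the usage comment is a placeholder, not a known location on the datatracker host.

```python
def load_cmap(font_path):
    """Return the set of code points a font file maps to glyphs.

    Requires fontTools (pip install fonttools); imported lazily so the
    pure helper below works without it.
    """
    from fontTools.ttLib import TTFont

    font = TTFont(font_path)
    try:
        # getBestCmap() returns a {codepoint: glyph_name} dict, or None.
        return set(font.getBestCmap() or {})
    finally:
        font.close()


def missing_codepoints(text, covered):
    """List the code points in text (as 'U+XXXX') that are not in covered."""
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if not ch.isspace() and ord(ch) not in covered
    ]


# Usage (placeholder path, adjust to a font that is actually installed):
#   covered = load_cmap("/path/to/SomeFont.ttf")
#   print(missing_codepoints("☃ ⛄", covered))
```

Running this against each font available in the PDF-generating environment would show whether any of them covers U+26C4.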