metanorma / metanorma-iso

Metanorma processor for ISO standards
BSD 2-Clause "Simplified" License

HTML: SVG images showing HTML escaped content #879

Closed: ronaldtse closed this issue 1 year ago

ronaldtse commented 1 year ago

Word rendering (good):

[Image: Screen Shot 2022-12-05 at 6 33 45 PM]

HTML rendering (escaped content):

[Image: Screen Shot 2022-12-05 at 6 33 26 PM]

opoudjis commented 1 year ago

The text in the SVG is encoded as:

<text font-family="Calibri" fill="#595959" style="white-space:pre;" text-anchor="start" x="23.0000" y="27.0000" font-size="10.0000"><![CDATA[&#171;ApplicationSchema&#187;]]></text>

CDATA is expressly NOT meant to contain HTML escapes: character references inside a CDATA section are not decoded, so they are displayed literally. I need to work out how they got there to begin with.
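
A minimal sketch of why the escapes show up literally, using Python's standard-library `xml.etree.ElementTree` purely for illustration (an assumption; it is not the parser used in the metanorma pipeline). The two sample strings are cut down from the `<text>` element above, with the attributes dropped.

```python
import xml.etree.ElementTree as ET

# The same character references, once inside CDATA and once as ordinary text.
# Both strings are simplified, hypothetical versions of the SVG <text> element.
in_cdata = "<text><![CDATA[&#171;ApplicationSchema&#187;]]></text>"
in_text = "<text>&#171;ApplicationSchema&#187;</text>"

# Inside CDATA the parser hands back the characters exactly as written.
cdata_value = ET.fromstring(in_cdata).text

# Outside CDATA the references are decoded to the guillemet characters.
text_value = ET.fromstring(in_text).text

print(cdata_value)
print(text_value)
```

The first value keeps the literal `&#171;` and `&#187;`, which matches what the HTML rendering displays; the second decodes to the guillemets seen in the Word rendering.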

ronaldtse commented 1 year ago

This document is ISO 19115-3:

But I've already submitted this document to publication :wink:

opoudjis commented 1 year ago

This SVG is being generated by the emf2svg gem, which is encoding non-ASCII characters as UTF-8:

<![CDATA[\xC2\xABApplicationSchema\xC2\xBB]]>
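
As a quick check on those bytes, the following sketch (plain Python, used here only for illustration) decodes them as UTF-8 and looks up the code points of the first and last characters. The variable names are hypothetical.

```python
# The byte sequence quoted above, as emitted inside the CDATA section.
raw = b"\xC2\xABApplicationSchema\xC2\xBB"

# Decode as UTF-8 to recover the intended text.
decoded = raw.decode("utf-8")

# Code points of the opening and closing guillemets.
code_points = [ord(decoded[0]), ord(decoded[-1])]

print(decoded, code_points)
```

The code points come out as 171 and 187, the same numbers that appear in the `&#171;` and `&#187;` escapes in the HTML output.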

Escapes are respected in Presentation XML output, but not in HTML output. Fixing that is just a matter of changing noko_html, the XML serialiser, to use UTF-8 instead of US-ASCII as its encoding; there is no particular reason to use HTML escapes in HTML any more.
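
The effect of the serialiser's encoding can be sketched as follows. This uses Python's `xml.etree.ElementTree` as a stand-in, which is an assumption for illustration only: it is not Nokogiri and not the `noko_html` code, and it does not preserve CDATA sections. It only shows that a US-ASCII target forces non-ASCII characters into numeric escapes, while a UTF-8 target writes them directly.

```python
import xml.etree.ElementTree as ET

# A hypothetical element holding the guillemet-wrapped label from the SVG.
element = ET.Element("text")
element.text = "\u00abApplicationSchema\u00bb"

# US-ASCII cannot represent the guillemets, so they become character references.
as_ascii = ET.tostring(element, encoding="us-ascii")

# UTF-8 can represent them, so no escapes are needed.
as_utf8 = ET.tostring(element, encoding="utf-8")

print(as_ascii)
print(as_utf8)
```

If escapes like these end up inside a CDATA section, a browser shows them verbatim, which is consistent with the HTML screenshot in this issue.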