kakwa / libemf2svg

Microsoft (MS) EMF to SVG conversion library
GNU General Public License v2.0

[Bug] SVG text output: CDATA container may contain invalid string #43

Open · albrechtd opened this issue 2 years ago

albrechtd commented 2 years ago

I noticed that the contents of text elements in the SVG file produced by emf2svg-conv are wrapped in an extra CDATA container. For example, the attached trivial EMF file (Sample.zip; created by LibreOffice) is converted into an SVG file containing

<text  clip-path="url(#clip-1804289383)" font-family="Liberation Sans" fill="#000000" style ="white-space:pre;" font-weight="400" text-anchor="start" x="273.9291" y="508.8484" font-size="25.4705" ><![CDATA[This is a simple test.]]></text>

My problem: I usually convert the result to a bitmap with GraphicsMagick (gm convert …, on Debian Bullseye), which apparently just skips the text. This might actually be a GraphicsMagick issue, though: ImageMagick does render the text, but it is much slower and somewhat unstable, so I prefer GraphicsMagick. What is the reason for the extra CDATA container? Would it be possible to omit it?

Thanks, Albrecht.

albrechtd commented 2 years ago

Digging deeper into this issue, I found that the text output is actually broken if and only if an EMF text item contains the CDATA termination string ]]>: running the sample from 43_sample_bug.zip through emf2svg-conv (on Debian Bookworm, version 1.1.0+ds-3) produces an invalid SVG file. For example, xmllint reports:

Sample-orig.svg:6: parser error : Sequence ']]>' not allowed in content
art" x="10.3602" y="301.2697" font-size="25.4705" ><![CDATA[CDATA end marker ]]>
                                                                               ^
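To make the failure condition concrete, here is a minimal C sketch. `wrap_cdata` mimics the verbatim CDATA wrapping visible in the SVG output above, and `cdata_safe` checks whether a given text survives that wrapping. Both names are hypothetical and are not taken from the libemf2svg sources; the sample strings are assumed from the files mentioned in this thread.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch (not libemf2svg's actual code): wrap `text` verbatim
 * in a CDATA section, as in the output shown above. Returns what snprintf
 * returns, i.e. the length the full result would have. */
int wrap_cdata(char *out, size_t size, const char *text)
{
    return snprintf(out, size, "<![CDATA[%s]]>", text);
}

/* The wrapped result is well-formed only if `text` does not itself contain
 * the CDATA terminator "]]>". Otherwise the section ends early, and the
 * closing "]]>" added by the wrapper lands in ordinary content, which XML
 * does not allow. */
bool cdata_safe(const char *text)
{
    return strstr(text, "]]>") == NULL;
}
```

For the text from Sample.zip, `cdata_safe("This is a simple test.")` is true and the wrapped string matches the output above. For a text containing "CDATA end marker ]]>" it is false, which is the situation xmllint reports.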

Proposed solution: instead of writing the string verbatim in a CDATA container, escape the reserved XML characters <, & and > as in this patch. Escaping > is not strictly required in general, but it also neutralizes the ]]> sequence that triggered the error above.
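The escaping approach can be sketched in C as follows. This is a minimal illustration, not the referenced patch: the function name `xml_escape_text` is hypothetical, and the input is assumed to be a NUL-terminated UTF-8 (or ASCII) string. Byte-wise replacement is safe for UTF-8 because multi-byte sequences never contain the ASCII bytes <, > or &.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of `text` with <, >
 * and & replaced by entity references, so the result can be written as
 * plain element content without a CDATA section. Returns NULL if the
 * allocation fails; the caller must free() the result. */
char *xml_escape_text(const char *text)
{
    /* First pass: compute the escaped length. */
    size_t len = 0;
    for (const char *p = text; *p; p++) {
        switch (*p) {
        case '<': len += 4; break; /* &lt;  */
        case '>': len += 4; break; /* &gt;  */
        case '&': len += 5; break; /* &amp; */
        default:  len += 1; break;
        }
    }

    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy bytes, expanding the reserved characters. */
    char *q = out;
    for (const char *p = text; *p; p++) {
        switch (*p) {
        case '<': memcpy(q, "&lt;", 4);  q += 4; break;
        case '>': memcpy(q, "&gt;", 4);  q += 4; break;
        case '&': memcpy(q, "&amp;", 5); q += 5; break;
        default:  *q++ = *p;             break;
        }
    }
    *q = '\0';
    return out;
}
```

With this, "CDATA end marker ]]>" becomes "CDATA end marker ]]&gt;", which is valid element content. Always escaping > keeps the code simple and covers the one case where XML requires it, namely a literal ]]> in content.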