kawanet / from-xml

fromXML - Pure JavaScript XML Parser
MIT License

XML --> JSON --> XML with `<` or `>` in attribute value #6

Open ariutta opened 3 years ago

ariutta commented 3 years ago

My XML attribute value contains `<` and `>` (escaped as `&lt;` and `&gt;`). If I send it through a round trip (XML --> JSON --> XML), should the final XML match the initial XML?

Code snippet:

```javascript
const fromXML = require("from-xml").fromXML;
const toXML = require("to-xml").toXML;

// XML --> JSON (the label attribute holds escaped HTML from diagrams.net)
const data = fromXML(
  '<object label="&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/gene/59272&quot;&gt;ACE2&lt;/a&gt;"' +
  ' dataSource="ncbigene" identifier="59272" id="TjQ1G5W0H9Vnf-2-B7EA-1">' +
  ' <mxCell style="rounded=1;whiteSpace=wrap;html=1;" vertex="1" parent="1">' +
  ' <mxGeometry x="80" y="110" width="120" height="60" as="geometry" />' +
  ' </mxCell> </object>'
);
console.log(JSON.stringify(data));

// JSON --> XML
const xml = toXML(data);
console.log(xml);
```

Initial XML attribute: `label="&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/gene/59272&quot;&gt;ACE2&lt;/a&gt;"`

Final XML attribute: `label="<a href=&quot;https://www.ncbi.nlm.nih.gov/gene/59272&quot;>ACE2</a>"`

`&lt;` and `&gt;` come back as raw `<` and `>` after the round trip, but `&quot;` is preserved.
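One way to see why the three entities end up treated differently: once the document is parsed, `&lt;`, `&gt;` and `&quot;` have all become plain characters, so the JSON no longer records how they were written, and which characters get re-escaped is decided entirely by the serializer. A minimal sketch of that decoding step, assuming the parser expands the predefined entities and numeric references (the helper `decodeXmlEntities` is hypothetical, not an export of from-xml):

```javascript
// Hypothetical helper (not part of from-xml or to-xml): decodes the five
// predefined XML entities plus numeric character references.
function decodeXmlEntities(text) {
  return text.replace(/&(lt|gt|amp|quot|apos|#x[0-9a-fA-F]+|#[0-9]+);/g, (match, name) => {
    switch (name) {
      case "lt": return "<";
      case "gt": return ">";
      case "amp": return "&";
      case "quot": return '"';
      case "apos": return "'";
    }
    // Numeric character reference: &#x...; (hex) or &#...; (decimal)
    const code = name[1] === "x" ? parseInt(name.slice(2), 16) : parseInt(name.slice(1), 10);
    return String.fromCodePoint(code);
  });
}

const rawLabel =
  "&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/gene/59272&quot;&gt;ACE2&lt;/a&gt;";
const decoded = decodeXmlEntities(rawLabel);
console.log(decoded);
// → <a href="https://www.ncbi.nlm.nih.gov/gene/59272">ACE2</a>
```

After this step the value is just an HTML string; nothing in it says whether `<` or `"` was originally written as an entity.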

Thanks!

(This XML comes from the drawing tool at diagrams.net.)

kawanet commented 3 months ago

The library is not designed to perform a perfect as-is round trip. However, the XML specification (https://www.w3.org/TR/xml/#NT-AttValue) does not appear to allow `<` inside attribute values:

`[10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"`
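Production [10] can be turned into a quick check. A rough sketch, assuming the value is double-quoted and approximating the XML `Name` production with ASCII characters (the real rule accepts many more Unicode characters); `isWellFormedAttValue` is a hypothetical helper, not part of from-xml:

```javascript
// Sketch of production [10] AttValue for a double-quoted value:
//   '"' ([^<&"] | Reference)* '"'
// Reference is an entity reference (&name;) or a character reference
// (&#123; or &#x7B;). Names are approximated with ASCII letters, digits,
// '_', '-', '.', ':'; the real Name production allows many more characters.
const ATT_VALUE_DQ =
  /^(?:[^<&"]|&[A-Za-z_:][A-Za-z0-9_:.-]*;|&#[0-9]+;|&#x[0-9a-fA-F]+;)*$/;

function isWellFormedAttValue(value) {
  // `value` is the text between the double quotes, exactly as written in the XML.
  return ATT_VALUE_DQ.test(value);
}

const initial =
  "&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/gene/59272&quot;&gt;ACE2&lt;/a&gt;";
const roundTripped =
  "<a href=&quot;https://www.ncbi.nlm.nih.gov/gene/59272&quot;>ACE2</a>";
const rawGt =
  "&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/gene/59272&quot;>ACE2&lt;/a>";

console.log(isWellFormedAttValue(initial));      // true
console.log(isWellFormedAttValue(roundTripped)); // false: contains a raw '<'
console.log(isWellFormedAttValue(rawGt));        // true: a raw '>' is allowed
```

Note that `>` is simply absent from the excluded set `[^<&"]`, which is why the third value passes.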

Surprisingly, the from-xml library has not conformed to this well-formedness constraint for the past 8 years!

Well-formedness constraint: No `<` in Attribute Values
The replacement text of any entity referred to directly or indirectly in an attribute value must not contain a `<`.

According to the specification, it should at least be encoded as `label="&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/gene/59272&quot;>ACE2&lt;/a>"`. Note that a raw `>` is allowed in attribute values. Still, `label="&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/gene/59272&quot;&gt;ACE2&lt;/a&gt;"` looks more symmetrical. Either way, the fix would be a breaking change.
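To make the two encodings above concrete, here is a minimal escaper sketch, assuming double-quoted attributes. `escapeAttValue` and its `escapeGt` option are hypothetical names for illustration, not the to-xml API:

```javascript
// Hypothetical attribute-value escaper (not the to-xml implementation).
// `&` is escaped first so that entities produced below are not double-escaped.
function escapeAttValue(text, { escapeGt = true } = {}) {
  let out = text
    .replace(/&/g, "&amp;")   // required: a bare '&' would start a reference
    .replace(/</g, "&lt;")    // required: no '<' in attribute values
    .replace(/"/g, "&quot;"); // required inside a double-quoted value
  if (escapeGt) {
    out = out.replace(/>/g, "&gt;"); // optional, but mirrors &lt;
  }
  return out;
}

const label = '<a href="https://www.ncbi.nlm.nih.gov/gene/59272">ACE2</a>';

console.log(`label="${escapeAttValue(label, { escapeGt: false })}"`);
// → label="&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/gene/59272&quot;>ACE2&lt;/a>"
console.log(`label="${escapeAttValue(label)}"`);
// → label="&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/gene/59272&quot;&gt;ACE2&lt;/a&gt;"
```

For this example, the symmetric mode gives back the original `label` attribute from diagrams.net. Both modes change what the serializer emits for any value containing `<`, which is where the breaking change comes from.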