Open santtu opened 7 years ago
The underlying problem is that system-out can contain arbitrary values. In this case it contained raw HTML, which in turn contained CDATA, and py.test had escaped the original HTML, including turning <![CDATA into &lt;![CDATA.
The code implicitly assumes that the data does not contain any CDATA blocks after it has been decoded from XML. The original data did not contain a CDATA block, just "<" plus "CDATA" and so on, and the decoded text does not contain a CDATA block either (it is a string, not an XML element!).
However, outputting the text value as-is assumes that it does not contain a textual CDATA representation, which in this case is incorrect.
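To make the failure concrete, here is a minimal sketch using only the standard library. The report fragment and the naive string wrapping are assumptions made up for illustration; this is not xunitmerge's actual code, only the same kind of unescaped CDATA output.

```python
import xml.etree.ElementTree as ET

# Hypothetical report fragment: the captured output held HTML with a CDATA
# section, which the test runner escaped as ordinary character data.
source = (
    "<testcase name='t'>"
    "<system-out>&lt;p&gt;&lt;![CDATA[raw]]&gt;&lt;/p&gt;</system-out>"
    "</testcase>"
)

# Decoding the XML turns the escaped markers back into literal text.
text = ET.fromstring(source).find("system-out").text

# Naively wrapping the decoded text in a CDATA section (illustrative only)
# lets the inner "]]>" close the section early.
naive = "<system-out><![CDATA[" + text + "]]></system-out>"

try:
    ET.fromstring(naive)
    naive_is_valid = True
except ET.ParseError:
    # The parser sees a stray </p> after the prematurely closed CDATA section.
    naive_is_valid = False

print(repr(text), naive_is_valid)
```

The decoded string is perfectly fine as a string; it only becomes a problem once it is written back inside a CDATA section without handling the embedded terminator.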
Looking at the code ... what is the rationale for patch_etree_cname (https://github.com/miki725/xunitmerge/blob/master/xunitmerge/xmerge.py#L15)? I removed its use (from https://github.com/miki725/xunitmerge/blob/master/xunitmerge/xmerge.py#L129) and the generated XML is now valid.
I do not really understand why you would need to wrap the contents of <system-out> and others in a CDATA block. If the input is valid XML (as it should be when generated by nosetest or py.test), then outputting it as-is will keep it valid XML. If the original text in these blocks is incorrectly formatted (e.g. it contains unescaped tags), then it is the generating program that is creating incorrect output, and I don't see why xunitmerge should be fixing semantically incorrect XML input generated by other programs at all.
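As a rough check of that claim, the sketch below round-trips a made-up report through plain xml.etree.ElementTree with no CDATA patching. The sample document is an assumption for illustration, and this says nothing about why the patch was added in the first place.

```python
import xml.etree.ElementTree as ET

# Hypothetical, valid input: the CDATA-looking text is escaped character data.
source = (
    "<testsuite><testcase name='t'>"
    "<system-out>&lt;![CDATA[raw]]&gt; &amp; more</system-out>"
    "</testcase></testsuite>"
)

root = ET.fromstring(source)
original_text = root.find("testcase/system-out").text

# Default serialization escapes "<", ">" and "&" in text nodes again.
serialized = ET.tostring(root, encoding="unicode")

# Parsing the output succeeds and yields the same text as before.
reparsed = ET.fromstring(serialized)
roundtrip_text = reparsed.find("testcase/system-out").text

print(serialized)
```

With the default serializer the output stays well-formed because the text is re-escaped rather than emitted verbatim.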
Is there some particular reason for patch_etree_cname that isn't evident to me here?
Any news on that one? We are facing the exact same issue with nested CDATA that is not valid XML[1]
With the input file
and using that as the input only to generate a new file:
xunitmerge in.xml out.xml
will generate out.xml
which is not valid XML: