Polarion XML fails to generate due to xmlSAX2Characters: huge text node

KwisatzHaderach commented 1 week ago

I have a somewhat large plan that takes some time and writes a lot of information into the logs. The plan finishes fine with an all pass, but then XML generation fails with: The generated XML output is not a valid XML file. Use `--verbose` argument to show the output.

The exception was caused by 1 earlier exceptions

Cause number 1:

    xmlSAX2Characters: huge text node, line 172178, column 10 (<string>, line 172178)


This is using the Polarion report plugin, but it will very likely end the same way for junit.

KwisatzHaderach commented 1 week ago

@seberm I guess you made the changes for jinja, can you please have a look?

seberm commented 1 week ago

Hello @KwisatzHaderach, the tmt polarion report plugin uses the junit report code internally, which uses Jinja2 and lxml to generate the final JUnit/XUnit XML file. This means the junit plugin is also affected.

This problem seems to be related to lxml.etree.XMLParser, which appears to have a limit on the size of text nodes it can handle.

I've quickly looked at XMLParser options and there is an option huge_tree which could hopefully help:

huge_tree - disable security restrictions and support very deep trees and very long text content (only affects libxml2 2.7+)

I've tried to reproduce the problem locally (for now without tmt) and the huge_tree option is effective:

Create a large text node:

#!/usr/bin/env python3

import xml.etree.ElementTree as ET

large_text = 'a' * (10 * 1024 * 1024 + 1)  # 10MB + 1 byte
root = ET.Element('root')
root.text = large_text
tree = ET.ElementTree(root)
tree.write('large_xml_file.xml')

Try to parse the file with huge_tree=False:

$ cat parse-xml.py
#!/usr/bin/env python3

import lxml.etree as ET

parser = ET.XMLParser(huge_tree=False)
try:
    tree = ET.parse('large_xml_file.xml', parser)
except ET.ParserError as e:
    print(e)

$ ./parse-xml.py
Traceback (most recent call last):
  File "/home/user/Repos/tmt/./parse-xml.py", line 9, in <module>
    tree = ET.parse('large_xml_file.xml', parser)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/lxml/etree.pyx", line 3541, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1879, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1905, in lxml.etree._parseDocumentFromURL
  File "src/lxml/parser.pxi", line 1808, in lxml.etree._parseDocFromFile
  File "src/lxml/parser.pxi", line 1180, in lxml.etree._BaseParser._parseDocFromFile
  File "src/lxml/parser.pxi", line 618, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 728, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 657, in lxml.etree._raiseParseError
  File "large_xml_file.xml", line 1
lxml.etree.XMLSyntaxError: xmlSAX2Characters: huge text node, line 1, column 10004001

By setting the huge_tree option to True, the parsing works without the huge text node exception.
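For comparison, here is a minimal sketch of the same check with both settings in one script. It builds the document in memory instead of reading large_xml_file.xml, and the helper name `parse` is only illustrative; it is not taken from the tmt code. It catches `XMLSyntaxError`, which is the exception type shown in the traceback above. Whether the default parser rejects the input may depend on the libxml2 version lxml is built against, so the script reports either outcome.

```python
#!/usr/bin/env python3

import lxml.etree as ET

# One text node slightly larger than 10 MB, same size as in the reproducer above.
TEXT_SIZE = 10 * 1024 * 1024 + 1
xml_bytes = b'<root>' + b'a' * TEXT_SIZE + b'</root>'


def parse(data, huge_tree):
    """Parse XML bytes with the huge_tree option switched on or off."""
    parser = ET.XMLParser(huge_tree=huge_tree)
    return ET.fromstring(data, parser)


# Default restrictions: expected to fail on the oversized text node.
try:
    parse(xml_bytes, huge_tree=False)
    print('huge_tree=False: parsed without error')
except ET.XMLSyntaxError as error:
    print(f'huge_tree=False: {error}')

# Restrictions lifted: the same input should parse.
root = parse(xml_bytes, huge_tree=True)
print(f'huge_tree=True: parsed {len(root.text)} characters')
```

Note that huge_tree disables libxml2 security restrictions, so it is probably best limited to XML the tool generated itself, as is the case for the report output here.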

teemtee / tmt

Polarion XML fails to generate due to xmlSAX2Characters: huge text node #3363