"xmlSAX2Characters: huge text node" when getting big results with gmp.get_reports.

falkowich commented 6 years ago

When getting big results with gmp.get_reports and shell_mode=True we get a "huge text node" error.

For example:

Error: xmlSAX2Characters: huge text node, line 12679, column 561 (, line 12679)

It seems that lxml rejects very large text content unless the parser is created with the huge_tree option. From the lxml documentation: huge_tree - disable security restrictions and support very deep trees and very long text content (only affects libxml2 2.7+) >> lxml.de/parsing.html#parsers
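To illustrate the behaviour without a GVM server, here is a minimal sketch that builds an XML document with one oversized text node and parses it with and without huge_tree. The 11,000,000-byte size is an arbitrary choice, picked to exceed what I understand to be libxml2's default text-node limit of 10,000,000 bytes. try_parse is a hypothetical helper, not part of gvm-tools or lxml.

```python
from lxml import etree

# One text node just above the assumed default limit (arbitrary size).
payload = b"<report>" + b"x" * 11_000_000 + b"</report>"


def try_parse(data, huge_tree):
    """Return the root element, or None if lxml rejects the document."""
    parser = etree.XMLParser(huge_tree=huge_tree)
    try:
        return etree.fromstring(data, parser)
    except etree.XMLSyntaxError as err:
        print("parse failed:", err)
        return None


default_root = try_parse(payload, huge_tree=False)  # expected to be rejected
huge_root = try_parse(payload, huge_tree=True)      # expected to parse
print(default_root is None, huge_root is not None)
```

Since the documentation quoted above describes huge_tree as disabling security restrictions, it is probably best enabled only for input from a trusted source.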

This is the diff for my ugly hack that handles "huge text nodes":

(ovas-mgr) falk@broekn ~/_tmp » diff gvm_connection.py-orig gvm_connection.py
39a40,42
> parser = etree.XMLParser(encoding='utf-8', recover=True, huge_tree=False)
> huge_parser = etree.XMLParser(encoding='utf-8', recover=True, huge_tree=True)
> 
108c111,117
<             tree = etree.parse(f)
---
>             try:
>                 tree = etree.parse(f, parser)
>             except etree.XMLSyntaxError as err:
>                 if 'huge text node' in str(err):
>                     tree = etree.parse(f, huge_parser)
>                 else:
>                     raise
135c144
<             parser = etree.XMLParser(encoding='utf-8', recover=True)
---
>
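The fallback idea in the diff can also be sketched as a standalone helper. This is only an illustration under a few assumptions: parse_with_fallback and the two module-level parser names are hypothetical and not part of gvm-tools, the helper takes bytes rather than a file object so the retry always starts from the beginning of the input, and it retries on any XMLSyntaxError instead of matching on the message text, which seems more brittle.

```python
from lxml import etree

# Hypothetical module-level parsers. recover=True from the diff is left out
# here so that parse errors are raised instead of being recovered from.
DEFAULT_PARSER = etree.XMLParser(encoding="utf-8")
HUGE_PARSER = etree.XMLParser(encoding="utf-8", huge_tree=True)


def parse_with_fallback(data):
    """Parse XML bytes, retrying once with huge_tree=True on a syntax error."""
    try:
        return etree.fromstring(data, DEFAULT_PARSER)
    except etree.XMLSyntaxError:
        # Retry with the size limits lifted. A genuinely malformed
        # document raises XMLSyntaxError again from this call.
        return etree.fromstring(data, HUGE_PARSER)


small = parse_with_fallback(b"<report>ok</report>")
large = parse_with_fallback(b"<report>" + b"x" * 11_000_000 + b"</report>")
print(small.text, len(large.text))
```

The trade-off is that a broken document gets parsed twice before the error surfaces. Always using the huge_tree parser would be simpler if the input is trusted.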

Code to reproduce:

#!/usr/bin/python3

from gmp.gvm_connection import TLSConnection
from config import GVM_HOSTNAME, GVM_PORT, GVM_TIMEOUT, GVM_USER, GVM_PASSWD

# gmp has to be global, so the load-function has the correct namespace
gmp = None

# Huge report in OpenVAS (~14M)
rid = '5b5a5053-da06-4ce7-a2e2-39150f16eb53'

def connect():
    global gmp
    gmp = TLSConnection(hostname=GVM_HOSTNAME, port=GVM_PORT,
                        timeout=GVM_TIMEOUT, shell_mode=True)
    gmp.authenticate(GVM_USER, GVM_PASSWD)

def get_report(rid):
    # Return the parsed report, or None if fetching or parsing fails
    try:
        return gmp.get_reports(report_id=rid)
    except Exception as e:
        print('Error: ' + str(e))
        return None

if __name__ == '__main__':
    connect()
    r = get_report(rid)
    print(r)

As you can see from my code, I'm no Python coder, and I don't know if this is of any interest, but perhaps it can help someone :)

-- Regards Falk

bjoernricks commented 6 years ago

Could be related to https://github.com/greenbone/gvm/pull/103

falkowich commented 6 years ago

Hi,

It works if you add huge_tree=True to the parser object.

-- Regards Falk

greenbone / gvm-tools #24