williballenthin / python-evtx

Pure Python parser for Windows Event Log files (.evtx)
Apache License 2.0

utf decode error in unpack_wstring #62

Open · atcuno opened this issue 5 years ago

atcuno commented 5 years ago

    Traceback (most recent call last):
      File "/usr/local/bin/evtx_dump.py", line 4, in <module>
        __import__('pkg_resources').run_script('python-evtx==0.6.1', 'evtx_dump.py')
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 739, in run_script
        self.require(requires)[0].run_script(script_name, ns)
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1501, in run_script
        exec(script_code, namespace, namespace)
      File "/usr/local/lib/python2.7/dist-packages/python_evtx-0.6.1-py2.7.egg/EGG-INFO/scripts/evtx_dump.py", line 42, in <module>
      File "/usr/local/lib/python2.7/dist-packages/python_evtx-0.6.1-py2.7.egg/EGG-INFO/scripts/evtx_dump.py", line 37, in main
      File "build/bdist.linux-x86_64/egg/Evtx/Evtx.py", line 498, in xml
      File "build/bdist.linux-x86_64/egg/Evtx/Views.py", line 204, in evtx_record_xml_view
      File "build/bdist.linux-x86_64/egg/Evtx/Views.py", line 191, in render_root_node
      File "build/bdist.linux-x86_64/egg/Evtx/Views.py", line 176, in render_root_node_with_subs
      File "build/bdist.linux-x86_64/egg/Evtx/Views.py", line 126, in rec
      File "build/bdist.linux-x86_64/egg/Evtx/Views.py", line 166, in rec
      File "build/bdist.linux-x86_64/egg/Evtx/Views.py", line 191, in render_root_node
      File "build/bdist.linux-x86_64/egg/Evtx/Views.py", line 175, in render_root_node_with_subs
      File "build/bdist.linux-x86_64/egg/Evtx/BinaryParser.py", line 64, in __call__
      File "build/bdist.linux-x86_64/egg/Evtx/Nodes.py", line 168, in children
      File "build/bdist.linux-x86_64/egg/Evtx/Nodes.py", line 153, in _children
      File "build/bdist.linux-x86_64/egg/Evtx/Nodes.py", line 733, in __init__
      File "build/bdist.linux-x86_64/egg/Evtx/BinaryParser.py", line 493, in unpack_wstring
      File "/usr/lib/python2.7/encodings/utf_16.py", line 16, in decode
        return codecs.utf_16_decode(input, errors, True)
    UnicodeDecodeError: 'utf16' codec can't decode bytes in position 900-901: illegal UTF-16 surrogate
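For reference, "illegal UTF-16 surrogate" means the string data contains a high surrogate (U+D800 to U+DBFF) that is not followed by a low surrogate, so the pair cannot form a valid character. The Python 3 sketch below reproduces the same class of error on made-up bytes (an unpaired U+D83D between "A" and "B"), not on data from the reporter's log. The helper name find_decode_error is hypothetical and not part of python-evtx, and decoding as "utf-16-le" assumes EVTX strings are little-endian.

```python
def find_decode_error(data, encoding="utf-16-le"):
    """Hypothetical helper: return (start, end, reason) for the first
    decode error in `data`, or None if it decodes cleanly."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # start/end are byte offsets of the offending code unit(s).
        return exc.start, exc.end, exc.reason
    return None


# Made-up UTF-16LE bytes: "A", an unpaired high surrogate U+D83D, then "B".
sample = b"A\x00" + b"\x3d\xd8" + b"B\x00"
print(find_decode_error(sample))
```

In the traceback above, the equivalent check failed at byte offsets 900-901 of the string being decoded by unpack_wstring.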
jdeloshoyos commented 5 years ago

I use this script a lot with great results, and I have also run into this problem when converting logs whose data contains illegal characters (invalid UTF-16) for whatever reason. Here's a quick-and-dirty fix that did the trick for me:

In Evtx/Nodes.py, modify the string() method as follows:

    def string(self):
        binary = self.binary()
        acc = []
        while len(binary) > 0:
            match = re.search(b"((?:[^\x00].)+)", binary)
            if match:
                frag = match.group()
                acc.append("<string>")
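                # Begin change: add try/except block for handling illegal characters
                try:
                    acc.append(frag.decode("utf16"))
                except UnicodeDecodeError:
                    # Invalid UTF-16 (e.g. an unpaired surrogate): emit a
                    # placeholder instead of aborting the whole dump.
                    acc.append("[ILLEGAL CHARACTER]")
                # End change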
                acc.append("</string>\n")
                binary = binary[len(frag) + 2:]
                if len(binary) == 0:
                    break
            frag = re.search(b"(\x00*)", binary).group()
            if len(frag) % 2 == 0:
                for _ in range(len(frag) // 2):
                    acc.append("<string></string>\n")
            else:
                raise ParseException("Error parsing uneven substring of NULLs")
            binary = binary[len(frag):]
        return "".join(acc)

Of course, the "[ILLEGAL CHARACTER]" string could be something shorter.
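A possible refinement, not part of the fix above: instead of replacing the whole fragment, decode with an error handler so that only the invalid code units are substituted and the readable text is kept. This is a minimal Python 3 sketch assuming the standard "replace" handler (which inserts U+FFFD) and "utf-16-le" decoding, on the assumption that EVTX strings are little-endian. decode_utf16_lenient is a hypothetical name, not an existing python-evtx function.

```python
def decode_utf16_lenient(frag):
    """Hypothetical helper: decode UTF-16LE bytes, substituting U+FFFD
    only for the invalid code units instead of discarding the fragment."""
    return frag.decode("utf-16-le", errors="replace")


# Made-up fragment: "A", an unpaired high surrogate U+D83D, then "B".
print(repr(decode_utf16_lenient(b"A\x00\x3d\xd8B\x00")))
```

Inside string(), acc.append(decode_utf16_lenient(frag)) could then take the place of the try/except. If the raw byte values should stay visible in the XML output, Python's "backslashreplace" handler (usable for decoding since Python 3.5) is an alternative to "replace".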