williballenthin / python-evtx

Pure Python parser for Windows Event Log files (.evtx)
Apache License 2.0

UnicodeDecodeError on BinaryParser #68

Open makitos666 opened 4 years ago

makitos666 commented 4 years ago

Sometimes, when parsing huge EVTX files, I get this error:

in xml_records
    for xml, record in evtx_file_xml_view(evtx.get_file_header()):
  File "/usr/local/lib/python3.7/site-packages/Evtx/Views.py", line 240, in evtx_file_xml_view
    record_str = evtx_record_xml_view(record)
  File "/usr/local/lib/python3.7/site-packages/Evtx/Views.py", line 204, in evtx_record_xml_view
    return render_root_node(record.root())
  File "/usr/local/lib/python3.7/site-packages/Evtx/Views.py", line 191, in render_root_node
    return render_root_node_with_subs(root_node, subs)
  File "/usr/local/lib/python3.7/site-packages/Evtx/Views.py", line 176, in render_root_node_with_subs
    rec(c, acc)
  File "/usr/local/lib/python3.7/site-packages/Evtx/Views.py", line 126, in rec
    rec(child, acc)
  File "/usr/local/lib/python3.7/site-packages/Evtx/Views.py", line 166, in rec
    sub = render_root_node(sub.root())
  File "/usr/local/lib/python3.7/site-packages/Evtx/Views.py", line 191, in render_root_node
    return render_root_node_with_subs(root_node, subs)
  File "/usr/local/lib/python3.7/site-packages/Evtx/Views.py", line 176, in render_root_node_with_subs
    rec(c, acc)
  File "/usr/local/lib/python3.7/site-packages/Evtx/Views.py", line 126, in rec
    rec(child, acc)
  File "/usr/local/lib/python3.7/site-packages/Evtx/Views.py", line 126, in rec
    rec(child, acc)
  File "/usr/local/lib/python3.7/site-packages/Evtx/Views.py", line 159, in rec
    sub = escape_value(sub.string())
  File "/usr/local/lib/python3.7/site-packages/Evtx/Nodes.py", line 1118, in string
    return self._string().rstrip("\x00")
  File "/usr/local/lib/python3.7/site-packages/Evtx/BinaryParser.py", line 211, in explicit_length_handler
    return f(offset, length)
  File "/usr/local/lib/python3.7/site-packages/Evtx/BinaryParser.py", line 490, in unpack_wstring
    return bytes(self._buf[start:end]).decode("utf16")
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 104-105: illegal UTF-16 surrogate

I think this is a good place to wrap the decode in a try/except and return a NULL or a default 2-byte value. Otherwise, all the processing time is wasted and no results are produced.
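
To illustrate the idea, here is a rough sketch of a tolerant decode. It is not the library's code: `decode_wstring_lenient` is a hypothetical helper name, and I am assuming the buffer holds UTF-16-LE data, as the error message suggests. Instead of returning NULL, it falls back to Python's built-in `errors="replace"` handler, which substitutes U+FFFD for the bad code unit and keeps the rest of the string.

```python
def decode_wstring_lenient(buf, start, end):
    """Hypothetical helper, not part of python-evtx.

    Decode buf[start:end] as UTF-16-LE (an assumption based on the
    error message). If strict decoding fails, decode again and
    replace each undecodable sequence with U+FFFD so that parsing
    can continue.
    """
    raw = bytes(buf[start:end])
    try:
        return raw.decode("utf-16-le")
    except UnicodeDecodeError:
        # Keep whatever is decodable instead of aborting the whole run.
        return raw.decode("utf-16-le", errors="replace")


# "A", then a lone high surrogate (0xD800), then "B"
damaged = b"A\x00" + b"\x00\xd8" + b"B\x00"
print(repr(decode_wstring_lenient(damaged, 0, len(damaged))))
```

A caller that prefers the NULL/default behaviour could return `None` or a fixed string from the `except` branch instead; the trade-off is losing the readable part of the value.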