williballenthin / python-evtx

Pure Python parser for Windows Event Log files (.evtx)
Apache License 2.0

UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 0-1: illegal UTF-16 surrogate #43

Closed MNWPRO closed 6 years ago

MNWPRO commented 6 years ago

I don't know why this happens. I need your help.

williballenthin commented 6 years ago

Hi, @MNWPRO

To triage this issue, you'll need to provide more details about the data you are trying to parse and the method you are using to parse it. Please share the script you're using. If possible, please also share the source data, if it's not sensitive.

MNWPRO commented 6 years ago

@williballenthin Oh, I'm sorry. I used this script: evtx_dump.py. The following 123.zip is the EVTX file, which was generated by sysmon.exe. sysmon.exe is a logging tool from Microsoft, and the following is a link to the tool: 123.zip

MNWPRO commented 6 years ago

@williballenthin
I am Chinese and my English is not good, so please forgive me. I hope you can understand the above.

williballenthin commented 6 years ago

@MNWPRO thanks for the additional details. i've added a regression test to this project so that it's easy to reproduce. next, i'll try to figure out what the source of the bug is.

williballenthin commented 6 years ago

@MNWPRO can you use the windows event viewer to display event number 508 from the sysmon log? i can see that there is some encoded data, possibly in chinese, but i'm not sure what it's supposed to be. if you can include a screenshot here, that would be a big help.

MNWPRO commented 6 years ago

[image] The Chinese text in the picture translates into English as: this event is incorrect because the format of the base XML is incorrect. The following is the original text of the event. I'm sorry that I couldn't get back to you sooner, @williballenthin

MNWPRO commented 6 years ago

Could this be Sysmon's problem? If so, it's Microsoft's own fault @williballenthin

MNWPRO commented 6 years ago

[image] This is a screenshot of the same event under another ID. It does contain strange characters, which are meaningless, at least in my opinion @williballenthin

williballenthin commented 6 years ago

yes, this looks like it's an issue with sysmon or Microsoft. it seems that invalid data was provided to the event log, or that the log has become corrupt in some other way. unfortunately, I'm not sure that this python tool can do anything to fix it. i'd recommend registering an exception handler when processing the logs so that you can continue work even if you encounter corrupt entries.
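
A minimal sketch of that exception-handler approach, for Python 3. The helper name `render_records` is hypothetical and not part of python-evtx; it only assumes that each record exposes an `xml()` method, as the records used by `evtx_dump.py` do. The commented usage assumes the `Evtx.Evtx` context manager and its `records()` method, and an example path `sysmon.evtx`.

```python
import sys


def render_records(records, errors=sys.stderr):
    """Yield the XML of each record, skipping records that fail to decode.

    `records` is any iterable of objects with an `xml()` method.
    This helper is illustrative only; it is not part of python-evtx.
    """
    for index, record in enumerate(records):
        try:
            xml = record.xml()
        except UnicodeDecodeError as exc:
            # Corrupt string data in this record: report it and move on.
            print("skipping record at index %d: %s" % (index, exc), file=errors)
            continue
        yield xml


# Usage with python-evtx (assumes the package is installed and that
# "sysmon.evtx" is the path to your log file):
#
#     import Evtx.Evtx as evtx
#     with evtx.Evtx("sysmon.evtx") as log:
#         for xml in render_records(log.records()):
#             print(xml)
```

Catching only `UnicodeDecodeError` keeps other, unrelated parsing failures visible instead of silently hiding them.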

williballenthin commented 6 years ago

please feel free to continue the discussion, but i'll close this issue as there's nothing to be done by this project.

YetteNiu commented 6 years ago

Hi all, I got a similar error. I used Anaconda Spyder to read some Excel files with Chinese characters into a DataFrame and got the following error: UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 6-7: unexpected end of data. Has either of you solved this issue, and could you help me with this error? Thanks in advance.

john-corcoran commented 6 years ago

Just chiming in that I've encountered the same issue. From checking output from Microsoft Log Parser, it looks like the events that cause the exception are legitimate but contain either corruption or just unexpected special characters.

Would it be possible to show as much of the failing record as possible, and just replace any corrupted or special characters?
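
To illustrate what "replace any corrupted characters" could look like, here is a small sketch using Python's standard `errors="replace"` option when decoding. This is not what python-evtx does: the traces below show a strict `decode("utf16")` in `unpack_wstring`. The function name `decode_wstring_lenient` and the sample bytes are made up for the example.

```python
def decode_wstring_lenient(raw):
    """Decode UTF-16-LE bytes, substituting U+FFFD for undecodable units.

    Hypothetical helper for illustration; not part of python-evtx.
    """
    return bytes(raw).decode("utf-16-le", errors="replace")


# A lone high surrogate (0xD800) followed by valid text. A strict decode
# of these bytes raises UnicodeDecodeError.
corrupt = b"\x00\xd8" + "cmd.exe".encode("utf-16-le")

print(repr(decode_wstring_lenient(corrupt)))
```

The trade-off is that the replacement character hides what the original bytes were, so a tool taking this route may also want to log the raw bytes of the affected field.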

Stack traces are as follows:

Python 2.7 on Ubuntu 18.04:

Traceback (most recent call last):
  File "evtx_dump.py", line 42, in <module>
    main()
  File "evtx_dump.py", line 37, in main
    print(record.xml())
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Evtx.py", line 481, in xml
    return e_views.evtx_record_xml_view(self)
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 204, in evtx_record_xml_view
    return render_root_node(record.root())
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 191, in render_root_node
    return render_root_node_with_subs(root_node, subs)
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 176, in render_root_node_with_subs
    rec(c, acc)
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 126, in rec
    rec(child, acc)
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 166, in rec
    sub = render_root_node(sub.root())
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 191, in render_root_node
    return render_root_node_with_subs(root_node, subs)
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 176, in render_root_node_with_subs
    rec(c, acc)
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 126, in rec
    rec(child, acc)
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 126, in rec
    rec(child, acc)
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 159, in rec
    sub = escape_value(sub.string())
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Nodes.py", line 1118, in string
    return self._string().rstrip("\x00")
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/BinaryParser.py", line 211, in explicit_length_handler
    return f(offset, length)
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/BinaryParser.py", line 490, in unpack_wstring
    return bytes(self._buf[start:end]).decode("utf16")
  File "/usr/lib/python2.7/encodings/utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 0-1: illegal UTF-16 surrogate

Python 3.6 on Ubuntu 18.04:

Traceback (most recent call last):
  File "evtx_dump.py", line 42, in <module>
    main()
  File "evtx_dump.py", line 37, in main
    print(record.xml())
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Evtx.py", line 481, in xml
    return e_views.evtx_record_xml_view(self)
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 204, in evtx_record_xml_view
    return render_root_node(record.root())
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 191, in render_root_node
    return render_root_node_with_subs(root_node, subs)
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 176, in render_root_node_with_subs
    rec(c, acc)
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 126, in rec
    rec(child, acc)
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 166, in rec
    sub = render_root_node(sub.root())
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 191, in render_root_node
    return render_root_node_with_subs(root_node, subs)
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 176, in render_root_node_with_subs
    rec(c, acc)
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 126, in rec
    rec(child, acc)
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 126, in rec
    rec(child, acc)
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 159, in rec
    sub = escape_value(sub.string())
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Nodes.py", line 1118, in string
    return self._string().rstrip("\x00")
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/BinaryParser.py", line 211, in explicit_length_handler
    return f(offset, length)
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/BinaryParser.py", line 490, in unpack_wstring
    return bytes(self._buf[start:end]).decode("utf16")
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 0-1: illegal UTF-16 surrogate

RedCode-X commented 3 years ago

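Hi all, I also ran into a similar error. I used Anaconda Spyder to read some Excel files containing Chinese characters into a DataFrame and got the following error: UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 6-7: unexpected end of data. Has either of you solved this issue, and could you help me with this error? Thanks in advance.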

I have the same problem with Excel: UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 166-167: unexpected end of data

nannapanenir commented 3 years ago

Any solution for this?