Bug in the pe parser (Unicode issue)

kiddinn commented 9 years ago

Parsing the nfury SANS disk image:

$ log2timeline.py --status_view window --logfile /tmp/fury.log /tmp/fury.plaso win7-64-nfury-c-drive.E01

Seeing tracebacks like this:

[pe] Unable to process file: type: OS, location:<PATH>/test_images/win7-64-nfury-10.3.58.6/win7-64-nfury-c-drive/win7-64-nfury-c-drive.E01
type: EWF
type: TSK, inode: 75147, location: /Windows/assembly/NativeImages_v2.0.50727_32/UIAutomationClient/d0972fea9e965a565c3cff76982709db/UIAutomationClient.ni.dll
 with error: 'ascii' codec can't decode byte 0x90 in position 7: ordinal not in range(128).

2015-06-02 10:40:54,309 [ERROR] (Worker_08 ) PID:16459 <worker> 'ascii' codec can't decode byte 0x90 in position 7: ordinal not in range(128)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/plaso-1.2.1_20150602-py2.7.egg/plaso/engine/worker.py", line 169, in _ParseFileEntryWithParser
  File "/usr/local/lib/python2.7/dist-packages/plaso-1.2.1_20150602-py2.7.egg/plaso/parsers/interface.py", line 72, in UpdateChainAndParse
  File "/usr/local/lib/python2.7/dist-packages/plaso-1.2.1_20150602-py2.7.egg/plaso/parsers/interface.py", line 256, in Parse
  File "/usr/local/lib/python2.7/dist-packages/plaso-1.2.1_20150602-py2.7.egg/plaso/parsers/pe.py", line 259, in ParseFileObject
  File "/usr/local/lib/python2.7/dist-packages/plaso-1.2.1_20150602-py2.7.egg/plaso/parsers/pe.py", line 132, in _GetSectionNames
UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 7: ordinal not in range(128)

kiddinn commented 9 years ago

Error lies here:

section_name = u'{0:s}'.format(section_name.encode(u'unicode_escape'))

In one of the examples the culprit is:

(Pdb) section_name
'.xdata\x00\x90'
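
For context, a minimal sketch of why this raises, written for Python 3 (the traceback above is from Python 2.7). In Python 2, calling .encode() on a byte string first decodes it implicitly with the ASCII codec. The explicit decode below is only a stand-in for that hidden step, and it fails at the same byte:

```python
# Section name as seen in the debugger: 8 bytes, with 0x90 after the NUL.
section_name = b'.xdata\x00\x90'

try:
    # Stand-in for the implicit ASCII decode that Python 2 performs
    # before str.encode() can run.
    section_name.decode('ascii')
    error_message = None
    error_position = None
except UnicodeDecodeError as exception:
    error_message = str(exception)
    error_position = exception.start

print(error_position, error_message)
```

The failing offset is the 0x90 byte that follows the NUL terminator, not anything in the readable part of the name.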

joachimmetz commented 9 years ago

FYI, using string_escape is not going to fly as a long-term fix, since it is not supported by Python 3. Maybe:

codecs.getdecoder('unicode_escape')(b'as\x00\x90')[0]
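
A quick check of what that call returns, assuming Python 3. The unicode_escape decoder treats bytes that are not part of an escape sequence as Latin-1, so 0x90 becomes U+0090 instead of raising. Whether that is the desired representation for a section name is a separate question:

```python
import codecs

decoder = codecs.getdecoder('unicode_escape')
# The decoder returns a (text, bytes_consumed) tuple.
text, consumed = decoder(b'as\x00\x90')
print(repr(text), consumed)
```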

kiddinn commented 9 years ago

(Pdb) !section_name.encode(u'unicode_escape')
*** UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 7: ordinal not in range(128)

Another "solution" is to catch the UnicodeDecodeError and use repr() on the string instead. That yields the full text, escaped, like:

(Pdb) !repr(section_name)
"'.xdata\\x00\\x90'"

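A rough sketch of that fallback, written for Python 3, where the decode has to be explicit. `format_section_name` is a hypothetical helper name for illustration, not a function in plaso:

```python
def format_section_name(section_name):
    """Returns the name as text, falling back to repr() for non-ASCII bytes.

    Hypothetical helper for illustration; not part of plaso.
    """
    try:
        return section_name.decode('ascii')
    except UnicodeDecodeError:
        # repr() keeps every byte visible as an escape sequence.
        return repr(section_name)


print(format_section_name(b'.xdata\x00\x90'))
```

On Python 3 the repr() of a bytes object carries a b prefix, so the escaped form differs slightly from the Python 2 output shown in the debugger session.
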
kiddinn commented 9 years ago

And unicode_escape is not a fix: the encode(u'unicode_escape') call is the part of the code that raises the error.

kiddinn commented 9 years ago

The bug is most likely inside the pefile library itself:

In [1]: import pefile
In [2]: fh = open('/tmp/pefile', 'rb')
In [3]: data = fh.read()
In [4]: p = pefile.PE(data=data, fast_load=True)
In [5]: p.is_dll()
Out[5]: True
In [6]: sections = p.sections
In [7]: [getattr(s, u'Name', b'') for s in sections]
Out[7]: 
['.data\x00j\x08',
 '.xdata\x00\x90',
 '.text\x00\x8b\x06',
 '.extrel\x00',
 '.reloc\x00\x90']

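All the section names are NUL-terminated, and most have extra bytes after the terminator (.extrel is the exception)... will split the string on \x00 and keep the first part.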
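
A sketch of the split applied to the names listed above, using Python 3 byte literals. This is not the code from the actual change list, just the idea: everything from the first NUL byte onward is dropped.

```python
# Raw 8-byte section names as returned by pefile in the session above.
raw_names = [
    b'.data\x00j\x08',
    b'.xdata\x00\x90',
    b'.text\x00\x8b\x06',
    b'.extrel\x00',
    b'.reloc\x00\x90',
]

# Keep only the bytes before the first NUL terminator.
section_names = [name.split(b'\x00', 1)[0] for name in raw_names]
print(section_names)
```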

kiddinn commented 9 years ago

http://codereview.appspot.com/245760043

joachimmetz commented 9 years ago

Know that section names can be fully binary data as well. They don't need to be ASCII strings.
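
To illustrate: a name with a non-ASCII byte before any NUL survives the split unchanged, so the decode still needs an error handler. A sketch assuming Python 3.5 or later, where `backslashreplace` is accepted when decoding. The sample name is made up for the example:

```python
# Made-up 8-byte section name whose first byte is not ASCII.
raw_name = b'\x90UPX0\x00\x00\x00'

# Splitting on NUL only removes the padding; the 0x90 byte remains.
name_bytes = raw_name.split(b'\x00', 1)[0]

# backslashreplace turns undecodable bytes into \xNN escapes instead of raising.
name_text = name_bytes.decode('ascii', errors='backslashreplace')
print(name_text)
```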

kiddinn commented 9 years ago

The CL is in. Tested again, and the bug is fixed.

log2timeline / plaso

Bug in the pe parser (Unicode issue) #220