williballenthin / python-evtx

Pure Python parser for Windows Event Log files (.evtx)
Apache License 2.0
732 stars 166 forks source link

Evtx.BinaryParser.ParseException raised unexpectedly #45

Closed john-corcoran closed 6 years ago

john-corcoran commented 6 years ago

Hi,

I'm seeing exception Evtx.BinaryParser.ParseException raised unexpectedly when parsing event log "Microsoft-Windows-Windows Firewall With Advanced Security%4Firewall.evtx" (extracted from a Windows Server 2012 testing instance) using evtx_dump.py.

Same behaviour seen when parsing on Windows with Python 2.7, and Ubuntu with Python 2.7 / 3.5.

Traceback from Windows with Python 2.7 as follows:

Traceback (most recent call last):
  File "evtx_dump.py", line 42, in <module>
    main()
  File "evtx_dump.py", line 37, in main
    print(record.xml())
  File "C:\Python27\lib\site-packages\Evtx\Evtx.py", line 481, in xml
    return e_views.evtx_record_xml_view(self)
  File "C:\Python27\lib\site-packages\Evtx\Views.py", line 204, in evtx_record_xml_view
    return render_root_node(record.root())
  File "C:\Python27\lib\site-packages\Evtx\Views.py", line 182, in render_root_node
    for sub in root_node.substitutions():
  File "C:\Python27\lib\site-packages\Evtx\BinaryParser.py", line 64, in __call__
    cache[key] = self.func(*args, **kw)
  File "C:\Python27\lib\site-packages\Evtx\Nodes.py", line 1019, in substitutions
    raise ParseException("Invalid substitution value size")
Evtx.BinaryParser.ParseException: Parse Exception(Invalid substitution value size)

I can forward the EVTX file in question if that assists?

williballenthin commented 6 years ago

hey @john-corcoran

Thanks for raising this issue. The easiest way to triage it will be to share the .evtx file. I'd be happy to take a peek if you send to my email: willi.ballenthin@gmail.com

williballenthin commented 6 years ago

@john-corcoran just wanted to let you know i still have my eye on this issue. been doing some traveling so haven't had as much time as i'd have liked to triage the bug. the evtx files you provided should be enough for me to figure things out in the next few days.

williballenthin commented 6 years ago

I suspect this evtx file is corrupt. For example, when I simply print the record numbers, we get output like:

[...]
180
181
182
183
184
185
186
187
188
7999193089
[exception traceback]

clearly, 7999193089 is not a valid record number. I think expected behavior would be to identify this corruption earlier, and then notify the user that something is wrong (rather than trying to interpret invalid data).

williballenthin commented 6 years ago

evtx_info.py also fails:

λ C:\Users\user\env2\Scripts\python ..\scripts\evtx_info.py data\issue_45.evtx
Information from file header:
Format version  : 3.1
Flags           : 0x00000000
File is         : clean
Log is full     : no
Current chunk   : 2 of 3
Oldest chunk    : 1
Next record#    : 189
Check sum       : pass

Information from chunks:
  Chunk file (first/last)     log (first/last)      Header Data
- ----- --------------------- --------------------- ------ ------
>     1          1        66           1        66   pass   pass
      2         67       123          67       123   pass   pass
*     3        124       188         124       188   pass   pass
Traceback (most recent call last):
  File "..\scripts\evtx_info.py", line 109, in <module>
    main()
  File "..\scripts\evtx_info.py", line 82, in main
    if not chunk.check_magic():
  File "c:\users\user\documents\code\python-evtx\Evtx\Evtx.py", line 308, in check_magic
    return self.magic() == "ElfChnk\x00"
  File "c:\users\user\documents\code\python-evtx\Evtx\BinaryParser.py", line 211, in explicit_length_handler
    return f(offset, length)
  File "c:\users\user\documents\code\python-evtx\Evtx\BinaryParser.py", line 475, in unpack_string
    return self.unpack_binary(offset, length).decode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa1 in position 2: ordinal not in range(128)
williballenthin commented 6 years ago

seems like this file contains junk data after 3 valid chunks:

λ C:\Users\user\env2\Scripts\python ..\scripts\evtx_info.py data\issue_45.evtx
Information from file header:
Format version  : 3.1
Flags           : 0x00000000
File is         : clean
Log is full     : no
Current chunk   : 2 of 3
Oldest chunk    : 1
Next record#    : 189
Check sum       : pass

Information from chunks:
  Chunk file (first/last)     log (first/last)      Header Data
- ----- --------------------- --------------------- ------ ------
>     1          1        66           1        66   pass   pass
      2         67       123          67       123   pass   pass
*     3        124       188         124       188   pass   pass
      4   [INVALID]
      5   [INVALID]
      6   [INVALID]
      7   [INVALID]
      8   [INVALID]
      9   [INVALID]
     10   [INVALID]
     11   [INVALID]
     12   [INVALID]
     13   [INVALID]
     14   [INVALID]
     15   [INVALID]
     16   [INVALID]

commits 5d2754f2375c8c2eadccba342d04b50432d35e30 and 1ed29cf1961c3927f5f2591785127b54f7a33da0 improve the robustness of the code that parses chunks.

williballenthin commented 6 years ago

i think the issue is that when enumerating chunks (and therefore records) via the Evtx log object, the log object doesn't take into account how many chunks are declared to be in the log file. Instead, it keeps reading until the end of the file. So, if there's junk data appended to the file, it attempts to parse this as valid data. See here: https://github.com/williballenthin/python-evtx/blob/5fb7662506d2c39aee5a9a143a9cc47d3bdc461d/Evtx/Evtx.py#L92

williballenthin commented 6 years ago

confirmed the theory above, and addressed the issue in df8edf41e40a999f764e6a83a5564643e9c25f68.

thanks to @john-corcoran for reporting this issue!