mandiant / AuditParser

AuditParser
Apache License 2.0

Crash due to UTF-8 issues while processing Redline audit files generated on non-US Windows operating systems #1

Open saadkadhi opened 11 years ago

saadkadhi commented 11 years ago

Hi Ryan,

I've tried AuditParser against audit files produced by Redline 1.7 (comprehensive collector, default settings) on a non-US Windows operating system.

The program crashed with the following error while processing mir.w32apifiles:

Traceback (most recent call last):
  File "/tools/AuditParser/AuditParser.py", line 482, in <module>
    main()
  File "/tools/AuditParser/AuditParser.py", line 472, in main
    else: parseXML(inFile,outFile)
  File "/tools/AuditParser/AuditParser.py", line 217, in parseXML
    writer.writerow(row)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 68-77: ordinal not in range(128)

There are a few row.append() instances that should use encode("utf-8"). Once fixed, the program runs smoothly until it hits mir.w32scripting-persistence. It dies with the following error:

Parsing input file: 20121109064423/mir.w32scripting-persistence.60254e2b.xml
    main()
  File "/tools/AuditParser/AuditParser.py.new", line 471, in main
    if (filename.find("persistence") > 0): parsePersistence(inFile, outFile)
  File "/tools/AuditParser/AuditParser.py.new", line 297, in parsePersistence
    row[i] = rowValue.encode("utf-8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 8: ordinal not in range(128)

This second issue is due to an extraneous encode("utf-8") in line 250:

row.append(rowData.encode("utf-8"))

Once removed, AuditParser.py processes all files without a hiccup.
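
Both errors come down to encoding a value either never or twice. In Python 2, calling encode("utf-8") on a value that is already a byte string first decodes it implicitly with the ASCII codec, which is why the second traceback reports a UnicodeDecodeError from an encode call. The following is a minimal Python 3 sketch of the "normalise to text, encode exactly once" idea. It illustrates the principle rather than reproducing the crash (Python 3 bytes objects have no encode method), and the helper names to_text and write_rows as well as the sample rows are hypothetical, not taken from AuditParser.py.

```python
import csv
import io


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding bytes exactly once (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


def write_rows(rows, encoding="utf-8"):
    """Write rows to an in-memory CSV and return the encoded bytes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # Every cell becomes text first; encoding happens once, at the end.
        writer.writerow([to_text(cell, encoding) for cell in row])
    return buffer.getvalue().encode(encoding)


# First failure mode: non-ASCII text pushed through the ASCII codec.
try:
    "C:\\Документы\\file.txt".encode("ascii")
except UnicodeEncodeError as exc:
    print("encode failed:", exc.reason)

# Mixed input (made-up data): one cell is already UTF-8 bytes, one is text.
rows = [["path", "owner"],
        ["C:\\Документы".encode("utf-8"), "Администратор"]]
print(write_rows(rows).decode("utf-8"))
```

Keeping the conversion in one helper means a cell cannot be encoded a second time by accident, whichever parsing path produced it.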

I've made a diff to fix the issues: http://pastebin.com/SmRKR6sR

Best Regards, Saad Kadhi (@_saadk)

ryankaz commented 11 years ago

Saad,

Thanks for catching these bugs! There’s definitely some ugly parsing code to handle all of the edge-case XML schema produced by some of the audits, and I really appreciate you taking the time to provide these fixes. I’ll roll your patches into an upcoming update – and I also have some longer-term plans to do more extensive refactoring. Hope you’ve found it useful now that it’s running without crashes!

-Ryan

From: Saad Kadhi [mailto:notifications@github.com] Sent: Wednesday, November 21, 2012 5:50 AM To: mandiant/AuditParser Subject: [AuditParser] Crash due to UTF-8 issues while processing Redline audit files generated on non-US Windows operating systems (#1)



saadkadhi commented 11 years ago

You are very welcome. And thanks to you for putting time and energy into releasing AuditParser. I am really glad to hear that you are going to continue improving & maintaining the code.

I am still encountering crashes, but these come from lxml ("char out of range"-type errors raised when it encounters non-printable characters). If you have already been bitten by this kind of edge case, I'd be glad to hear how you solved it.

Cheers, Saad Kadhi (@_saadk)
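
One possible way to approach the non-printable character problem described above is to strip the control characters that XML 1.0 forbids before handing the text to the parser. The sketch below is only an assumption about what might help, not a fix confirmed in this thread. It uses the standard library's xml.etree.ElementTree so that it runs without extra dependencies; the cleaned string could be passed to lxml in the same way. The function name strip_illegal_xml_chars and the sample document are hypothetical. It handles raw control characters only, not numeric character references or other disallowed code points.

```python
import re
import xml.etree.ElementTree as ET

# Control characters that XML 1.0 does not allow in a document:
# everything below 0x20 except tab, newline and carriage return.
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_illegal_xml_chars(text):
    """Remove raw control characters that an XML 1.0 parser rejects."""
    return _ILLEGAL_XML_CHARS.sub("", text)


# Made-up sample containing a raw 0x01 byte inside an element.
sample = "<item><name>evil\x01.exe</name></item>"

try:
    ET.fromstring(sample)
except ET.ParseError as exc:
    print("raw input rejected:", exc)

root = ET.fromstring(strip_illegal_xml_chars(sample))
print(root.findtext("name"))
```

Dropping characters silently changes the audit data, so replacing them with a visible placeholder may be preferable when the exact bytes matter for analysis.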