mbevilacqua / appcompatprocessor

"Evolving AppCompat/AmCache data analysis beyond grep"
Apache License 2.0

CSV Parsing Failing #10

Closed by stuartbird 6 years ago

stuartbird commented 6 years ago

Hi, I'm trying out AppCompatProcessor for the first time today but hitting a slight bump when trying to load CSV files. The error I am getting is below:

```
# ./AppCompatProcessor.py /cases/XXX.db load /cases/
/home/stu/Downloads/appcompatprocessor-master/appSearch.py:454: SyntaxWarning: assertion is always true, perhaps remove parentheses?
  assert(False, "We're in trouble")
2017-12-17 12:14:51,272 INFO -------------------------------Log started-------------------------------
2017-12-17 12:14:51,274 INFO Starting to process request...
2017-12-17 12:14:51,275 INFO Initializing /cases/XXX.db sqlite DB
2017-12-17 12:14:51,358 INFO Loading / adding records to database...
2017-12-17 12:14:51,361 WARNING Skiping file, no ingest plugin found to process: XXX.db
2017-12-17 12:14:51,362 INFO Total files in /cases/XXX.zip: 6
2017-12-17 12:14:51,362 INFO Hold on while we check the zipped files...
2017-12-17 12:14:51,362 ERROR No valid files found!
2017-12-17 12:14:51,366 WARNING No ingest plugin could process: hostname1.csv (skipping file) [size: 22951]
2017-12-17 12:14:51,369 WARNING No ingest plugin could process: hostname2.csv (skipping file) [size: 20015]
2017-12-17 12:14:51,369 WARNING No ingest plugin could process: hostname3.csv (skipping file) [size: 39600]
2017-12-17 12:14:51,375 INFO Calculate ID's for new hosts/instances: [########-----------------] 33.33%
2017-12-17 12:14:51,376 INFO Calculate ID's for new hosts/instances: [#################--------] 66.67%
2017-12-17 12:14:51,378 INFO
2017-12-17 12:14:51,378 INFO Found 3 new instances
2017-12-17 12:14:51,381 INFO Deleting indexes
2017-12-17 12:14:52,386 INFO Parsing files: [-------------------------] 0.0%
2017-12-17 12:14:53,389 INFO Parsing files: [#################--------] 66.67%
2017-12-17 12:14:54,390 INFO
2017-12-17 12:14:55,392 INFO Load speed: 0:00:01.344579 seconds / file
2017-12-17 12:14:55,392 INFO Load time: 0:00:04
2017-12-17 12:14:55,393 INFO Indexing sqlite DB /cases/XXX.db

2017-12-17 12:14:55,427 INFO Loading done.
2017-12-17 12:14:55,427 INFO Total hosts: 3
2017-12-17 12:14:55,427 INFO Total instances: 3
2017-12-17 12:14:55,427 INFO Total entries: 4636
2017-12-17 12:14:55,427 INFO Done
```

As you can see, it is parsing the raw AmCache.hve files without a problem, but is not parsing the CSV files. The CSV files were created with ShimCacheParser.py included in SIFT Workstation (saltstack install on Ubuntu 16.04 LTS) in a mounted E01 image, with the command run from "/mnt/windows_mount/Windows/System32/config/" as follows:

#ShimCacheParser.py -ti --bom -o /cases/hostname.csv

The tool is working great with the *.hve data. Could someone point me in the right direction for a solution for the shimcache data please?

Thanks SB

randomstash commented 6 years ago

Hi, could you please try this out with the /dev branch and let me know? That one is working fine on my tests but not yet ready to move over to /master.

stuartbird commented 6 years ago

Hi, Apologies it took me a while to get back to this. I have run this again from the /dev branch, using the same source data as before but still see an error relating to the CSV files:

```
root@siftworkstation -> /h/s/D/appcompatprocessor-develop ./AppCompatProcessor.py /cases/xxx.db load /cases/xxx-data/
2018-01-09 09:34:56,802 INFO -------------------------------Log started-------------------------------
2018-01-09 09:34:56,803 INFO Starting to process request...
2018-01-09 09:34:56,804 INFO Loading / adding records to database...
2018-01-09 09:34:56,807 INFO Calculating ID for: /cases/xxx-data/hostname.hve
2018-01-09 09:34:56,813 INFO Calculating ID for: /cases/xxx-data/hostname.csv
2018-01-09 09:34:56,816 WARNING No ingest plugin could process: hostname.csv (skipping file) [size: 22951]
2018-01-09 09:34:56,816 INFO Calculating ID for: /cases/xxx-data/hostname.csv
2018-01-09 09:34:56,817 WARNING No ingest plugin could process: hostname.csv (skipping file) [size: 20015]
2018-01-09 09:34:56,817 INFO Calculating ID for: /cases/xxx-data/hostname.csv
2018-01-09 09:34:56,817 WARNING No ingest plugin could process: hostname.csv (skipping file) [size: 39600]
2018-01-09 09:34:56,817 INFO Calculating ID for: /cases/xx-data/hostname.hve
2018-01-09 09:34:56,818 INFO Calculating ID for: /cases/xxx-data/hostname.hve
2018-01-09 09:34:56,820 INFO Calculate ID's for new hosts/instances: [########-----------------] 33.33%
2018-01-09 09:34:56,821 INFO Calculate ID's for new hosts/instances: [#################--------] 66.67%
2018-01-09 09:34:56,823 INFO
2018-01-09 09:34:56,823 INFO Found 0 new instances
2018-01-09 09:34:56,825 INFO Load speed: 0:00:00.021148 seconds / file
2018-01-09 09:34:56,825 INFO Load time: 0:00:00

2018-01-09 09:34:56,826 INFO Loading done.
2018-01-09 09:34:56,826 INFO Total hosts: 3
2018-01-09 09:34:56,826 INFO Total instances: 3
2018-01-09 09:34:56,826 INFO Total entries: 4636
2018-01-09 09:34:56,826 INFO Done
```

Thanks

Stuart

mbevilacqua commented 6 years ago

Hi Stuart, can't reproduce this here I'm afraid. Could you run /dev with only one of those csv's against a new database with the verbose flag -v and see if you get any more detail on the .log file created for the run? Also if you could paste here the header and sanitised 1st line of that csv maybe we can spot deviations from the expected format.

stuartbird commented 6 years ago

Hi Mattias, Thank you, see the log output below:

```
2018-01-09 10:48:45,948 root INFO MainProcess -------------------------------Log started-------------------------------
2018-01-09 10:48:45,949 root DEBUG MainProcess Python version: 2.7.12 (default, Nov 20 2017, 18:23:56) [GCC 5.4.0 20160609]
2018-01-09 10:48:45,950 root DEBUG MainProcess Physical mem used: 8%
2018-01-09 10:48:45,950 __main__ INFO MainProcess Starting to process request...
2018-01-09 10:48:45,950 __main__ DEBUG MainProcess Arguments [4]: ['-v', '/cases/csv-test.db', 'load', '/cases/csv/']
2018-01-09 10:48:45,950 __main__ DEBUG MainProcess Options: Namespace(database_file='/cases/csv-test.db', governorOffFlag=False, maxCores=8, module_name='load', outputFile='Output.txt', pathtoload='/cases/csv/', rawoutput=False, verbose=1)
2018-01-09 10:48:45,950 appDB INFO MainProcess Initializing /cases/csv-test.db sqlite DB
2018-01-09 10:48:46,061 __main__ DEBUG MainProcess Database version: 0.8.0
2018-01-09 10:48:46,061 appDB DEBUG MainProcess Sqlite database adapter version: 3.11.0
2018-01-09 10:48:46,061 __main__ INFO MainProcess Loading / adding records to database...
2018-01-09 10:48:46,062 appLoad DEBUG MainProcess Starting appLoadMP
2018-01-09 10:48:46,065 appLoad DEBUG MainProcess Adding file to process: /cases/csv/hostname.csv
2018-01-09 10:48:46,065 appLoad INFO MainProcess Calculating ID for: /cases/csv/hostname.csv
2018-01-09 10:48:46,068 appAux DEBUG MainProcess Loading file /cases/csv/hostname.csv
2018-01-09 10:48:46,068 appAux DEBUG MainProcess Read 1000 bytes [ef:bb:bf:4c:61:73:74:20:4d:6f:64:69:66:69:65:64:2c:4c:61:73]
2018-01-09 10:48:46,069 appAux DEBUG MainProcess Loading file /cases/csv/hostname.csv
2018-01-09 10:48:46,069 appAux DEBUG MainProcess Read 39600 bytes [ef:bb:bf:4c:61:73:74:20:4d:6f:64:69:66:69:65:64:2c:4c:61:73]
2018-01-09 10:48:46,070 appLoad WARNING MainProcess No ingest plugin could process: hostname.csv (skipping file) [size: 39600]
2018-01-09 10:48:46,070 appLoad INFO MainProcess Found 0 new instances
2018-01-09 10:48:46,070 mpEngineProdCons DEBUG MainProcess mpEngine initializing
2018-01-09 10:48:46,072 appLoad INFO MainProcess Load speed: 0:00:00.010447 seconds / file
2018-01-09 10:48:46,072 appLoad INFO MainProcess Load time: 0:00:00
2018-01-09 10:48:46,072 mpEngineProdCons DEBUG MainProcess Bringing down mpEngine
2018-01-09 10:48:46,072 mpEngineProdCons DEBUG MainProcess mpEngine down
2018-01-09 10:48:46,073 __main__ INFO MainProcess Loading done.
2018-01-09 10:48:46,073 __main__ INFO MainProcess Total hosts: 0
2018-01-09 10:48:46,073 __main__ INFO MainProcess Total instances: 0
2018-01-09 10:48:46,073 __main__ INFO MainProcess Total entries: 0
2018-01-09 10:48:46,073 __main__ INFO MainProcess Done
2018-01-09 10:48:46,073 root DEBUG MainProcess Shuting down logger listener.
Output.txt.log (END)
```

And the CSV header and first line:

| Last Modified | Last Update | Path | File Size | Exec Flag |
| --- | --- | --- | --- | --- |
| 2013-08-22 12:45:17 | N/A | SYSVOL\Windows\System32\svchost.exe | N/A | True |

Stuart

mbevilacqua commented 6 years ago

Hi StuartBird, I see from the logs that your CSV files start with a UTF-8 BOM (ef:bb:bf). I believe that is the likely culprit: in a quick test, adding a BOM to a known-good CSV test case made it fail the same way your log shows. I'll look into fixing that but don't have time right now. In the meantime, you may be able to work around this by saving the file as UTF-8 without a BOM.

Thanks for reporting this!
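For readers following along, here is a minimal Python 3 sketch of how the BOM shows up. It decodes the 20 bytes printed in the verbose log above, then parses a small sample CSV with and without BOM handling. The sample is an assumption: the column names and values come from the thread, but only the comma after "Last Modified" is confirmed by the logged bytes, and `first_header_field` is a hypothetical helper. This is not AppCompatProcessor's own parsing code; it only illustrates why a header check could fail on such a file.

```python
import codecs
import csv
import io

# First 20 bytes of hostname.csv exactly as printed in the verbose log.
logged = bytes.fromhex("efbbbf4c617374204d6f6469666965642c4c6173")

# True when the file starts with the 3-byte UTF-8 BOM (ef bb bf).
has_bom = logged.startswith(codecs.BOM_UTF8)

# Assumed sample: header and first row from the thread, comma-separated,
# with a BOM prepended the way the --bom switch would write it.
raw = codecs.BOM_UTF8 + (
    b"Last Modified,Last Update,Path,File Size,Exec Flag\r\n"
    b"2013-08-22 12:45:17,N/A,SYSVOL\\Windows\\System32\\svchost.exe,N/A,True\r\n"
)


def first_header_field(data, encoding):
    """Hypothetical helper: decode the bytes and return the first header cell."""
    reader = csv.reader(io.StringIO(data.decode(encoding), newline=""))
    return next(reader)[0]


plain = first_header_field(raw, "utf-8")      # BOM survives as U+FEFF
clean = first_header_field(raw, "utf-8-sig")  # codec drops the BOM

print(has_bom, repr(plain), repr(clean))
```

With plain `utf-8` decoding the first column name carries an invisible U+FEFF prefix, so any exact comparison against "Last Modified" would not match; the `utf-8-sig` codec removes it.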

stuartbird commented 6 years ago

Hi Mattias, Apologies for the late reply and thank you for digging into this. I'm happy to report that all is now working correctly. As per your suggestion I processed the Shimcaches again without the "--bom" switch and the CSVs are now being parsed correctly. I've been seeing some great results using the tool since that time too :-)

Stuart
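If regenerating the CSVs without "--bom" is not an option, the BOM can also be removed from existing files. The following is a minimal Python 3 sketch, not part of AppCompatProcessor or ShimCacheParser; `strip_utf8_bom` is a hypothetical helper name, and the demo file is a throwaway temp file rather than real case data.

```python
import codecs
import os
import tempfile


def strip_utf8_bom(path):
    """Rewrite `path` without a leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False if the file
    was already BOM-free (in which case it is left untouched).
    """
    with open(path, "rb") as fh:
        data = fh.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as fh:
        fh.write(data[len(codecs.BOM_UTF8):])
    return True


# Demo on a temporary file standing in for a ShimCacheParser CSV.
fd, demo_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as fh:
    fh.write(codecs.BOM_UTF8 + b"Last Modified,Last Update\r\n")

removed_first = strip_utf8_bom(demo_path)   # BOM present on first pass
removed_second = strip_utf8_bom(demo_path)  # nothing left to strip

with open(demo_path, "rb") as fh:
    demo_bytes = fh.read()
os.remove(demo_path)

print(removed_first, removed_second, demo_bytes)
```

The helper reads the whole file into memory, which is fine for CSVs of the sizes shown in the logs (tens of kilobytes); work on copies of evidence files rather than the originals.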

mbevilacqua commented 6 years ago

Hi Stuart, thanks for reaching back and good to hear it's working fine for you. I'll update the documentation in the meantime to highlight the BOM issue. Thanks!