orlikoski / CDQR

The Cold Disk Quick Response (CDQR) tool is a fast and easy-to-use forensic artifact parsing tool that works on disk images, mounted drives, and extracted artifacts from Windows, Linux, MacOS, and Android devices.
GNU General Public License v3.0

Can't parse zip if hostname contains '-' #52

Open armstrongcyber opened 5 years ago

armstrongcyber commented 5 years ago

First off - cool tool 👍

My hostname has two '-' characters in it, and this causes cdqr to fail with an encoding error at position 113.

skadi@skadi:~$ cdqr in:NOT-MY-HOSTNAME.zip out:Results -p win --max_cpu -z
Assigning CDQR to the host network
The Docker network can be changed by modifying the "DOCKER_NETWORK" environment variable
Example (default Skadi mode): export DOCKER_NETWORK=host
Example (use other Docker network): export DOCKER_NETWORK=skadi-backend
docker run  --network host  -v /home/skadi/NOT-MY-HOSTNAME.zip:/home/skadi/NOT-MY-HOSTNAME.zip -v /home/skadi/Results:/home/skadi/Results aorlikoski/cdqr:5.1.0 -y /home/skadi/NOT-MY-HOSTNAME.zip /home/skadi/Results -p win --max_cpu -z
CDQR Version: 5.0
Plaso Version: 20190331
Using parser: win
Number of cpu cores to use: 4
Destination Folder: /home/skadi/Results
Attempting to extract source file: /home/skadi/NOT-MY-HOSTNAME.zip
Unable to extract file: /home/skadi/NOT-MY-HOSTNAME.zip
'ascii' codec can't encode character '\u2013' in position 113: ordinal not in range(128)

The \u2013 character is the en dash (–), not the plain hyphen '-': https://www.fileformat.info/info/unicode/char/2013/index.htm
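Since the character in the error is an en dash rather than the ASCII hyphen in the hostname, it may be worth checking whether any file name inside the archive contains non-ASCII characters. The following is a minimal sketch using Python's standard zipfile module; the function name `non_ascii_members` and the sample member names are hypothetical and not part of CDQR. The position in CDQR's error message most likely refers to a longer path string, so the index printed here is not expected to match it.

```python
import io
import zipfile


def non_ascii_members(zip_source):
    """Return (member name, index of first non-ASCII character) pairs."""
    hits = []
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            for index, char in enumerate(name):
                if ord(char) > 127:
                    hits.append((name, index))
                    break  # only the first offending character per name
    return hits


if __name__ == "__main__":
    # Build a small in-memory archive so the sketch runs without a real
    # collection; replace `buffer` with the path to an actual zip file.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("C/Users/alice/plain.txt", b"")
        archive.writestr("C/Users/alice/report \u2013 final.txt", b"")
    for name, index in non_ascii_members(buffer):
        print(f"{index}: {name!r}")
```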

orlikoski commented 5 years ago

That's definitely a bug, and thank you for reporting it. The zip file handling section looks like it needs to be updated to handle that correctly, probably here: https://github.com/orlikoski/CDQR/blob/master/src/cdqr.py#L1753-L1765

orlikoski commented 5 years ago

Does changing the name of the zip file get you past the problem for now?

armstrongcyber commented 5 years ago

No. I even changed the hostname (to a single word with no spaces or special characters) and re-ran the CyLR tool, but it still stopped on the same error with both the 5.1.0 and 5.0.0 versions. I think the old hostname will still be embedded in some of the files, but if the error is in the unzipping step then that should not affect it.

On that thought, I also noticed that when I ran a plain unzip ($ unzip FILE.zip) I was asked about clobbering already unzipped files. Perhaps that is contributing to the problem /0\

orlikoski commented 5 years ago

I just did a test on a zip file that has -- in the hostname and an empty text file inside with -- in the file name. It completed with no errors so I think there is something else going on. Very strange as it's definitely a problem with the unzip of the file.

Can you try making a copy of the zip file with no name conflicts when it unzips manually and then try CDQR on that new zip file? That will determine if it's related or a red herring.

Here are the logs of that run.

skadi@skadi_prime:~/test$ unzip -l hostname--is--fine.zip
Archive:  hostname--is--fine.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2019-08-13 10:20   hostname--is--fine.txt
---------                     -------
        0                     1 file
skadi@skadi_prime:~/test$ cdqr in:hostname--is--fine.zip out:Results
Assigning CDQR to the host network
The Docker network can be changed by modifying the "DOCKER_NETWORK" environment variable
Example (default Skadi mode): export DOCKER_NETWORK=host
Example (use other Docker network): export DOCKER_NETWORK=skadi-backend
docker run  --network host  -v /home/skadi/test/hostname--is--fine.zip:/home/skadi/test/hostname--is--fine.zip -v /home/skadi/test/Results:/home/skadi/test/Results aorlikoski/cdqr:5.1.0 -y /home/skadi/test/hostname--is--fine.zip /home/skadi/test/Results
CDQR Version: 5.1.0
Plaso Version: 20190708
Using parser: win
Number of cpu cores to use: 9
Destination Folder: /home/skadi/test/Results
Attempting to extract source file: /home/skadi/test/hostname--is--fine.zip
All files extracted to folder: /home/skadi/test/Results/artifacts/hostname--is--fine
Source data: /home/skadi/test/Results/artifacts/hostname--is--fine
Log File: /home/skadi/test/Results/hostname--is--fine.log
Database File: /home/skadi/test/Results/hostname--is--fine.plaso
SuperTimeline CSV File: /home/skadi/test/Results/hostname--is--fine.SuperTimeline.csv

Start time was: 2019-08-13 15:21:18.661967
Processing started at: 2019-08-13 15:21:18.662021
Parsing image
"log2timeline.py" "--partition" "all" "--vss_stores" "all" "--status_view" "linear" "--parsers" "bash,bencode,czip,esedb,filestat,lnk,mcafee_protection,olecf,pe,prefetch,recycle_bin,recycle_bin_info2,sccm,sophos_av,sqlite,symantec_scanlog,winevt,winevtx,webhist,winfirewall,winjob,winreg,zsh_extended_history" "--hashers" "md5" "--workers" "9" "--logfile" "/home/skadi/test/Results/hostname--is--fine_log2timeline.gz" "/home/skadi/test/Results/hostname--is--fine.plaso" "/home/skadi/test/Results/artifacts/hostname--is--fine" "--no_dependencies_check"
Parsing ended at: 2019-08-13 15:21:33.682040
Parsing duration was: 0:00:15.020019

Removing uncompressed files in directory: /home/skadi/test/Results/artifacts/

Creating the SuperTimeline CSV file
"psort.py" "-o" "l2tcsv" "--status_view" "none" "/home/skadi/test/Results/hostname--is--fine.plaso" "--logfile" "/home/skadi/test/Results/hostname--is--fine_psort.gz" "-w" "/home/skadi/test/Results/hostname--is--fine.SuperTimeline.csv"
SuperTimeline CSV file is created
Reporting started at: 2019-08-13 15:21:37.706923

Creating the individual reports (This will take a long time for large files)
Report Created: /home/skadi/test/Results/Reports/Appcompat Report.csv
Report Created: /home/skadi/test/Results/Reports/Event Log Report.csv
Report Created: /home/skadi/test/Results/Reports/File System Report.csv
Report Created: /home/skadi/test/Results/Reports/MFT Report.csv
Report Created: /home/skadi/test/Results/Reports/UsnJrnl Report.csv
Report Created: /home/skadi/test/Results/Reports/Internet History Report.csv
Report Created: /home/skadi/test/Results/Reports/Prefetch Report.csv
Report Created: /home/skadi/test/Results/Reports/Registry Report.csv
Report Created: /home/skadi/test/Results/Reports/Scheduled Tasks Report.csv
Report Created: /home/skadi/test/Results/Reports/Persistence Report.csv
Report Created: /home/skadi/test/Results/Reports/System Information Report.csv
Report Created: /home/skadi/test/Results/Reports/AntiVirus Report.csv
Report Created: /home/skadi/test/Results/Reports/Firewall Report.csv
Report Created: /home/skadi/test/Results/Reports/Amcache Report.csv
Report Created: /home/skadi/test/Results/Reports/Bash Report.csv

Did not keep 0 Reports due to no matching data from SuperTimeline

Created 15 Reports.  Now improving them
Improving Reports if possible (This will take a long time for large files)
File System Report.csv:    Complete
Appcompat Report.csv:    Complete
Event Log Report.csv:    Complete
Scheduled Tasks Report.csv:    Complete
MFT Report.csv:    Complete
Prefetch Report.csv:    Complete

All reporting complete
Reporting ended at: 2019-08-13 15:21:37.757599
Reporting duration was: 0:00:00.050676

Total duration was: 0:00:19.095749

armstrongcyber commented 5 years ago

Yes, unzipping the archive, accepting the clobbering of the dupes, and then re-zipping it worked.

I'll have a look tomorrow at what files were duplicated to see if there is a source listed twice in CyLR.

Thx

orlikoski commented 5 years ago

That's great to know what it is and thank you for helping troubleshoot. What version of CyLR were you using and on what OS?

armstrongcyber commented 5 years ago

Versions: CyLR Version 2.1.0.0 on Windows 10 Pro 1903 (latest).

The files the unzip process wanted to add dupes of were all of the following type (note: I have replaced the username in the path with <username>):


C/Users/<username>/AppData/Roaming/Microsoft/Windows/Recent/AutomaticDestinations/3353b940c074fd0c.automaticDestinations-ms
C/Users/<username>/AppData/Roaming/Microsoft/Windows/Recent/AutomaticDestinations/443fe08d447d028f.automaticDestinations-ms
C/Users/<username>/AppData/Roaming/Microsoft/Windows/Recent/AutomaticDestinations/50620fe75ee0093.automaticDestinations-ms
C/Users/<username>/AppData/Roaming/Microsoft/Windows/Recent/AutomaticDestinations/59fe1486d27aa9d0.automaticDestinations-ms

There was a set of about 20 of these for each user account on the system (I've listed just 4). I wonder if the issue is that CyLR is accidentally adding the directory twice.

orlikoski commented 5 years ago

None of those are duplicates: each has a unique name in the example provided, and the username should be unique. It pulls the usernames from the registry in order to look up the paths for those. I wonder if there are multiple registry entries for the same username because it was used multiple times (for example added, deleted, then added again).

orlikoski commented 5 years ago

This is so weird. I verified in the CyLR code that ....AppData/Roaming/Microsoft/Windows/Recent/AutomaticDestinations is only referenced once: https://github.com/orlikoski/CyLR/blob/master/CyLR/src/CollectionPaths.cs#L91

I also confirmed that everything in that folder is being stored twice (but only for that one path as all the others have no duplicates) so that is an issue. I don't know why it didn't show up before. Nor do I know why CDQR is spitting out errors on extracting it now.
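For anyone who wants to check their own collection for entries stored twice, here is a minimal sketch using Python's standard zipfile module. The function name `duplicate_members` is hypothetical and this is not code from CDQR or CyLR; it only counts how often each member name appears in the archive's file list.

```python
import io
import warnings
import zipfile
from collections import Counter


def duplicate_members(zip_source):
    """Return {member name: count} for names stored more than once."""
    with zipfile.ZipFile(zip_source) as archive:
        # infolist() keeps one entry per stored file, so repeated names
        # show up here even though namelist() lookups would hide them.
        counts = Counter(info.filename for info in archive.infolist())
    return {name: count for name, count in counts.items() if count > 1}


if __name__ == "__main__":
    # Build an in-memory archive with one repeated name for demonstration;
    # zipfile warns about the duplicate, which is silenced here.
    buffer = io.BytesIO()
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        with zipfile.ZipFile(buffer, "w") as archive:
            archive.writestr("Recent/AutomaticDestinations/a.bin", b"1")
            archive.writestr("Recent/AutomaticDestinations/a.bin", b"2")
            archive.writestr("Recent/other.bin", b"3")
    for name, count in duplicate_members(buffer).items():
        print(f"{count}x {name}")
```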

I also don't know why the unzip is failing, as it uses ZipFile (https://docs.python.org/3/library/zipfile.html). I then did a test at the command line to extract a CyLR 2.1.0 collection from Windows 10 Pro WITH DUPLICATES using python3 -m zipfile -e DESKTOP-ASDFSDF.zip temp and python -m zipfile -e DESKTOP-ASDFSDF.zip temp, and both finished with no errors.

Then I tried using the CDQR 5.1.0 docker to process the zip file and it worked with no errors. I don't know what's going on to cause it to work for me and not for others.

skadi@skadi_prime:~$ cdqr in:DESKTOP-ASDFSDF.zip out:Results2
Assigning CDQR to the host network
The Docker network can be changed by modifying the "DOCKER_NETWORK" environment variable
Example (default Skadi mode): export DOCKER_NETWORK=host
Example (use other Docker network): export DOCKER_NETWORK=skadi-backend
docker run  --network host  -v /home/skadi/DESKTOP-ASDFSDF.zip:/home/skadi/DESKTOP-ASDFSDF.zip -v /home/skadi/Results2:/home/skadi/Results2 aorlikoski/cdqr:5.1.0 -y /home/skadi/DESKTOP-ASDFSDF.zip /home/skadi/Results2
CDQR Version: 5.1.0
Plaso Version: 20190708
Using parser: win
Number of cpu cores to use: 9
Destination Folder: /home/skadi/Results2
Attempting to extract source file: /home/skadi/DESKTOP-ASDFSDF.zip
All files extracted to folder: /home/skadi/Results2/artifacts/DESKTOP-ASDFSDF
Source data: /home/skadi/Results2/artifacts/DESKTOP-ASDFSDF
Log File: /home/skadi/Results2/DESKTOP-ASDFSDF.log
Database File: /home/skadi/Results2/DESKTOP-ASDFSDF.plaso
SuperTimeline CSV File: /home/skadi/Results2/DESKTOP-ASDFSDF.SuperTimeline.csv

jofarinha commented 4 years ago

Hi! Any progress on this issue? I'm facing the same error (although at a different position):

skadi@skadi:~$ cdqr in:DESKTOPXXXXXX.zip --es_ts case1 --max_cpu
Assigning CDQR to the host network
The Docker network can be changed by modifying the "DOCKER_NETWORK" environment variable
Example (default Skadi mode): export DOCKER_NETWORK=host
Example (use other Docker network): export DOCKER_NETWORK=skadi-backend
docker run  --network host  -v /home/skadi/DESKTOPXXXXXX.zip:/home/skadi/DESKTOPXXXXXX.zip --add-host=elasticsearch:127.0.0.1 --add-host=postgres:127.0.0.1 -v /opt/Skadi/Docker/timesketch/timesketch_default.conf:/etc/timesketch.conf aorlikoski/cdqr:5.0.0 -y /home/skadi/DESKTOPXXXXXX.zip --es_ts case1 --max_cpu
CDQR Version: 5.0
Plaso Version: 20190331
Using parser: win
Number of cpu cores to use: 4
Destination Folder: Results
Attempting to extract source file: /home/skadi/DESKTOPXXXXXX.zip
Unable to extract file: /home/skadi/DESKTOPXXXXXX.zip
'ascii' codec can't encode character '\u2013' in position 95: ordinal not in range(128)

Note that originally the hostname was DESKTOP-XXXXXX, and I removed the dash in the filename (with the original filename, the same error happened in position 96).

I'm running CyLR 2.1.0 on WIN10 PRO 1903, and CDQR on the Skadi OVA (default, just downloaded today and ran in VMWare Workstation).

Edit: I tried the unzip (same problem with overwriting files in the same folder as in the above post) and zipping the result, and now it fails with another character: 'ascii' codec can't encode character '\u03c0' in position 108: ordinal not in range(128). Can it have something to do with regional settings? My source Windows machine is in English, but with regional settings set to Portuguese...
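One possible explanation, not confirmed anywhere in this thread, is that the Python process doing the extraction runs with an ASCII filesystem encoding (which can happen under a C/POSIX locale on older Python 3 releases), so any path containing a character such as '\u2013' or '\u03c0' cannot be encoded. The sketch below prints the encodings Python reports and reproduces the same style of error message on a made-up path; the helper name `ascii_encode_error` and the sample path are hypothetical.

```python
import locale
import sys


def ascii_encode_error(path):
    """Return the UnicodeEncodeError raised when encoding path as ASCII, or None."""
    try:
        path.encode("ascii")
    except UnicodeEncodeError as error:
        return error
    return None


if __name__ == "__main__":
    # Run this inside the environment that performs the extraction
    # to see which encodings Python is actually using there.
    print("filesystem encoding:", sys.getfilesystemencoding())
    print("preferred encoding: ", locale.getpreferredencoding(False))

    # Made-up path containing an en dash and a Greek pi.
    error = ascii_encode_error("C/Users/test/notes \u2013 \u03c0.txt")
    if error is not None:
        print(error)
```

If the reported filesystem encoding is ASCII, the source machine's regional settings would matter only indirectly, by determining which non-ASCII characters end up in the collected file names.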

epicsilence99 commented 4 years ago

Hi @jofarinha, I will try to take a look at this sometime this week and test it myself to see if I experience the same issue.

bestiax commented 4 years ago

Any news on this? I'm facing a similar problem: 'ascii' codec can't encode character '\xe4' in position 103: ordinal not in range(128) https://www.fileformat.info/info/unicode/char/00e4/index.htm

The Results folder contains 16GB of data after this, which means it was extracted successfully, but Kibana isn't showing any case data.

The letter 'ä' is part of neither the zip file name nor the hostname. The ZIP file is around 360MB and CDQR just cancels after some time. Unzipping it manually yields 16GB of data, and cdqr finishes successfully and almost instantly when given the folder instead of the zip file.

orlikoski commented 4 years ago

If I remember correctly, this is an issue with the library used by Python for archives. Changing that out is possible but will require reworking that function to ensure it works correctly in Windows, Linux, and MacOS environments.

The easiest and fastest way forward is to manually extract any archives that have this issue into a temp folder, such as /tmp/<custom_name>, and then use CDQR to process that folder: cdqr in:/tmp/<custom_name> ........
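The manual extraction step can also be scripted. This is a minimal sketch, assuming it runs on a host whose Python uses a UTF-8 filesystem encoding; the function name `extract_for_cdqr` and the `cdqr_` folder prefix are hypothetical and not part of CDQR. Entries stored twice in the archive are simply overwritten by the later copy.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_for_cdqr(zip_path, parent_dir=None):
    """Extract zip_path into a fresh temporary folder and return that folder."""
    # mkdtemp creates a new, uniquely named folder (under the system temp
    # directory when parent_dir is None), so nothing existing is clobbered.
    target = Path(tempfile.mkdtemp(prefix="cdqr_", dir=parent_dir))
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

The returned folder can then be passed to CDQR in place of the zip file, in the same way as the /tmp/<custom_name> folder above.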