sleuthkit / autopsy

Autopsy® is a digital forensics platform and graphical interface to The Sleuth Kit® and other digital forensics tools. It can be used by law enforcement, military, and corporate examiners to investigate what happened on a computer. You can even use it to recover photos from your camera's memory card.
http://www.sleuthkit.org/autopsy/

[Feature request] Photorec module: add options to exclude some file types #5921

Closed ctrlaltca closed 3 years ago

ctrlaltca commented 4 years ago

Sometimes I need to analyze big disks containing an old file system (e.g. a ten-year-old Windows installation on a 500 GB hard disk image), carving them and then extracting the resulting media files (images, videos, etc.). Both carving and extracting take a lot of time, and usually about 80% of the resulting files are useless to my analysis: most are portions of the file table (.fat, .mft) or other irrelevant file types (e.g. website fragments like .jsp and .gz, or binaries like .exe and .dll). At the moment I just wait for the export to finish and then run a script that deletes all the files with the offending extensions. PhotoRec can limit the file types to be carved through command-line options, e.g. fileopt,everything,disable,jpg,enable,gif,enable will scan only for jpg and gif files. It would be nice to expose these options in the PhotoRec module settings page, so that carving can be limited to selected file types and sped up.
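As a rough illustration of the two workarounds mentioned above, here is a minimal Python sketch. The first helper only assembles the option string in the form quoted in the comment; how that string is handed to PhotoRec is not shown here. The second helper mimics the post-export cleanup script. The function names `build_fileopt` and `delete_by_extension` are hypothetical and are not part of Autopsy or PhotoRec.

```python
from pathlib import Path


def build_fileopt(enabled_extensions):
    """Build an option string that disables every file type, then enables the listed ones."""
    parts = ["fileopt", "everything", "disable"]
    for ext in enabled_extensions:
        parts += [ext.lower().lstrip("."), "enable"]
    return ",".join(parts)


def delete_by_extension(root, unwanted_extensions, dry_run=True):
    """Find files under root with an unwanted extension; delete them unless dry_run is set."""
    unwanted = {ext.lower().lstrip(".") for ext in unwanted_extensions}
    # Collect matches first so nothing is removed while the tree is being walked.
    matches = [
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower().lstrip(".") in unwanted
    ]
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    print(build_fileopt(["jpg", "gif"]))
```

Running `delete_by_extension` with `dry_run=True` first lists what would be removed, which seems safer on an export that took days to produce.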

ctrlaltca commented 4 years ago

I recently had to work on such a case. Carving took about 8 days to complete and found 4.7 million files on an old hard disk. Extracting the carved files would probably have taken about as long, but the extraction failed after about one day with a Java out-of-memory error. I had to find a solution, so I manually modified the autopsy.db database to remove all the unneeded files. I used the following queries:

-- get the data source obj_id
select obj_id, acquisition_details from data_source_info;

-- check if the total number of carved files matches
select count(*) as num from tsk_files where data_source_obj_id=<obj_id> and parent_path = '/$CarvedFiles/';

-- enumerate files by extension
select extension, count(*) as num from tsk_files where data_source_obj_id=<obj_id> and parent_path = '/$CarvedFiles/' group by extension order by num desc;

-- delete mft files
delete from tsk_files where data_source_obj_id=<obj_id> and parent_path = '/$CarvedFiles/' and extension = 'mft';

Deleting the mft files alone removed 4.1 million files. After removing some other extensions (such as dll, exe, xml, sqlite) I'm now down to about 140 thousand files. Extracting them took less than half an hour. This is a real time saver, but having to hack the Autopsy case database is not exactly what I'd call elegant.
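For anyone wanting to script the same workaround, the queries above can be wrapped in a few lines of Python using the standard sqlite3 module. This is only a sketch: the table and column names are taken from the queries in this comment, the helper names `count_by_extension` and `delete_carved` are made up for illustration, and the code leaves alone any other tables that might reference the deleted rows.

```python
import sqlite3

CARVED_PATH = "/$CarvedFiles/"


def count_by_extension(conn, obj_id):
    """Return (extension, count) pairs for carved files, most frequent first."""
    cursor = conn.execute(
        "SELECT extension, COUNT(*) AS num FROM tsk_files "
        "WHERE data_source_obj_id = ? AND parent_path = ? "
        "GROUP BY extension ORDER BY num DESC",
        (obj_id, CARVED_PATH),
    )
    return cursor.fetchall()


def delete_carved(conn, obj_id, extensions):
    """Delete carved-file rows with the given extensions; return the number of rows removed."""
    removed = 0
    with conn:  # commits on success, rolls back if any statement fails
        for ext in extensions:
            cursor = conn.execute(
                "DELETE FROM tsk_files "
                "WHERE data_source_obj_id = ? AND parent_path = ? AND extension = ?",
                (obj_id, CARVED_PATH, ext),
            )
            removed += cursor.rowcount
    return removed
```

It would be prudent to run this against a backup copy of autopsy.db with the case closed, and to review the per-extension counts before deleting anything.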

APriestman commented 4 years ago

The next release will have the option to enter extensions that should be excluded.

ctrlaltca commented 4 years ago

That's awesome news, thank you!

ctrlaltca commented 3 years ago

I can confirm this is working just fine. I just re-ran carving and extraction on a test NTFS disk, excluding the most offending extensions (mft, txt), and the results are good:

bcarrier commented 3 years ago

@ctrlaltca What do you mean by 28 hours extracting? Is that the time it took for all of the 1.6 million files to go down the pipelines? Or the time that it took for Autopsy to add the 1.6 million files into the database? Both?

ctrlaltca commented 3 years ago

After the photorec ingest module completes, I look for the $CarvedFiles directory inside the main partition of the disk. Right click on the $CarvedFiles directory (containing 1.6 million files), click on "Extract file(s)", choose a destination directory, and let it extract the files; this is the step taking so much time.

By the way, thank you for the awesome Autopsy/DFIR training courses; I really enjoyed attending them.