sepinf-inc / IPED

IPED Digital Forensic Tool. It is open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.

Optimize processing small files #1958

Closed lfcnassif closed 11 months ago

lfcnassif commented 11 months ago

When indexTempOnSSD = true, TempFileTask creates temp files for most files smaller than 1GB, except for subitems (already in the case data storage) and carved files whose parent already has a temp file. This avoids decompressing the same file multiple times from E(x)01 evidence and also caches data from other image types stored on network shares.

For small files, we can cache the content in memory, avoiding unnecessary writes to and reads from the temp directory for items that can be processed without a temp file.
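To make the idea concrete, here is a minimal sketch of the decision described above: keep content in a byte array when it fits under a size limit, and fall back to a temp file otherwise. `SmallItemCache` and its methods are hypothetical names for illustration only. This is not IPED's actual TempFileTask code, and it assumes Java 11+ for `InputStream.readNBytes`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper for illustration; not part of IPED. */
final class SmallItemCache {
    private final int memoryLimit; // max bytes kept in memory (assumed parameter)
    private byte[] data;           // set when the content fit in memory
    private Path tempFile;         // set when the content was spilled to disk

    SmallItemCache(int memoryLimit) {
        this.memoryLimit = memoryLimit;
    }

    /** Reads the stream once, choosing memory or a temp file by size. */
    void load(InputStream in) throws IOException {
        // Read one byte past the limit so larger content can be detected.
        byte[] head = in.readNBytes(memoryLimit + 1);
        if (head.length <= memoryLimit) {
            data = head; // small item: no temp file is written
            return;
        }
        // Large item: write the bytes already read plus the rest to a temp file.
        tempFile = Files.createTempFile("item", ".tmp");
        InputStream all = new SequenceInputStream(new ByteArrayInputStream(head), in);
        Files.copy(all, tempFile, StandardCopyOption.REPLACE_EXISTING);
    }

    boolean isInMemory() {
        return data != null;
    }

    /** Opens a fresh stream over the cached content, wherever it lives. */
    InputStream open() throws IOException {
        return data != null ? new ByteArrayInputStream(data) : Files.newInputStream(tempFile);
    }
}
```

In this sketch the source stream is read only once in both cases, so the cost of decompressing from the evidence is paid a single time whether or not a temp file is created.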

What file size limit would be reasonable to keep in memory while an item is being processed (after it was taken from the queue)? We already use a large buffer of up to 8MB in the Item.getBufferedInputStream() method, which would use up to 400MB of memory on a machine with 50 threads.
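The 400MB figure is simply the per-item buffer limit multiplied by the number of worker threads, assuming every thread holds a full buffer at the same time. A small sketch of that arithmetic follows; `BufferBudget` and `worstCaseBytes` are hypothetical names, not IPED code.

```java
/** Hypothetical helper for illustration; not part of IPED. */
final class BufferBudget {

    /** Worst case: every worker thread holds one full buffer at once. */
    static long worstCaseBytes(long perItemLimitBytes, int workerThreads) {
        return Math.multiplyExact(perItemLimitBytes, (long) workerThreads);
    }

    public static void main(String[] args) {
        long limit = 8L * 1024 * 1024;          // 8MB per-item buffer limit
        long total = worstCaseBytes(limit, 50); // 50 worker threads
        System.out.println(total / (1024 * 1024) + " MB"); // prints "400 MB"
    }
}
```

The same calculation can be used to evaluate a smaller limit: for example, a 1MB limit with 50 threads would cap the worst case at 50MB.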

lfcnassif commented 11 months ago

Caching small subitems in memory would also avoid decompressing them multiple times from the internal case storage, where they are stored compressed. In the past I also tested creating uncompressed temp files for them, but I didn't reach a conclusion on whether it improved processing speed, since creating temp files has a cost. Keeping them in memory should speed things up a bit.

lfcnassif commented 11 months ago

So far, I haven't seen clear differences with this approach. I tested with a huge UFDR, one small E01 and one medium-sized E01. I'll repeat the tests using a non-SSD temp disk and maybe more evidence...

lfcnassif commented 11 months ago

Conclusions after many tests on a few pieces of evidence (3 E01s and 2 UFDRs):

A few thoughts:

So I'll merge the proposed change together with #1224.

Currently the memory buffer limit is 8MB. We may decrease it if someone thinks it is too large; please let me know.