Closed lfcnassif closed 11 months ago
Caching small subitems in memory would also avoid decompressing them multiple times from the internal case storage, where they are stored compressed. In the past I also tested creating uncompressed temp files for them, but I never concluded whether it improved processing speed or not, since creating temp files has a cost. Keeping them in memory, however, should speed things up a bit.
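The caching idea could be sketched roughly like this; all names below (`SubitemCache`, `maybeCache`, the 8 MiB threshold) are hypothetical illustrations, not IPED's actual API:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: keep small subitems on the heap instead of
// decompressing them from the compressed case storage on every access.
class SubitemCache {
    // Assumed size threshold below which content is kept in memory.
    private static final int MAX_CACHED_SIZE = 8 * 1024 * 1024; // 8 MiB

    private byte[] cached; // null if the item was not cached

    // Reads the stream once and caches the bytes if they fit the threshold;
    // otherwise (or on read error) leaves the cache empty, so callers would
    // fall back to reading from storage / a temp file.
    void maybeCache(InputStream in, long knownSize) {
        if (knownSize < 0 || knownSize > MAX_CACHED_SIZE) {
            cached = null;
            return;
        }
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream((int) knownSize);
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            cached = bos.toByteArray();
        } catch (IOException e) {
            cached = null;
        }
    }

    boolean isCached() {
        return cached != null;
    }
}
```

The point of the size gate is the trade-off discussed in this thread: below the threshold, the heap cost is cheaper than repeated decompression or temp-file I/O.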
So far I haven't measured clear differences with this approach; I tested with a huge UFDR, one small E01, and one medium-size E01. I'll repeat the tests using a non-SSD temp disk and maybe more evidence files...
Conclusions after lots of tests on a few evidence files (3 E01s and 2 UFDRs):
A few thoughts:
- When `indexTempOnSSD = true`, temp files for compressed files won't be created, and this should help a lot;
- Even if `indexTempOnSSD = true` is set by mistake, keeping small files on heap and writing less to the temp disk is better: I tested this with 1 UFDR and processing was 13% faster with the memory cache.

So, I'll merge the proposed change together with #1224.
Currently the memory buffer limit is 8 MB; we may decrease it if someone thinks it is too large, so please let me know.
When `indexTempOnSSD = true`, `TempFileTask` creates temp files for most files < 1 GB in size, except for subitems (already in the case data storage) and carved files whose parent already has a temp file. This avoids decompressing the same file multiple times from E(x)01 evidences and also caches data from other image types on network shares.

For small files, we can cache the content in memory, avoiding unneeded writes to and reads from the temp directory for items that can be processed without a temp file.
What file size limit would be reasonable to keep in memory while processing an item (after it was taken from the queue)? We already use a large buffer of up to 8 MB in the `Item.getBufferedInputStream()` method, which would use up to 400 MB of memory on a 50-thread machine.
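The 400 MB figure is just worst-case buffer size times worker threads; a quick check of that arithmetic:

```java
// Worst-case heap used by per-thread read buffers: bufferSize * threads.
class BufferBudget {
    static long worstCaseBytes(long bufferSizeBytes, int threads) {
        return bufferSizeBytes * threads;
    }
}
// 8 MiB * 50 threads = 400 MiB, matching the estimate in the discussion.
```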