sepinf-inc / IPED

IPED Digital Forensic Tool. It is open-source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in corporate investigations by private examiners.

Excessive memory consumption processing large E01 images #993

Closed (wladimirleite closed this issue 2 years ago)

wladimirleite commented 2 years ago

This started with @hugohmk trying to process two large disks (10 TB and 8 TB) containing a lot of videos. He faced out-of-memory issues, but eventually managed to process them using the "--continue" option. Both disk images are in E01 format and very large (~8.5 TB and ~7.1 TB). The number of items (discovered by tsk_loaddb) in these disks is not that large (< 1M).

I ran several tests to identify the root cause of the problem and was able to reproduce the issue processing the 8 TB disk, with IPED 3.18.13 and 4.0.0 (master) versions.

Monitoring the memory used with VisualVM, it turned out that there was still plenty of free memory for the JVM when the OOM error was triggered, but the physical memory of my machine was exhausted. I was using robustImageReading = true and numImageReaders = auto (which means 12 processes in my configuration). Each SleuthkitServer process reached ~6 GB, for a total of ~72 GB; that, plus IPED's memory (~16 GB), the OS, and a few other things running, consumed all 96 GB available. Note that the memory used by those SleuthkitServer processes grew slowly, roughly in a linear way.

I believe it is possible to reproduce the issue by processing a large E01 image with all "common" processing options disabled except processFileSignatures, and with indexTempOnSSD disabled. I modified SignatureTask to read just the first byte of each item, to be sure that the signature detection itself was not related to the memory issue, but that is not necessary, as the memory growth happened regardless of this change.

Reducing the number of image readers to 4 helped a bit. In that test, the memory of each SleuthkitServer process reached ~10 GB, and the processing finished successfully. Using a single image reader (numImageReaders = 1), the memory of the only SleuthkitServer process reached ~22 GB. Finally, disabling robustImageReading, the main Java process reached ~46 GB (I was using 24 GB as the JVM limit, so roughly the same result as the previous test, but the memory consumption was in the main IPED process). One conclusion here is that the issue is not related to using these external processes (robustImageReading).
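For reference, the relevant settings used in these tests were roughly the following (the exact configuration file depends on the IPED version, e.g. LocalConfig.txt and/or IPEDConfig.txt; only the properties already mentioned above are shown):

    # image reading / reproduction settings
    robustImageReading = true
    numImageReaders = auto        # "auto" resolved to 12 readers on this machine
    indexTempOnSSD = false
    processFileSignatures = true  # all other "common" processing options disabled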

I ran other tests with different E01 files (not as large, but still large, >= 1 TB) and observed the same behavior, although the memory growth was smaller (reaching ~2 GB x 12 processes in some cases). Running similar tests with DD images, the memory consumed by the SleuthkitServer processes was constant, with no meaningful growth. So the issue seems to be related to E01 processing.

I tried using LIBEWF without IPED's patches (which add a cache to speed up reading). The memory issue was the same, so it is not related to IPED's patches.

Finally, @aberenguel helped a lot and tested with the latest version of LIBEWF, which finally solved the issue. The memory of the SleuthkitServer processes did increase, but not that much, reaching only ~512 MB (which I believe is expected). To use this newer version, there is a single method called by The SleuthKit whose name changed, so @aberenguel had to rename it.

I found several issues related to memory consumption in the LIBEWF project (https://github.com/libyal/libewf/issues/38), possibly related to the one described here. Most of them were fixed after 2014, which is when the version currently used by IPED was released.

@lfcnassif, I know you are already working on several issues, but when you get some time, please try to reproduce this one. Basically, process a very large E01 with only signature detection enabled, and observe the memory used by the SleuthkitServer processes. If you do manage to reproduce it, we can later discuss solutions (one option would be using a newer version of LIBEWF).

lfcnassif commented 2 years ago

We already have a comment in our wiki related to this (https://github.com/sepinf-inc/IPED/wiki/Troubleshooting), in the second paragraph from the bottom of the OOME section. But I thought these kinds of issues were related to TSK memory leaks, not LIBEWF! Thank you very much guys for all your tests!

The SleuthKit project has open issues about upgrading to the latest LIBEWF, but @joachimmetz said it is still experimental and should not be used in production, although that was years ago: https://github.com/sleuthkit/sleuthkit/issues/642 https://github.com/sleuthkit/sleuthkit/pull/1052

Not sure how the library's stability is now...

lfcnassif commented 2 years ago

I reported the issue to the original project.

I tried using LIBEWF without IPED's patches

Could you confirm the exact original version used?

wladimirleite commented 2 years ago

We already have a comment in our wiki related to this (https://github.com/sepinf-inc/IPED/wiki/Troubleshooting), in the second paragraph from the bottom of the OOME section.

Disabling robustImageReading does help (it has the same effect as using numImageReaders = 1). It seems that LIBEWF (the old version) stores some data related to the E01 structure. When there are more readers, they randomly request parts of the E01, so the memory consumed by each process is lower than when using a single process, but the total amount of memory (the sum across all processes) is larger, as there is some overlap between the data stored by each process.
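Putting the numbers from the tests above side by side gives a rough picture of that overlap (illustrative only, using the figures already reported in this thread):

    1 reader:   1 x ~22 GB = ~22 GB total
    4 readers:  4 x ~10 GB = ~40 GB total
    12 readers: 12 x ~6 GB = ~72 GB total

Per-process memory drops as readers are added, but the total grows, which is consistent with each process keeping its own copy of part of the E01 metadata it has touched.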

The SleuthKit project has open issues about upgrading to the latest LIBEWF, but @joachimmetz said it is still experimental and should not be used in production, although that was years ago:

Yes, I noticed that, but there is no sign of when (or if) this "experimental" mark will be removed. As far as I know, @aberenguel has been using the latest version of LIBEWF for a while.

wladimirleite commented 2 years ago

I tried using LIBEWF without IPED's patches

Could you confirm the exact original version used?

I used the one that came with IPED 3.5.1 (internally it is marked as version 20130416, but I am not sure this information is reliable). Its modification date is 2014/02/03.

lfcnassif commented 2 years ago

I'm trying to reproduce it here with the patched libewf-legacy and 20140812. Just one quick question: were those E01 images segmented?

wladimirleite commented 2 years ago

Just one quick question: were those E01 images segmented?

No, just a single E01 file.

lfcnassif commented 2 years ago

Just processed a 1.7 TB segmented E01 image (~900 segments) with ~1.1M items from a 2 TB HDD, with master and the current patched libewf, fastmode + signature detection, and I didn't manage to reproduce it. The max memory reached by the image reading processes, although increasing slowly, was ~480 MB. I'll convert the image to one single segment and look for a bigger sample image...

Was one of the DD images used in the tests the same as the problematic 8-10 TB E01 after converting E01->DD? Was @aberenguel able to reproduce the issue with the old libewf (patched or not)? I know he is a Linux user; is he using TSK-4.6.5-patched (included in IPED) or a newer version? What is the FS of the triggering images (TSK used to have such memory issues with HFS)? Sorry for so many questions, just trying to be sure it is not a TSK issue...

wladimirleite commented 2 years ago

Was one of the DD images used in tests the same as the problematic 8-10TB E01 after converting E01->DD?

Yes, I tested with a DD created from the E01 and a couple of other 2 TB DDs I had here. I also tested with other E01 images from other cases (unrelated to these two large images) and observed the same behavior, on a smaller scale, because the images were smaller.

Was @aberenguel able to reproduce the issue with old libewf (patched or not)?

He was already using the newest LIBEWF. He tried to build the old one, but there were several issues, and we had to give up (at least for today).

I know he is a Linux user, is he using TSK-4.6.5-patched (included in iped) or a newer version?

Yes, TSK-4.6.5-patched, with the most recent version of LIBEWF (just renaming one method to match the name used by TSK).

What is the FS of triggering images (TSK used to have such memory issues with HFS)?

NTFS, nothing particularly unusual, except they are large images.

Sorry for so many questions, just trying to be sure it is not a TSK issue...

No problem! I am also not 100% sure. That is why I posted the details about the tests, so you can run your own and see if we can reach a conclusion.

lfcnassif commented 2 years ago

Yes, I tested with a DD created from the E01 and a couple of other 2TB DDs I had here.

Well, given this, and that the latest libewf fixed the issue using the same TSK version (on another OS, right?), it seems to be an issue with libewf-legacy. While I try to reproduce the issue here, I can try to compile the latest libewf for Windows x64 and send the DLL to you, if you have the time to verify whether it fixes the issue on your Windows system, which I believe was used for most tests...

wladimirleite commented 2 years ago

All the tests I mentioned used Windows. @aberenguel built a Windows x64 version of the latest LIBEWF for me (which I used in the last test that apparently solved the memory issue). I believe he used MinGW to build it. If you build it in a different way (like Visual Studio), just send me the DLL and I can test it here.
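For reference, a typical MinGW-w64 cross-build of libewf from a Linux host looks roughly like this (I am not sure these are the exact steps he used, so take it as a sketch):

    # from an extracted libewf-20201230 release tarball, with the mingw-w64 toolchain installed
    ./configure --host=x86_64-w64-mingw32 --enable-shared --disable-static
    make
    # the resulting Windows DLL should end up under libewf/.libs/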

lfcnassif commented 2 years ago

Great! I understood the fix was verified by him on his Linux system. But I've just found another single 2 TB E01 image and managed to reproduce the issue: several external processes using 2 GB of memory now! I'll try to compile an x64 Windows libewf 2020 version and test it.

wladimirleite commented 2 years ago

Great! If you managed to reproduce it, I am relieved!

aberenguel commented 2 years ago

I'm trying to reproduce it here with the patched libewf-legacy and 20140812. Just one quick question: were those E01 images segmented?

I've just compiled (on Ubuntu 21.10) the libewf-legacy tag 20140812 and could reproduce the scenario @tc-wleite mentioned (with the same E01). The attached screenshot shows the Java subprocesses, which reached 2 GB very quickly.

aberenguel commented 2 years ago

BTW, I had to make some modifications to libewf-legacy 20140812 in order to make it read the big E01 file.

And I experienced some segmentation faults using tsk_loaddb (with libewf-legacy 20140812) and this E01. After some debugging with gdb, I found some uninitialized variables. I submitted the fix in https://github.com/lfcnassif/sleuthkit-APFS/pull/1

lfcnassif commented 2 years ago

several external processes using 2GB of memory now

Updates:

  • after monitoring that whole processing, external image reading processes used up to 3.5GB of memory using our patched libewf-legacy with the 2 TB image I got (on Windows);
  • same results/memory leak when using libewf-legacy 20140812 (on Windows);
  • after compiling libewf-20201230 using MS VC 2015 (and dealing with a bzip2 linking issue...), external image reading processes used up to 400MB of memory (on Windows).

I'll convert this triggering image to 2GB limited segments to check if it helps (because the other segmented image I got didn't reproduce this).

So it seems upgrading is a good option! I would also like to hear from @gabrieldmf, because he is a heavy Linux user and may have used a recent libewf version for a while, to check if he has had bad surprises...

And if @fmpfeifer could apply his cache patch to libewf-20201230 and check if it helps when decoding HFS images, that would be great! If I remember correctly, he got about a 10x speedup when decoding HFS E01 images; it seems TSK reads the same HFS structures several times... External processes help to read file contents in parallel, but FS decoding is done by the single-threaded tsk_loaddb. Not sure if that patch will help the latest libewf or recent TSK versions...

aberenguel commented 2 years ago

As additional information, I compiled a merged version of sleuthkit-APFS + sleuthkit-4.6.7 with libewf-legacy 20140812. It did not fix it; same behaviour.

lfcnassif commented 2 years ago

PS: @tc-wleite, here are the libewf-20201230 x64 DLL (patched to use the old method name) and some dependencies, compiled using MS VS 2015, if you want to double-check. But don't worry, now I'm pretty sure we can conclude it fixes the issue. Release.zip

going to sleep...

joachimmetz commented 2 years ago

sleuthkit-4.6.7

Please do not use such an old version of the SleuthKit; many issues have been addressed since.

So what is the issue? High memory consumption with large E01 files with the legacy version?

wladimirleite commented 2 years ago

@tc-wleite here are the libewf-20201230 x64 DLL (patched to use the old method name) and some dependencies compiled using MS VS 2015 if you want to double check.

Just ran a complete test with the large (~7.1 TB) E01 and another E01 (~1.5 TB), and everything worked fine with the DLL you shared.

lfcnassif commented 2 years ago

Please do not use such an old version of the SleuthKit; many issues have been addressed since.

Hi @joachimmetz, first, thank you for your time. I'm aware of the many, many issues you and others fixed in TSK (thanks!), but when trying to upgrade in #771, I found important APFS decoding regressions compared with our fork of BlackBag's original implementation. Hopefully, I'll have time to investigate it in one or two weeks...

So what is the issue? High memory consumption with large E01 files with the legacy version?

Yes, @tc-wleite reported a single process using libewf-legacy that used about ~22 GB of memory when decoding an 8 TB E01 image. Because we use many processes to read image contents, that exhausted the machine's 96 GB of RAM. When using the latest libewf 20201230, the memory usage decreased to ~512 MB. Reading a converted DD image (without libewf), memory usage was normal (up to ~500 MB, I think - @tc-wleite, could you confirm?). He also observed the same behavior with other smaller E01 images, but on a smaller scale. I've just run a similar test again with a smaller 2 TB image, but using one libewf-legacy image reading process, and the memory usage of that process was 7.5 GB. Then, when using the latest libewf 20201230, it was 400 MB. I think libewf-legacy is maybe not expected to use that amount of memory with an 8 TB E01, nor to increase memory usage (roughly) linearly with the E01 image size, so we think it could be a memory leak...

joachimmetz commented 2 years ago

I found important APFS decoding regressions compared with our fork of BlackBag's original implementation.

If you rely on TSK APFS support, I would address those issues in the SleuthKit, including other APFS issues like https://github.com/sleuthkit/sleuthkit/issues/2641. BlackBag is not maintaining the implementation.

When using the latest libewf 20201230, the memory usage decreased to ~512MB.

The legacy version can be memory hungry due to how the chunk table is kept in memory, especially with >1 TB images, whereas the experimental version only keeps part of it in memory. Can you please test with tools such as valgrind to confirm whether it is a leak or not?
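For a rough sense of scale, with the default 32 KiB EWF chunk size (the per-entry overhead below is only an illustration, not a measured figure):

    8 TB / 32 KiB ≈ 244 million chunks
    244 million chunks x ~32-90 bytes of in-memory bookkeeping per chunk ≈ 8-22 GB

which is in the same ballpark as the multi-GB figures reported above for the legacy version.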

lfcnassif commented 2 years ago

The legacy version can be memory hungry due to how the chunk table is kept in memory, especially with >1 TB images, whereas the experimental version only keeps part of it in memory. Can you please test with tools such as valgrind to confirm whether it is a leak or not?

Sorry @joachimmetz, I misused the word 'leak'. It could be just an inefficient use of memory resources. I'm not an experienced C programmer and have never used valgrind, but I'll try to use it to check whether it is a leak or not when I have some time available, and report back. Thanks again for your time.
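Something along these lines is probably what I'll try, using libewf's own tools as simple read drivers (just a sketch; running it against tsk_loaddb on the triggering image would be more representative, but much slower under valgrind):

    # classic leak check: reports allocations that are never freed
    valgrind --leak-check=full ewfverify image.E01

    # heap profile over time: shows peak usage even when everything is freed at exit
    valgrind --tool=massif ewfverify image.E01
    ms_print massif.out.<pid>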

fmpfeifer commented 2 years ago

several external processes using 2GB of memory now

Updates:

  • after monitoring that whole processing, external image reading processes used up to 3.5GB of memory using our patched libewf-legacy with the 2 TB image I got (on Windows);
  • same results/memory leak when using libewf-legacy 20140812 (on Windows)
  • after compiling libewf-20201230 using MS VC 2015 (and dealing with a bzip2 linking issue...), external image reading processes used up to 400MB of memory (on Windows);

I'll convert this triggering image to 2GB limited segments to check if it helps (because the other segmented image I got didn't reproduce this).

So, seems upgrading is a good option! I would also like to hear from @gabrieldmf because he is a heavy Linux user and maybe used a recent libewf version for some while, to check if he had bad surprises...

And if @fmpfeifer could apply his cache patch to libewf20201230 and check if it helps decoding HFS images, that would be great! If I remember, he got about 10x speed up when decoding HFS E01 images, seems TSK reads the same HFS structures several times... External processes help to read file contents in parallel, but FS decoding is done by single threaded tsk_loaddb. Not sure if that patch will help latest libewf or recent tsk versions...

Hi @lfcnassif, sorry for the delay. Is there a GitHub repo with libewf-20201230 that I can apply my patches onto? I have the patches in this repo (last 4 commits): https://github.com/fmpfeifer/libewf_64bit

but this is libewf 20130416

lfcnassif commented 2 years ago

Hi @lfcnassif, sorry for the delay. Is there a github repo with the libewf20201230 that I can apply my patches onto? I have the patches in this repo (last 4 commits): https://github.com/fmpfeifer/libewf_64bit

but this is libewf 20130416

Hi @fmpfeifer, good to hear from you. Don't worry, we are on a weekend! AFAIK we still don't have a fork, just the official libewf repo. Maybe submitting your patch as a PR to the official libewf would be an option?

Since the latest TSK still uses the old libewf API, what path would be better: patching TSK or patching the LIBEWF method names? We will possibly have to patch TSK anyway, to make it able to accept APFS passwords at least... I don't worry about Windows users, since we distribute compiled libs, just about Linux users trying to mix versions...

lfcnassif commented 2 years ago

I've just run again a similar test with a smaller 2TB image, but using one libewf-legacy image reading process, and memory usage of that process was 7.5GB. Then, when using latest libewf 20201230, it was 400MB

Just a last update on the tests: I converted this single ~1.9 TB E01 image to about ~950 2 GB EWF segments and ran the tests again.

The results were interesting: breaking the single 2 TB E01 image into 2 GB segments decreased the libewf-legacy memory usage, but increased the newer libewf memory usage. Anyway, the newer version used less memory in both scenarios.
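For reference, the resegmenting can be done with libewf's own ewfexport, roughly like this (a sketch only; check ewfexport -h for the exact option and size syntax on your build):

    # re-export the single-segment E01 into ~2 GiB EWF segments
    ewfexport -t segmented/image -f encase6 -S 2GiB image.E01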

lfcnassif commented 2 years ago

Is there a github repo with the libewf20201230 that I can apply my patches onto?

I've just created a fork here https://github.com/sepinf-inc/libewf if it helps.

joachimmetz commented 2 years ago

@lfcnassif, can you create a PR? I can take a look at merging the changes when time permits.

lfcnassif commented 2 years ago

@lfcnassif, can you create a PR? I can take a look at merging the changes when time permits.

Hi @joachimmetz, of course. @fmpfeifer was the original patch author. Currently he is traveling, but when he returns I'll check this with him. Basically, his patch creates a limited cache for uncompressed data, so applications that read the same data chunk several times could benefit greatly. Do you think this cache would be a good addition to libewf?

joachimmetz commented 2 years ago

MRU chunks are already cached https://github.com/libyal/libewf/blob/3a0968ae9d1cb3ebab4eb1931a93c7f50387bbaf/libewf/libewf_chunk_table.c#L129

This is currently 8 chunks in size https://github.com/libyal/libewf/blob/3a0968ae9d1cb3ebab4eb1931a93c7f50387bbaf/libewf/libewf_definitions.h.in#L430
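Assuming the default 32 KiB chunk size, that corresponds to only 8 x 32 KiB = 256 KiB of cached uncompressed data (images created with other chunk sizes would cache proportionally more or less).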

lfcnassif commented 2 years ago

Thanks @joachimmetz! Just to confirm, does libewf-legacy also have this kind of cache for uncompressed data? I couldn't find those limits (like LIBEWF_MAXIMUM_CACHE_ENTRIES_CHUNKS) in the legacy version.

joachimmetz commented 2 years ago

I think it is this one https://github.com/libyal/libewf-legacy/blob/d35048cd97d2e192708988fc1424cd2ca2a00e76/libewf/libewf_handle.c#L2106

though this cache might behave a little bit differently than MRU

joachimmetz commented 2 years ago

Caching is a tricky topic, can widely differ based on the source data and the use case

lfcnassif commented 2 years ago

that read the same data chunk

That can be wrong; maybe @fmpfeifer's cache is at a higher level, so it may cache small pieces of data from different chunks well before reaching its limits. Maybe he could clarify...

joachimmetz commented 2 years ago

OK, but that means the cache might better live in TSK then, since it would/could also benefit other storage image formats.

lfcnassif commented 2 years ago

Yes, possibly, that's why I asked if it would be a good addition to libewf. Thanks @joachimmetz

joachimmetz commented 2 years ago

So that is debatable. If this is a cache that specifically benefits the use of libewf with libtsk, then no; if this is a cache that benefits more generic usage of libewf, then maybe, but then the question becomes why the current cache is not sufficient. So some data on cache hits and misses could be useful.

lfcnassif commented 2 years ago

@fmpfeifer do you still have the test image you used in benchmarks?

fmpfeifer commented 2 years ago

No, I don't. It was a long time ago..

lfcnassif commented 1 year ago

To use this newer version, there is a single method called by The SleuthKit whose name changed, so @aberenguel had to rename it.

@aberenguel, what is the name of the method that needs to be renamed? I'll apply the patch to our libewf fork to make life easier for Linux users who would like to use the new libewf library.

joachimmetz commented 1 year ago

Note that the API of the experimental libewf version is not stable; changes you make now to make people's lives easier might backfire in the future. Also, the version is called experimental for a reason.

lfcnassif commented 1 year ago

Thanks @joachimmetz, I see. Unfortunately, another Linux user had OOM issues with libewf-legacy while processing a large E01 file. I pointed him to this issue and he asked if we could apply the patch to a libewf fork to ease linking to the experimental version, like we do for our sleuthkit fork. I understand a fork would put more maintenance work on our side and that it is not the ideal solution...

joachimmetz commented 1 year ago

They could just use ewfmount from the experimental version in the meantime.
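For example (a sketch of typical usage; the mount point path is arbitrary):

    # mount the E01 with the experimental version's ewfmount
    ewfmount image.E01 /mnt/ewf
    # the decoded raw data is exposed as a single file and can be processed like a DD image
    ls -l /mnt/ewf/ewf1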

aberenguel commented 1 year ago

To use this newer version, there is a single method called by The SleuthKit whose name changed, so @aberenguel had to rename it.

@aberenguel, what is the name of the method that needs to be renamed? I'll apply the patch to our libewf fork to make life easier for Linux users who would like to use the new libewf library.

--- a/tsk/img/ewf.cpp
+++ b/tsk/img/ewf.cpp
@@ -69,3 +69,3 @@ ewf_image_read(TSK_IMG_INFO * img_info, TSK_OFF_T offset, char *buf,
 #if defined( HAVE_LIBEWF_V2_API )
-    cnt = libewf_handle_read_random(ewf_info->handle,
+    cnt = libewf_handle_read_buffer_at_offset(ewf_info->handle,
         buf, len, offset, &ewf_error);

aberenguel commented 1 year ago

In this case, I changed TSK (the call in ewf.cpp shown above) rather than renaming the libewf method.