qkaiser opened this issue 2 years ago
Please note that compression bombs only work with implementations that recursively extract the archive content, which technically we don't do. I still think this is a good enhancement / hardening of unblob to protect against these files. I'll do some tests locally and get back with some notes :)
Instead of checking the extracted file size for the content of a single compressed archive, we should keep an eye on the overall extracted size.
With the proposed approach, we could run into issues if a compressed archive has nested archives that are below the threshold - e.g., 42.zip only extracts 530 kB in the first pass, but if we continue recursively, we will eventually extract a ton of 4.3 GB files, which by themselves may fly below the radar but still fill up the disk.
I'd suggest adding an option (e.g., --max-extraction-size) that would override a reasonable default value (e.g., 100 GB). Alternatively, we could also monitor available disk space and abort once we have used, e.g., 90% of the available free disk space.
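A minimal sketch of what such an option could look like, assuming a click-based CLI; the option name, default value, and the ExtractionBudget helper are illustrative, not unblob's actual interface:

```python
import click


class ExtractionBudget:
    """Tracks the cumulative size of everything extracted so far."""

    def __init__(self, limit_bytes: int):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def charge(self, size_bytes: int) -> None:
        """Record newly extracted bytes; fail once the budget is exhausted."""
        self.used_bytes += size_bytes
        if self.used_bytes > self.limit_bytes:
            raise RuntimeError(
                f"extraction aborted: {self.used_bytes} bytes exceed the "
                f"{self.limit_bytes} byte limit"
            )


@click.command()
@click.option(
    "--max-extraction-size",
    type=int,
    default=100 * 1024**3,  # 100 GiB default, as suggested above
    help="Abort once the total extracted size exceeds this many bytes.",
)
def main(max_extraction_size: int):
    budget = ExtractionBudget(max_extraction_size)
    # ... pass `budget` into the extraction loop and call budget.charge(n)
    # after every extracted file is written out.
```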
It is very hard to come up with a general solution for this. I'd suggest extracting to a separate volume or using quotas to mitigate this issue.
I agree with the recommendation of a separate volume and quotas. I'm sure we can provide sound recommendations on that subject in our documentation.
I was toying with the idea of implementing a "best effort" protection against zip bombs by collecting size information from StatReport.
A basic implementation is available on this branch: https://github.com/onekey-sec/unblob/compare/main...zip-bomb
I'm saying "best effort" because the smallest unit we can work on is a Task, so the increase in size is computed only after a file has been processed and decompressed/extracted.
I got good results against zip bombs with this; the only thing that is missing is a proper cancellation/cleanup of running processes (marked as TODO in the code).
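Roughly, the idea looks like the sketch below; the class and field names are stand-ins for illustration, not the actual code on the linked branch:

```python
# "Best effort" accounting: after each Task completes, sum the sizes from its
# stat reports and stop scheduling new tasks once a global limit is exceeded.
from dataclasses import dataclass
from typing import List


@dataclass
class StatReport:  # stand-in for unblob's real report type
    path: str
    size: int


@dataclass
class SizeAccountant:
    limit_bytes: int
    extracted_bytes: int = 0

    def add_task_reports(self, reports: List[StatReport]) -> bool:
        """Return False once the cumulative extracted size exceeds the limit."""
        self.extracted_bytes += sum(r.size for r in reports)
        return self.extracted_bytes <= self.limit_bytes


# Usage inside a simplified processing loop:
#   accountant = SizeAccountant(limit_bytes=100 * 1024**3)
#   for task in task_queue:
#       reports = process(task)           # extraction happens here
#       if not accountant.add_task_reports(reports):
#           cancel_remaining_tasks()      # the cleanup marked as TODO above
#           break
```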
There's no urgency to it, but I would be glad if anyone from the team could have a look and share their thoughts.
I also had a weird idea: writing a wrapper that can be LD_PRELOAD-ed to capture write calls, track the total disk consumption, and abort when a limit is reached. This could also be used to wire files created by extractors into our reports in real time.
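Purely to illustrate the wiring from the Python side (the interposer itself would have to be a small native library; libwritelimit.so and WRITE_LIMIT_BYTES below are hypothetical placeholders, not existing artifacts):

```python
# Run an external extractor with a hypothetical interposer library preloaded
# that counts write() bytes and aborts the child once a limit is reached.
import os
import subprocess


def run_extractor_with_write_limit(command: list, limit_bytes: int) -> int:
    env = dict(os.environ)
    env["LD_PRELOAD"] = "/usr/lib/libwritelimit.so"  # hypothetical interposer
    env["WRITE_LIMIT_BYTES"] = str(limit_bytes)
    return subprocess.run(command, env=env, check=False).returncode
```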
This would go against our attempt at supporting OSX (or at least Apple M1/M2).
Another idea I had was to launch a "watcher process" registering inotify callbacks on the extracted directories recursively and killing the watched processes once the size limit is reached. However, inotify only exists on Linux, and the vast majority of inotify libraries in Python suck (although it would not be super complex to write a modern one).
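For what it's worth, a minimal sketch of that watcher, assuming the third-party inotify_simple package (Linux only). It is best effort by nature: events can be missed between a directory being created and its watch being registered, and files closed more than once are double-counted.

```python
import os
import signal

from inotify_simple import INotify, flags  # third-party, Linux only


def watch_extraction(root: str, extractor_pid: int, limit_bytes: int) -> None:
    """Kill the extractor once files under `root` exceed limit_bytes."""
    inotify = INotify()
    mask = flags.CREATE | flags.CLOSE_WRITE
    watches = {inotify.add_watch(root, mask): root}
    total = 0

    while True:
        for event in inotify.read(timeout=1000):
            path = os.path.join(watches[event.wd], event.name)
            if event.mask & flags.ISDIR and event.mask & flags.CREATE:
                # inotify watches are not recursive: watch new subdirectories too.
                watches[inotify.add_watch(path, mask)] = path
            elif event.mask & flags.CLOSE_WRITE:
                try:
                    total += os.path.getsize(path)
                except OSError:
                    continue
                if total > limit_bytes:
                    os.kill(extractor_pid, signal.SIGTERM)
                    return
```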
> This would go against our attempt at supporting OSX (or at least Apple M1/M2).
There is an equivalent of LD_LIBRARY_PATH on macOS, but it would certainly complicate things.
We should check the extracted file size during chunk parsing to avoid filling up the disk when extracting malicious nested archives.
Samples can be found here: https://www.bamsoftware.com/hacks/zipbomb/
For zip bombs, a check can be implemented in is_valid. Something similar to this could work:
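A sketch of the kind of check meant here, using the standard zipfile module; the thresholds and the exact is_valid signature are assumptions, not unblob's actual handler API:

```python
import io
import zipfile

MAX_UNCOMPRESSED_SIZE = 10 * 1024**3  # 10 GiB, arbitrary example limit
MAX_COMPRESSION_RATIO = 1_000         # arbitrary example limit


def is_valid(file: io.BufferedIOBase) -> bool:
    try:
        with zipfile.ZipFile(file) as zf:
            compressed = sum(info.compress_size for info in zf.infolist())
            uncompressed = sum(info.file_size for info in zf.infolist())
    except zipfile.BadZipFile:
        return False

    if uncompressed > MAX_UNCOMPRESSED_SIZE:
        return False
    if compressed and uncompressed / compressed > MAX_COMPRESSION_RATIO:
        return False
    return True
```

Note that this only inspects sizes declared in the central directory, so it catches overlapping-file bombs like the bamsoftware samples but not recursively nested ones; those are where the overall extracted-size tracking discussed above comes in.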
I'll check if similar behavior (i.e., "let's fill the whole disk") can be triggered with other formats.