onekey-sec / unblob

Extract files from any kind of container formats
https://unblob.org

Protect against malicious nested archives (aka zip bombs) #210

Open qkaiser opened 2 years ago

qkaiser commented 2 years ago

We should check the extracted file size during chunk parsing to avoid filling up the disk when extracting malicious nested archives.

Samples can be found here: https://www.bamsoftware.com/hacks/zipbomb/

For zip bombs, a check can be implemented in is_valid:

import io
import zipfile

from structlog import get_logger

logger = get_logger()

# Bit 0 of the general purpose bit flag marks a ZIP entry as encrypted
ENCRYPTED_FLAG = 0b0001

def is_valid(self, file: io.BufferedIOBase) -> bool:
    has_encrypted_files = False
    try:
        with zipfile.ZipFile(file) as zip:  # type: ignore
            for zipinfo in zip.infolist():
                if zipinfo.flag_bits & ENCRYPTED_FLAG:
                    has_encrypted_files = True
        if has_encrypted_files:
            logger.warning("There are encrypted files in the ZIP")
        return True
    except (zipfile.BadZipFile, UnicodeDecodeError, ValueError):
        return False

Something similar to this could work:

with zipfile.ZipFile(file) as zip:  # type: ignore
    # Sum the uncompressed sizes declared in the central directory
    extracted_size = sum(e.file_size for e in zip.infolist())
    if extracted_size > SOME_CONSTANT:
        return False  # bail out: the archive would expand past our limit

I'll check whether similar behavior (i.e. "let's fill the whole disk") can be triggered with other formats.

qkaiser commented 2 years ago

Please note that compression bombs only work with implementations that recursively extract the archive content, which technically we don't do. I still think this is a good enhancement / hardening of unblob to protect against these files. I'll do some tests locally and get back with some notes :)

flukavsky commented 2 years ago

Instead of checking the extracted file size for the content of a single compressed archive, we should keep an eye on the overall extracted size.

With the proposed approach, we could run into issues if a compressed archive has nested archives that are below the threshold. For example, 42.zip only extracts 530 KB at the first level, but if we continue recursively, we will eventually extract a ton of 4.3 GB files, which by themselves may fly below the radar but still fill up the disk.

I'd suggest adding an option (e.g., --max-extraction-size) that would override a reasonable default value (e.g., 100 GB). Alternatively, we could monitor available disk space and abort once we have used, e.g., 90% of the available free disk space.
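
Something along those lines could be sketched like this (a rough illustration only; ExtractionBudget, the constants, and the 90% threshold are placeholders, not unblob API):

import shutil
from pathlib import Path

DEFAULT_MAX_EXTRACTION_SIZE = 100 * 1024**3  # hypothetical 100 GB default
MIN_FREE_DISK_RATIO = 0.1  # abort once ~90% of the disk is used


class ExtractionBudget:
    """Tracks the cumulative size of everything extracted so far."""

    def __init__(self, max_size: int = DEFAULT_MAX_EXTRACTION_SIZE):
        self.max_size = max_size
        self.extracted = 0

    def charge(self, path: Path) -> None:
        """Account for a newly extracted file; raise when over budget."""
        self.extracted += path.stat().st_size
        if self.extracted > self.max_size:
            raise RuntimeError("global extraction budget exceeded")
        usage = shutil.disk_usage(path)
        if usage.free / usage.total < MIN_FREE_DISK_RATIO:
            raise RuntimeError("disk almost full, aborting extraction")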

vlaci commented 2 years ago

It is very hard to come up with a general solution for this. I'd suggest extracting to a separate volume or using quotas to mitigate this issue.

qkaiser commented 1 year ago

I agree with the recommendation on separate volumes and quotas. I'm sure we can provide sound recommendations on that subject in our documentation.

However, I was toying with the idea of implementing a "best effort" protection against zip bombs by collecting size information from StatReport.

A basic implementation is available on this branch: https://github.com/onekey-sec/unblob/compare/main...zip-bomb

I'm saying "best effort" because the smallest unit we can work on is a Task, so the increase in size is computed only after a file has been processed and decompressed/extracted.

I got good results against zip bombs with this; the only thing missing is proper cancellation/cleanup of running processes (marked as TODO in the code).

There's no urgency to it, but I would be glad if anyone from the team could have a look and share their thoughts.
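
In spirit, the accounting boils down to something like this (heavily simplified; the names here are made up for illustration, and the real branch pulls the sizes from StatReport rather than re-statting the directory):

from dataclasses import dataclass
from pathlib import Path


@dataclass
class SizeAccountant:
    max_extracted_size: int
    total: int = 0

    def after_task(self, extract_dir: Path) -> bool:
        """Return False once the cumulative extracted size exceeds the limit.

        Called only after a task has finished, which is why the protection
        is "best effort": a single task can still overshoot the limit
        before we notice.
        """
        self.total += sum(
            p.stat().st_size for p in extract_dir.rglob("*") if p.is_file()
        )
        return self.total <= self.max_extracted_size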

vlaci commented 1 year ago

I also had a weird idea: writing a wrapper that can be LD_PRELOAD-ed to capture write calls, track the total disk consumption, and abort when a limit is reached.

This could also be used to wire in files created by extractors to our reports in real-time.

qkaiser commented 1 year ago

This would go against our attempt at supporting OSX (or at least Apple M1/M2).

Another idea I had was to launch a "watcher process" that registers inotify callbacks on the extracted directories recursively, killing the watched processes once the size limit is reached. However, inotify only exists on Linux, and the vast majority of inotify libraries in Python suck (although writing a modern one would not be super complex).
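
A portable (if cruder) fallback would be a simple polling watcher instead of inotify; a rough sketch, with all names hypothetical:

import os
import signal
import time
from pathlib import Path


def directory_size(root: Path) -> int:
    """Total size of all regular files currently under root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def watch_extraction(root: Path, pid: int, limit: int, interval: float = 0.5) -> None:
    """Poll the extraction directory; kill the extractor if it grows past limit."""
    while True:
        if directory_size(root) > limit:
            os.kill(pid, signal.SIGKILL)
            return
        try:
            os.kill(pid, 0)  # probe: raises if the process has already exited
        except ProcessLookupError:
            return
        time.sleep(interval)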

vlaci commented 1 year ago

> This would go against our attempt at supporting OSX (or at least Apple M1/M2).

There is an equivalent of LD_PRELOAD on macOS (DYLD_INSERT_LIBRARIES), but for sure it would complicate things.