onekey-sec / unblob

Extract files from any kind of container formats
https://unblob.org

feat(handler): add multi-part gzip handler. #689

Closed qkaiser closed 8 months ago

qkaiser commented 9 months ago

It's possible to create a multi-part gzip with split, which produces multiple gzip-compressed files with 'aa', 'ab', 'ac', ... suffixes.
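To make the layout concrete, here is a minimal Python sketch of one way such parts can come about, assuming the data is compressed first and the resulting stream is cut afterwards (roughly what piping gzip output into split would do). The helper names are hypothetical and not part of unblob. Under that assumption only the first part carries the gzip header, and the parts decompress only once they are concatenated in order.

```python
import gzip
import os
from typing import List


def gzip_then_split(data: bytes, part_size: int) -> List[bytes]:
    """Compress data as one gzip stream, then cut the stream into fixed-size parts."""
    compressed = gzip.compress(data)
    return [compressed[i:i + part_size] for i in range(0, len(compressed), part_size)]


def join_and_decompress(parts: List[bytes]) -> bytes:
    """Concatenate the parts in order and decompress the resulting single stream."""
    return gzip.decompress(b"".join(parts))


# Random bytes barely compress, so the stream is guaranteed to span several parts.
payload = os.urandom(16384)
parts = gzip_then_split(payload, part_size=4096)
restored = join_and_decompress(parts)
```

Here `parts[0]` plays the role of the 'aa' file, `parts[1]` the 'ab' file, and so on.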

We match on .gz.aa in a directory, collect all the files with the same name but a different suffix, order them, and feed them to 7z.
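The matching and ordering step could look roughly like the sketch below. This is only an illustration of the idea, not the handler's actual code: `collect_gzip_parts` is a hypothetical helper, and it assumes two-letter lowercase suffixes as produced by split's defaults.

```python
import tempfile
from pathlib import Path
from typing import List


def collect_gzip_parts(first_part: Path) -> List[Path]:
    """Return first_part and its sibling parts (same name, other suffix), in order."""
    if not first_part.name.endswith(".gz.aa"):
        raise ValueError(f"not a first part: {first_part}")
    prefix = first_part.name[:-2]  # e.g. "one.txt.gz."
    parts = []
    for candidate in first_part.parent.iterdir():
        suffix = candidate.name[len(prefix):]
        if (
            candidate.is_file()
            and candidate.name.startswith(prefix)
            and len(suffix) == 2
            and suffix.isalpha()
            and suffix.islower()
        ):
            parts.append(candidate)
    # Lexicographic order matches the order in which split names its output.
    return sorted(parts, key=lambda p: p.name)


# Usage: unrelated files and other base names are ignored.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("one.txt.gz.ab", "one.txt.gz.aa", "one.txt.gz.ac",
                 "two.txt.gz.aa", "one.txt.gz.bak"):
        (Path(tmp) / name).touch()
    found = [p.name for p in collect_gzip_parts(Path(tmp) / "one.txt.gz.aa")]
print(found)
```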

This is very close to what we were already doing with multi-part 7zip archives.

qkaiser commented 8 months ago

Copying the exchange I had with @e3krisztian outside GitHub:

@e3krisztian:

I saw the gzip changes, let's talk about it tomorrow. The problem is with these output files:

tests/integration/compression/gzip/__output__/multi-volume-split-then-gzip.tar_extract/one.txt.gz.aa_extract/one.txt.aa
tests/integration/compression/gzip/__output__/multi-volume-split-then-gzip.tar_extract/one.txt.gz.aa_extract/one.txt.ab
...

They should not be like this: it is not a multifile this way. The expected output would be:

tests/integration/compression/gzip/__output__/multi-volume-split-then-gzip.tar_extract/one.txt.gz_extract/one.txt

@qkaiser:

I looked at it and here's the problem:

  • we provide only the first path to 7z through MultiFileCommand, because 7z is smart enough to detect multi-volume archives or compressed streams on its own
  • if we wanted to decompress "split then compressed" multi-volumes, we could adapt MultiFileCommand to pass a wildcard to 7z, so that it runs something like this: 7z x -p -y 'mv.7z*' -o/tmp/out
  • the problem with this wildcard approach is that it stops 7z from working with legit multi-volume archives: given a wildcard, it treats each matching file as an independent archive
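The two invocation styles discussed above differ only in the archive argument. The sketch below builds both command lines without running them; `build_7z_command` is a hypothetical helper for illustration and does not reflect how MultiFileCommand is actually implemented. The switches are the ones from the example command above.

```python
import shlex
from typing import List


def build_7z_command(first_volume: str, outdir: str, wildcard: bool = False) -> List[str]:
    """Build a 7z extract command for either the first volume or a wildcard pattern."""
    # With wildcard=True, 7z receives a pattern and handles every match separately.
    target = f"{first_volume}*" if wildcard else first_volume
    return ["7z", "x", "-p", "-y", target, f"-o{outdir}"]


first_only = build_7z_command("mv.7z.001", "/tmp/out")
with_wildcard = build_7z_command("mv.7z", "/tmp/out", wildcard=True)

# shlex.join quotes the pattern so a shell would not expand it before 7z sees it.
print(shlex.join(first_only))
print(shlex.join(with_wildcard))
```

Since both forms go through the same argument slot, a single command template cannot serve both cases, which is the trade-off described in the list.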

So we can't have both. I think a split-then-compressed multi-volume is an edge case and should only be handled when we observe one. Until then, such files will still be handled, but each file will be decompressed independently, without causing issues.
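As a rough illustration of why independent decompression is harmless in the split-then-compressed case: each part is a complete gzip stream of its own, so it decompresses by itself and the original is simply the concatenation of the outputs. The sketch below uses Python's gzip module with a hypothetical helper name; it does not model what 7z or unblob do internally.

```python
import gzip
from typing import List


def split_then_gzip(data: bytes, part_size: int) -> List[bytes]:
    """Cut data into fixed-size chunks, then compress every chunk on its own."""
    chunks = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    return [gzip.compress(chunk) for chunk in chunks]


payload = b"one.txt contents\n" * 1000  # 17000 bytes
parts = split_then_gzip(payload, part_size=4096)

# Every part decompresses independently; joining the outputs restores the input.
recovered = b"".join(gzip.decompress(part) for part in parts)
```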