Closed qkaiser closed 8 months ago
Copying the exchange I had with @e3krisztian outside GitHub:
@e3krisztian:
I saw the gzip changes, let's talk about it tomorrow. The problem is with these output files:
tests/integration/compression/gzip/__output__/multi-volume-split-then-gzip.tar_extract/one.txt.gz.aa_extract/one.txt.aa
tests/integration/compression/gzip/__output__/multi-volume-split-then-gzip.tar_extract/one.txt.gz.aa_extract/one.txt.ab
... they should not look like this, because this way it is not handled as a multi-file archive. The expected output would be:
tests/integration/compression/gzip/__output__/multi-volume-split-then-gzip.tar_extract/one.txt.gz_extract/one.txt
@qkaiser:
I looked at it and here's the problem:
- we provide the first path to 7z through MultiFileCommand, because 7z is smart enough to detect multi-volume archives or compressed streams on its own
- if we wanted to decompress "split then compressed" multi-volumes, we can provide a wildcard to 7z by adapting MultiFileCommand so that it runs something like this: 7z x -p -y 'mv.7z*' -o/tmp/out
- the problem with this wildcard approach is that it blocks 7z from working with legit multi-volume archives, because it will consider each matching file independently if we provide a wildcard
So we can't have both. I think a split-then-compressed multi-volume is an edge case and should only be handled when we observe one. Until then it will still be handled, but with each file decompressed independently, which does not cause issues.
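To make the trade-off concrete, here is a minimal sketch of the two invocations discussed above. It only builds the argument lists and does not run 7z. The function names are hypothetical and are not the actual MultiFileCommand API; the flags are the ones from the example command above.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def first_volume_command(first: PurePosixPath, outdir: PurePosixPath) -> list[str]:
    # Current approach: hand 7z the first volume only and let it
    # detect the remaining volumes by itself.
    return ["7z", "x", "-p", "-y", str(first), f"-o{outdir}"]


def wildcard_command(first: PurePosixPath, outdir: PurePosixPath) -> list[str]:
    # Alternative approach: drop the last suffix (e.g. ".001") and append
    # a wildcard, so 7z treats every matching file independently.
    pattern = str(first.with_suffix("")) + "*"
    return ["7z", "x", "-p", "-y", pattern, f"-o{outdir}"]


first = PurePosixPath("/in/mv.7z.001")
outdir = PurePosixPath("/tmp/out")
print(first_volume_command(first, outdir))
print(wildcard_command(first, outdir))
```

The two lists differ only in the input argument, which is why a single command template cannot serve both cases.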
It's possible to create multi-part gzip with `split`, which will create multiple gzip compressed files with an 'aa', 'ab', 'ac', ... suffix. We match on `.gz.aa` in a directory, get all the files with the same name but a different suffix, order them and feed them to 7z. This is very close to what we were already doing with multi-part 7zip archives.
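
The matching step can be sketched roughly as follows. This is an illustration, not the code from this change: `find_gzip_parts` and `demo` are hypothetical names, the demo assumes the parts come from splitting an already gzip-compressed file, and it concatenates the parts in Python as a stand-in for feeding them to 7z.

```python
from __future__ import annotations

import gzip
import string
import tempfile
from pathlib import Path


def find_gzip_parts(first_part: Path) -> list[Path]:
    """Return all parts that belong with a ".gz.aa" file, in suffix order."""
    if not first_part.name.endswith(".gz.aa"):
        raise ValueError(f"not a first part: {first_part}")
    stem = first_part.name[: -len("aa")]  # e.g. "one.txt.gz."
    parts = [
        p
        for p in first_part.parent.iterdir()
        if p.name.startswith(stem)
        and len(p.name) == len(stem) + 2
        and all(c in string.ascii_lowercase for c in p.name[-2:])
    ]
    # "aa" < "ab" < "ac" ..., so sorting by name restores the split order.
    return sorted(parts, key=lambda p: p.name)


def demo() -> bytes:
    payload = b"hello multi-part gzip\n" * 100
    compressed = gzip.compress(payload)
    with tempfile.TemporaryDirectory() as tmp:
        directory = Path(tmp)
        chunk = len(compressed) // 3 + 1
        # Emulate the output of `split` on the compressed file.
        for index, suffix in enumerate(("aa", "ab", "ac")):
            data = compressed[index * chunk : (index + 1) * chunk]
            (directory / f"one.txt.gz.{suffix}").write_bytes(data)
        (directory / "unrelated.txt").write_bytes(b"ignored")
        parts = find_gzip_parts(directory / "one.txt.gz.aa")
        joined = b"".join(p.read_bytes() for p in parts)
        return gzip.decompress(joined)


if __name__ == "__main__":
    print(len(demo()))
```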