vasi / pixz

Parallel, indexed xz compressor
BSD 2-Clause "Simplified" License

concatenation of *xz files and then decompression using pixz #89

Closed: justinmccrary closed this issue 3 years ago

justinmccrary commented 3 years ago

I have many, many files which were compressed as part of a long-standing real-time loop using pixz.

Many are big, so I want to decompress them individually. But others are small, and I expect efficiencies from concatenating them and then passing that concatenation to pixz for decompression.

So at root I am looking to do something like:

    cat small_files*.txt.xz > file.txt.xz
    pixz -d -p 4 > file.txt < file.txt.xz

Even better would be something like pixz -d -p 4 > file.txt < small_files*.txt.xz

vasi commented 3 years ago

This is an interesting bug! Here's what's happening:

So this bug only happens when there are small, concatenated files. Fun!

vasi commented 3 years ago

Should be fixed, please give it a try. You can do cat f1.xz f2.xz f3.xz | pixz -d > output.txt.
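For anyone who wants to sanity-check the concatenation approach, here is a minimal round-trip sketch. It uses plain xz, which decompresses concatenated .xz streams by default, so it runs even without pixz installed. The helper name decompress_concat and the sample file names are made up for illustration. To exercise the fixed pixz build instead, swapping `xz -dc` for `pixz -d` should behave the same way, per the comment above.

```shell
#!/bin/sh
# Round-trip check: several small .xz files, concatenated, should decompress
# to the concatenation of the original inputs.
set -eu

# Hypothetical helper: feeds all given .xz files to the decompressor as one stream.
# Replace `xz -dc` with `pixz -d` to test a pixz build that includes the fix.
decompress_concat() {
    cat "$@" | xz -dc
}

workdir=$(mktemp -d)
printf 'alpha\n' > "$workdir/f1.txt"
printf 'beta\n'  > "$workdir/f2.txt"
printf 'gamma\n' > "$workdir/f3.txt"

# Compress each small file separately (-k keeps the originals for comparison).
xz -k "$workdir/f1.txt" "$workdir/f2.txt" "$workdir/f3.txt"

decompress_concat "$workdir/f1.txt.xz" "$workdir/f2.txt.xz" "$workdir/f3.txt.xz" \
    > "$workdir/output.txt"

# Compare against the plain concatenation of the originals.
cat "$workdir/f1.txt" "$workdir/f2.txt" "$workdir/f3.txt" \
    | cmp - "$workdir/output.txt" && echo "round trip OK"
```

If cmp reports a difference, or the decompressor exits with an error, the concatenated input is not being handled stream by stream.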

Note that pixz isn't really better than xz for compressing or decompressing lots of individual small files. We can only really use parallelization with large files (including tarballs that contain lots of small files).
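A rough sketch of the tarball route mentioned above: bundle the small files first, then compress the single archive once. It uses tar with plain xz so it runs anywhere xz is installed. The directory and file names are invented for illustration. Substituting pixz for xz in the pipeline is an assumption on my part and is not something this thread confirms.

```shell
#!/bin/sh
# Bundle many small files into one tarball, then compress that single stream.
set -eu

bundle_dir=$(mktemp -d)
mkdir "$bundle_dir/small"
for i in 1 2 3; do
    printf 'file %s\n' "$i" > "$bundle_dir/small/f$i.txt"
done

# One large compressed stream instead of many tiny ones.
tar -C "$bundle_dir" -cf - small | xz -c > "$bundle_dir/small.tar.xz"

# List the archive contents to confirm everything made it in.
listing=$(xz -dc "$bundle_dir/small.tar.xz" | tar -tf -)
printf '%s\n' "$listing"
```

The point is that the compressor sees one big input, which is the case where a parallel compressor has enough data to split across threads.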