leahneukirchen opened this issue 1 year ago
Thanks for the suggestion.
I want to keep ripunzip very focused on the specific job it was designed for, which is unzipping zip files containing lots of contents in parallel.
I would be open to accepting a pull request to add this feature, but with a pretty high bar. I'd need direct numeric evidence that this is a substantial time saving for some folks over using standard unzipping tools. The only possible time saving here comes from doing the fetch and the unzip in parallel, so the best theoretical speedup is 2x. That's a lot less than the 30x+ speedup which is possible when unzipping lots of files from the same zip file. But 2x could still be useful for some folks; I'd want to hear from them and to be convinced it was worth the slight deviation from ripunzip's original mission.
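The 2x ceiling follows from treating fetch and decompression as two pipelined stages. Assuming a fetch time $t_f$ and an unzip time $t_u$ (symbols introduced here for illustration), and an idealized perfect overlap with no extra overhead, the sequential and overlapped times compare as:

```latex
\begin{aligned}
T_{\text{seq}} &= t_f + t_u, \qquad T_{\text{par}} \ge \max(t_f, t_u), \\
S = \frac{T_{\text{seq}}}{T_{\text{par}}}
  &\le \frac{t_f + t_u}{\max(t_f, t_u)}
  \le \frac{2\max(t_f, t_u)}{\max(t_f, t_u)} = 2 .
\end{aligned}
```

Equality holds only when $t_f = t_u$; when one stage dominates (for example a slow network, $t_f \gg t_u$), $S$ approaches 1.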
I had the impression that mere decompression of one stream is multi-threaded too, so it can use multiple cores (à la pigz or pixz).
If that's not the case, it's probably not worth it.
FWIW I partially withdraw my earlier comment - now that we fetch from URIs, sometimes ripunzip is useful when unzipping just a single file, so a stdout option would be useful.
An advantage that seems obvious to me would be the ability to start using the output in a pipe while the download is still happening, without having to save a massive decompressed output (though I am unsure whether this would work well with tar's --sort=name for my particular case, deterministic repacking).
I was quite surprised that ripunzip doesn't support this, not even with --output-directory /dev/stdout.
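A minimal sketch of the pipe use case, written in Python for illustration only (it is not ripunzip's code, and the names `stream_first_member`, `_read_exact` and the constants are hypothetical). It reads the first member's local file header from a forward-only stream and inflates the data chunk by chunk into an output stream, so nothing is seeked or held in full. It assumes the member is either deflated (a raw deflate stream marks its own end, so a trailing data descriptor doesn't matter) or stored with its size in the local header; Zip64, encryption, other compression methods and CRC checking are not handled.

```python
import struct
import zlib
from typing import BinaryIO

# Fixed 30-byte part of a zip local file header (little-endian):
# signature, version needed, flags, method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER_FMT = "<4sHHHHHIIIHH"
LOCAL_HEADER_SIG = b"PK\x03\x04"
FLAG_DATA_DESCRIPTOR = 0x08  # sizes/CRC follow the data instead of the header
CHUNK_SIZE = 64 * 1024


def _read_exact(src: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes from a forward-only stream (pipes may return short reads)."""
    buf = bytearray()
    while len(buf) < n:
        part = src.read(n - len(buf))
        if not part:
            raise EOFError("zip stream ended early")
        buf += part
    return bytes(buf)


def stream_first_member(src: BinaryIO, dst: BinaryIO) -> str:
    """Write the decompressed first member of a zip arriving on src to dst.

    src is only ever read forward, so it may be a pipe such as curl's stdout.
    Returns the member's name from its local header (decoded as UTF-8 here).
    """
    header = _read_exact(src, struct.calcsize(LOCAL_HEADER_FMT))
    (sig, _version, flags, method, _mtime, _mdate, _crc,
     comp_size, _size, name_len, extra_len) = struct.unpack(LOCAL_HEADER_FMT, header)
    if sig != LOCAL_HEADER_SIG:
        raise ValueError("stream does not start with a zip local file header")
    name = _read_exact(src, name_len).decode("utf-8", errors="replace")
    _read_exact(src, extra_len)  # skip the extra field

    if method == 8:
        # Deflate: a raw stream (wbits=-15) that marks its own end, so the
        # loop stops even when the sizes are deferred to a data descriptor.
        inflater = zlib.decompressobj(-15)
        while not inflater.eof:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                raise EOFError("deflate stream ended early")
            dst.write(inflater.decompress(chunk))
        dst.write(inflater.flush())
    elif method == 0 and not flags & FLAG_DATA_DESCRIPTOR:
        # Stored: copy exactly comp_size bytes, one chunk at a time.
        remaining = comp_size
        while remaining:
            chunk = src.read(min(CHUNK_SIZE, remaining))
            if not chunk:
                raise EOFError("stored data ended early")
            dst.write(chunk)
            remaining -= len(chunk)
    else:
        raise NotImplementedError(f"unsupported method {method} / flags {flags:#x}")
    return name
```

Wired to `sys.stdin.buffer` and `sys.stdout.buffer` in a hypothetical `stream_unzip.py`, this would allow something like `curl -sL "$URL" | python3 stream_unzip.py | wc -c`, with downstream consumers starting before the download finishes. The trade-off is that a streaming reader must trust the local headers: the authoritative central directory sits at the end of the archive, which is exactly the part it cannot consult in advance.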
For streaming dataflow pipelines, supporting expansion to stdout could be useful.