google / ripunzip

Feature idea: support expansion to stdout #27

Status: Open. leahneukirchen opened this issue 1 year ago.

leahneukirchen commented 1 year ago

For streaming dataflow pipelines, supporting expansion to stdout could be useful.
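As a rough illustration of what "expansion to stdout" means, here is a minimal Python sketch. It is not ripunzip's code (ripunzip is written in Rust), and `stream_member` is a hypothetical name. It decompresses one named entry of a local zip file and copies it to stdout in chunks, so a downstream tool can consume the data without the entry ever being written to disk. It assumes the archive is already on local disk and does not overlap downloading with decompression.

```python
import shutil
import sys
import zipfile


def stream_member(zip_path, member, out=None, chunk_size=64 * 1024):
    """Decompress one zip entry and write it to `out` (binary stdout by default)."""
    if out is None:
        out = sys.stdout.buffer  # binary stdout, so bytes pass through untouched
    with zipfile.ZipFile(zip_path) as zf:
        # ZipFile.open returns a file-like object that inflates lazily,
        # so only one chunk of the entry is held in memory at a time.
        with zf.open(member) as src:
            shutil.copyfileobj(src, out, chunk_size)
    out.flush()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Only acts as a CLI when given ARCHIVE and MEMBER arguments.
    stream_member(sys.argv[1], sys.argv[2])
```

Saved as a script, it could be used as `python stream_member.py archive.zip some/entry.txt | wc -c`.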

adetaylor commented 1 year ago

Thanks for the suggestion.

I want to keep ripunzip very focused on the specific job it was designed for: unzipping zip files that contain lots of files, in parallel.
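The parallelism described here comes from the zip format itself: each entry is compressed independently, so different entries can be decompressed at the same time. A minimal Python sketch of that idea follows, assuming a local zip file and using the hypothetical names `parallel_inflate` and `_inflate_batch`. It splits the entries across a thread pool and returns the decompressed bytes in memory. It is not ripunzip's implementation, writes nothing to disk, and makes no performance claim; it only shows the structure.

```python
import zipfile
from concurrent.futures import ThreadPoolExecutor


def _inflate_batch(zip_path, names):
    # One ZipFile handle per worker: each handle has its own file position,
    # so workers never share a seek offset.
    with zipfile.ZipFile(zip_path) as zf:
        return {name: zf.read(name) for name in names}


def parallel_inflate(zip_path, workers=4):
    """Decompress every file entry, spreading entries across `workers` threads."""
    with zipfile.ZipFile(zip_path) as zf:
        names = [info.filename for info in zf.infolist() if not info.is_dir()]
    # Round-robin split: entries are compressed independently,
    # so any partition of the entry list is valid.
    batches = [names[i::workers] for i in range(workers)]
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(_inflate_batch, [zip_path] * workers, batches):
            merged.update(part)
    return merged
```

`zip_path` should be a filesystem path rather than an in-memory buffer, so that each worker's handle really is independent.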

I would be open to accepting a pull request to add this feature, but with a pretty high bar: I'd need hard numbers showing that this is a substantial time saving for some folks over standard unzipping tools. The only possible time saving here comes from overlapping the fetch and the unzip, so the best theoretical speedup is 2x. That's a lot less than the 30x+ speedup that is possible when unzipping lots of files from the same zip file. Still, 2x could be useful for some folks; I'd want to hear from them and be convinced it was worth the slight deviation from ripunzip's original mission.
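The 2x ceiling follows from simple arithmetic. Assume a perfect two-stage pipeline with fixed per-stage times and no startup cost. Run one after the other, fetch and unzip take fetch + unzip; overlapped, they take max(fetch, unzip). The ratio is largest, exactly 2, when both stages take equally long. The function name below is illustrative only.

```python
def pipelined_speedup(fetch_s: float, unzip_s: float) -> float:
    """Ideal speedup from overlapping download and decompression.

    Sequential (fetch, then unzip): fetch_s + unzip_s
    Overlapped (perfect pipeline):  max(fetch_s, unzip_s)
    """
    return (fetch_s + unzip_s) / max(fetch_s, unzip_s)
```

For example, `pipelined_speedup(10, 10)` returns 2.0, while `pipelined_speedup(10, 2)` returns 1.2. When the download dominates, overlapping it with decompression gains little.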

leahneukirchen commented 1 year ago

I had the impression that mere decompression of one stream is multi-threaded too, so it can use multiple cores (à la pigz or pixz).

If that's not the case, it's probably not worth it.

adetaylor commented 12 months ago

FWIW, I partially withdraw my earlier comment: now that we fetch from URIs, ripunzip is sometimes useful even when unzipping just a single file, so a stdout option would make sense.

evils commented 8 months ago

An advantage that seems obvious to me would be the ability to start using the output in a pipe while the download is still happening, without having to save a massive decompressed output. (I am unsure, though, whether this would work well with tar's --sort=name in my particular case, which is deterministic repacking.)

I was quite surprised that ripunzip doesn't support this, not even with --output-directory /dev/stdout.
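A plain byte stream like /dev/stdout has no way to mark where one file ends and the next begins. Emitting more than one entry to stdout therefore needs a container format, and tar fits the repacking case described here. Below is a minimal Python sketch under those assumptions. The name `zip_to_tar_stream` is hypothetical, the input is a local zip file, and this is not a ripunzip feature. It re-emits file entries as an uncompressed tar stream in name order with a fixed mtime, which goes part of the way toward deterministic output. Directory entries are skipped, so empty directories are lost. Like the other sketches, it does not overlap the download with extraction.

```python
import sys
import tarfile
import zipfile


def zip_to_tar_stream(zip_path, out=None):
    """Re-emit a zip's file entries as a tar stream, sorted by name."""
    if out is None:
        out = sys.stdout.buffer
    # Mode "w|" writes a non-seekable tar stream, suitable for a pipe.
    with zipfile.ZipFile(zip_path) as zf, tarfile.open(fileobj=out, mode="w|") as tar:
        for info in sorted(zf.infolist(), key=lambda i: i.filename):
            if info.is_dir():
                continue  # sketch skips directory entries (empty dirs are lost)
            ti = tarfile.TarInfo(name=info.filename)
            ti.size = info.file_size  # uncompressed size from the central directory
            ti.mtime = 0              # fixed timestamp for reproducible output
            with zf.open(info) as src:
                tar.addfile(ti, src)  # copies exactly ti.size bytes from src


if __name__ == "__main__" and len(sys.argv) == 2:
    # Only acts as a CLI when given an ARCHIVE argument.
    zip_to_tar_stream(sys.argv[1])
```

Saved as a script, it could be used as `python zip_to_tar.py archive.zip | tar -tvf -`.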