ouch-org / ouch

Painless compression and decompression in the terminal
https://crates.io/crates/ouch

Support decompressing stdin. #692

Closed rcorre closed 1 week ago

rcorre commented 1 month ago

Fixes #687.

If "-" is passed as a filename, decompress data from stdin.
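
For illustration only (this is not the code from the PR): a minimal sketch of treating "-" as stdin. The names `is_stdin` and `open_input` are hypothetical; it only assumes that a decoder can work from any `Read`.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// True when the path argument is the conventional "-" stdin marker.
fn is_stdin(path: &Path) -> bool {
    path == Path::new("-")
}

/// Opens either stdin or a regular file behind a single `Read` trait object,
/// so the decoding code does not need to care where the bytes come from.
fn open_input(path: &Path) -> io::Result<Box<dyn Read>> {
    if is_stdin(path) {
        Ok(Box::new(io::stdin()))
    } else {
        Ok(Box::new(File::open(path)?))
    }
}

fn main() -> io::Result<()> {
    // Each command-line argument is treated as an input; "-" means stdin.
    for arg in std::env::args().skip(1) {
        let mut reader = open_input(Path::new(&arg))?;
        let mut bytes = Vec::new();
        reader.read_to_end(&mut bytes)?;
        println!("{arg}: {} bytes (stdin: {})", bytes.len(), is_stdin(Path::new(&arg)));
    }
    Ok(())
}
```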

Currently --format must be passed as well, but as a next step, we could try to infer the format from magic numbers.
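
A rough sketch of what magic-number detection could look like, as a possible follow-up rather than anything in this PR. The `Detected` enum and `detect_format` function are hypothetical names; the signatures are the well-known leading bytes of each format. Tar is left out because its "ustar" marker sits at offset 257 rather than at the start of the stream.

```rust
/// Hypothetical format tag; not the extension type used inside ouch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Detected {
    Gzip,
    Bzip2,
    Xz,
    Zstd,
    Zip,
    Rar,
    SevenZ,
}

/// Guesses a format from the first bytes of a stream, or returns None.
fn detect_format(header: &[u8]) -> Option<Detected> {
    if header.starts_with(&[0x1F, 0x8B]) {
        Some(Detected::Gzip)
    } else if header.starts_with(b"BZh") {
        Some(Detected::Bzip2)
    } else if header.starts_with(&[0xFD, b'7', b'z', b'X', b'Z', 0x00]) {
        Some(Detected::Xz)
    } else if header.starts_with(&[0x28, 0xB5, 0x2F, 0xFD]) {
        Some(Detected::Zstd)
    } else if header.starts_with(b"PK\x03\x04") {
        Some(Detected::Zip)
    } else if header.starts_with(b"Rar!\x1A\x07") {
        Some(Detected::Rar)
    } else if header.starts_with(&[b'7', b'z', 0xBC, 0xAF, 0x27, 0x1C]) {
        Some(Detected::SevenZ)
    } else {
        None
    }
}

fn main() {
    // A gzip stream starts with 1F 8B, followed by the compression method byte.
    let header = [0x1F, 0x8B, 0x08, 0x00];
    println!("{:?}", detect_format(&header));
}
```

Since stdin cannot be rewound, the bytes read for detection would have to be kept and fed back to the decoder, e.g. by chaining them in front of the rest of the stream.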

As stdin is not connected to the terminal, we cannot prompt for Y/N when warning about in-memory decompression (e.g. for zip). We just default to No and require passing "-y" in these cases.
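
The rule above can be sketched as follows. This is an illustration, not the PR's implementation: `confirm` and `ask_on_terminal` are hypothetical names, and the terminal check uses the standard library's `IsTerminal` trait (Rust 1.70+).

```rust
use std::io::{self, IsTerminal, Write};

/// Decides whether to go ahead with a risky action.
/// - "-y" always means yes.
/// - Without a terminal on stdin we cannot ask, so the answer is no.
/// - Otherwise the `ask` callback prompts the user.
fn confirm(assume_yes: bool, stdin_is_terminal: bool, ask: impl FnOnce() -> bool) -> bool {
    if assume_yes {
        return true;
    }
    if !stdin_is_terminal {
        return false;
    }
    ask()
}

/// Interactive prompt; only reached when stdin is a terminal.
fn ask_on_terminal() -> bool {
    eprint!("Decompressing in memory may use a lot of RAM. Continue? [y/N] ");
    let _ = io::stderr().flush();
    let mut line = String::new();
    io::stdin().read_line(&mut line).is_ok() && line.trim().eq_ignore_ascii_case("y")
}

fn main() {
    let assume_yes = std::env::args().any(|arg| arg == "-y");
    if confirm(assume_yes, io::stdin().is_terminal(), ask_on_terminal) {
        println!("proceeding with in-memory decompression");
    } else {
        println!("refusing: pass -y to decompress in memory");
    }
}
```

Passing the prompt as a callback keeps the decision logic testable without a real terminal.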

For zip, we have to buffer the whole stream in memory to seek into it, just as we do with a chained decoder like .zip.bz.
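
To show why the buffering is needed, here is a small sketch (hypothetical helper name `buffer_for_seeking`, standard library only): a non-seekable stream is read fully into memory and wrapped in a `Cursor`, which implements both `Read` and `Seek`. A zip reader needs this because the archive's central directory lives at the end of the file.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Reads a non-seekable stream to the end and returns a seekable in-memory view.
/// The whole input is held in RAM, which is why a warning is appropriate.
fn buffer_for_seeking<R: Read>(mut input: R) -> io::Result<Cursor<Vec<u8>>> {
    let mut bytes = Vec::new();
    input.read_to_end(&mut bytes)?;
    Ok(Cursor::new(bytes))
}

fn main() -> io::Result<()> {
    // Stand-in for stdin: any `Read` works, here a byte slice.
    let stream: &[u8] = b"...payload...central directory";
    let mut seekable = buffer_for_seeking(stream)?;

    // Jump to the last 17 bytes, the way a zip reader jumps to the archive's tail.
    seekable.seek(SeekFrom::End(-17))?;
    let mut tail = String::new();
    seekable.read_to_string(&mut tail)?;
    println!("{tail}");
    Ok(())
}
```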

The rar format requires an actual file (not an impl Read), so we write a temp file that it can decode.
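
A simplified sketch of the temp-file approach, using only the standard library. `spill_to_temp_file` is a hypothetical name, and the fixed file name is an assumption made to keep the example short; real code would more likely use a crate that creates unique temp files and cleans them up automatically.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::PathBuf;

/// Copies a stream into a file under the system temp directory and returns its path,
/// so that a decoder which insists on a real file can open it.
fn spill_to_temp_file<R: Read>(mut input: R, file_name: &str) -> io::Result<PathBuf> {
    let path = std::env::temp_dir().join(file_name);
    let mut file = File::create(&path)?;
    io::copy(&mut input, &mut file)?;
    Ok(path)
}

fn main() -> io::Result<()> {
    // Stand-in for stdin; the process id keeps concurrent runs from colliding.
    let stream: &[u8] = b"Rar!\x1A\x07\x00";
    let name = format!("stdin-spill-{}.rar", std::process::id());
    let path = spill_to_temp_file(stream, &name)?;

    println!("wrote {} bytes to {}", fs::metadata(&path)?.len(), path.display());

    // The caller is responsible for removing the file once decoding is done.
    fs::remove_file(&path)?;
    Ok(())
}
```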

When decoding a single-file archive (e.g. file.bz), the output filename is just -, since we don't know the original filename. I had to add a bit of a hack to the tests to work around this. Another option would be to interpret "-d" as a destination filename in this case.

When decoding a multi-file archive, I decided to unpack directly into the destination directory, as this seemed like a better experience than adding a top-level "-" folder inside the destination.

marcospb19 commented 1 month ago

Thanks for such a detailed description of every choice you made, that helps a lot.

This is a very good PR.

> When decoding a single-file archive (e.g. file.bz), the output filename is just -, since we don't know the original filename. I had to add a bit of a hack to the tests to work around this. Another option would be to interpret "-d" as a destination filename in this case.
>
> When decoding a multi-file archive, I decided to unpack directly into the destination directory, as this seemed like a better experience than adding a top-level "-" folder inside the destination.

Those are indeed weird cases to deal with.

Having an output file named - is weird so what about something like "stdin-output.ext1.ext2", "output.ext1.ext2" or "ouch-output.ext1.ext2"? (any suggestions for names of that?)

About unpacking in the current folder instead of using the smart_unpack functionality, due to lack of a better name for the directory, I think this falls in the same suggestion as the above, if we can have an intuitive name like "stdin-output", then suddenly it makes more sense to unpack it in a folder like usual.

What do you think?

rcorre commented 1 month ago

Thanks for taking a look!

> Having an output file named - is weird so what about something like "stdin-output.ext1.ext2", "output.ext1.ext2" or "ouch-output.ext1.ext2"? (any suggestions for names of that?)

I'm fine with either of those, but I'm not sure where the extensions would come from.

> About unpacking in the current folder instead of using the smart_unpack functionality, due to lack of a better name for the directory, I think this falls in the same suggestion as the above, if we can have an intuitive name like "stdin-output", then suddenly it makes more sense to unpack it in a folder like usual.

As a consumer, I feel like the most convenient behavior is:

  1. ouch d --format tgz -d out < example.tgz will unpack example directly to the directory out
  2. ouch d --format gz -d out < example.gz will decompress example to the file out

That being said, I get that it's weird for stdin to change the behavior of -d, so I don't feel strongly about this. I can live with an extra layer of directory -- it's still way easier than remembering how to use tar :laughing:

marcospb19 commented 1 month ago

> I'm fine with either of those, but I'm not sure where the extensions would come from.

It would come from --format, but actually that could be done later, I can create an issue with the suggestion after we merge this.

> That being said, I get that it's weird for stdin to change the behavior of -d, so I don't feel strongly about this.

Hahaha yeah that's what I was thinking, if we add special cases for the behavior of -d, it makes it harder to document and/or predict what Ouch is going to do.

It's also hard to remember why after some years.

So can you change it to write to a "stdin-output" (or other name?) file and folder, respectively, for the single-file and multi-file archive? Then we think about extensions later.

(The problem with inferring the extension from --format is that the input from stdin might already have an extension; in that case, adding more messes it up.)
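
To make the request concrete, a rough sketch of the naming rule: a fixed "stdin-output" name for stdin input, used for the file or the folder alike, with no extension handling yet. `output_path` and `STDIN_PLACEHOLDER` are hypothetical names, and the non-stdin branch is deliberately simplified to strip only the last extension.

```rust
use std::path::{Path, PathBuf};

/// Placeholder name for output produced from stdin (file or folder).
const STDIN_PLACEHOLDER: &str = "stdin-output";

/// Picks the output path inside `output_dir` for a given input argument.
fn output_path(input: &Path, output_dir: &Path) -> PathBuf {
    if input == Path::new("-") {
        // No original filename is known, so fall back to the fixed placeholder.
        output_dir.join(STDIN_PLACEHOLDER)
    } else {
        // Simplified: drop only the final extension, e.g. "file.bz" -> "file".
        let stem = input.file_stem().unwrap_or(input.as_os_str());
        output_dir.join(stem)
    }
}

fn main() {
    let out = Path::new("out");
    println!("{}", output_path(Path::new("-"), out).display());
    println!("{}", output_path(Path::new("file.bz"), out).display());
}
```

Keeping the stdin case as a single branch means -d keeps its usual meaning as a directory, which is the predictability concern raised above.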

rcorre commented 1 month ago

aarch/musl build failures seem unrelated:

 /build/musl-cross-make/build/local/aarch64-linux-musl/obj_musl/../src_musl/src/math/frexpl.c:16: undefined reference to `__multf3'

marcospb19 commented 2 weeks ago

Can't believe 3 weeks went by, sorry, time flies sometimes.

I'll try to get back to this this weekend.

rcorre commented 2 weeks ago

No rush, I know how it goes. Thanks for checking in!

marcospb19 commented 1 week ago

Alright so I need to dedicate a bit of time to figure out why CI failed randomly, but I don't think this PR needs to wait for it.

Thank you so much for the contribution! :) This is a great one.