Open Fuuzetsu opened 5 months ago
Not all decompression software will correctly handle a ZIP-file entry whose size, compressed size and checksum aren't known at the time the entry's local-file header is transmitted. Therefore, supporting stream output when it isn't at least possible to buffer and flush the individual compressed files will mean a separate ZipWriter type and so on. I'll try to implement this, but it'll probably take a while.
> Not all decompression software will correctly handle a ZIP-file entry whose size, compressed size and checksum aren't known at the time the entry's local-file header is transmitted.
Just to clarify, I think you're saying that we need to know things like the compressed size up front, which means compressing the file to somewhere first. Presumably what happens at the moment is that space is left for the header, the compressed data is appended, and then the writer seeks back to the header and fills in the information it now knows: sizes, checksum, etc. Right?
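For readers following along, the "leave space, write, seek back, patch" flow described above can be sketched roughly like this. It is a minimal illustration using only `std`, not the crate's actual code: `write_entry_with_seek` and `crc32` are hypothetical names, the entry is stored rather than compressed to keep the sketch dependency-free, and no central directory is written, so the output is not a complete archive.

```rust
use std::io::{self, Seek, SeekFrom, Write};

/// Bitwise CRC-32 with the polynomial ZIP uses; slow but dependency-free.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical helper: writes one stored entry, leaving the CRC and size
/// fields zeroed at first and patching them once the data has been written.
fn write_entry_with_seek<W: Write + Seek>(
    out: &mut W,
    name: &str,
    data: &[u8],
) -> io::Result<()> {
    let header_start = out.stream_position()?;
    out.write_all(&0x0403_4b50u32.to_le_bytes())?; // local file header signature
    out.write_all(&10u16.to_le_bytes())?; // version needed to extract
    out.write_all(&0u16.to_le_bytes())?; // general-purpose flags
    out.write_all(&0u16.to_le_bytes())?; // method 0 = stored
    out.write_all(&[0u8; 4])?; // modification time and date, left zero in this sketch
    out.write_all(&[0u8; 12])?; // placeholders: CRC-32, compressed size, uncompressed size
    out.write_all(&(name.len() as u16).to_le_bytes())?;
    out.write_all(&0u16.to_le_bytes())?; // extra field length
    out.write_all(name.as_bytes())?;

    out.write_all(data)?; // a real writer would compress here
    let end = out.stream_position()?;

    // The three placeholder fields start 14 bytes into the header.
    out.seek(SeekFrom::Start(header_start + 14))?;
    out.write_all(&crc32(data).to_le_bytes())?;
    out.write_all(&(data.len() as u32).to_le_bytes())?; // compressed size
    out.write_all(&(data.len() as u32).to_le_bytes())?; // uncompressed size
    out.seek(SeekFrom::Start(end))?; // return to the end for the next entry
    Ok(())
}
```

The `Seek` bound on `W` is exactly what rules out sinks such as stdout or a socket.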
I wonder what tools like `zip` on Linux do…
Thanks for looking into this.
What `zip` does on Linux if the output isn't seekable is to set the "data descriptor" flag, which indicates that the CRC32 checksum, size and compressed size will be in a footer after the file.
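As a rough illustration of the data-descriptor layout, here is a sketch that needs only `Write`. It sets general-purpose flag bit 3, zeroes the CRC and size fields in the local header, and appends them after the data. `write_entry_streaming` and `crc32` are hypothetical names, not part of this crate or of `zip`. The entry is stored only to avoid a compression dependency; as far as I understand, real tools deflate in this mode, since a streaming reader has no reliable way to tell where stored data ends.

```rust
use std::io::{self, Write};

/// Bitwise CRC-32 with the polynomial ZIP uses; slow but dependency-free.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical helper: writes one stored entry to a sink that cannot seek.
fn write_entry_streaming<W: Write>(out: &mut W, name: &str, data: &[u8]) -> io::Result<()> {
    out.write_all(&0x0403_4b50u32.to_le_bytes())?; // local file header signature
    out.write_all(&20u16.to_le_bytes())?; // version needed to extract
    out.write_all(&(1u16 << 3).to_le_bytes())?; // flag bit 3: data descriptor follows the data
    out.write_all(&0u16.to_le_bytes())?; // method 0 = stored
    out.write_all(&[0u8; 4])?; // modification time and date, left zero in this sketch
    out.write_all(&[0u8; 12])?; // CRC-32 and both sizes stay zero in the header
    out.write_all(&(name.len() as u16).to_le_bytes())?;
    out.write_all(&0u16.to_le_bytes())?; // extra field length
    out.write_all(name.as_bytes())?;

    out.write_all(data)?; // a real writer would compress while streaming

    // Data descriptor: signature, then the values the header could not hold.
    out.write_all(&0x0807_4b50u32.to_le_bytes())?;
    out.write_all(&crc32(data).to_le_bytes())?;
    out.write_all(&(data.len() as u32).to_le_bytes())?; // compressed size
    out.write_all(&(data.len() as u32).to_le_bytes())?; // uncompressed size
    Ok(())
}
```

A complete archive would still need a central directory at the end, but that is also written sequentially, so it does not require `Seek` either.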
Per https://github.com/Pr0methean/zip/blob/ffa7772cc362049308193e133744a0120092f0c2/src/read.rs#L1239, the `read_zipfile_from_stream` method is one example of software that doesn't support files that use the data descriptor. Per https://github.com/Pr0methean/zip/blob/ffa7772cc362049308193e133744a0120092f0c2/src/write.rs#L691, `ZipWriter` doesn't currently use data descriptors.
I think the only solution will be to give `ZipWriter` a new type parameter `S: Seek`, a new field `seeker: Option<S>`, and a second constructor for when `W: S`. `S` would be `Infallible` if the output didn't implement `Seek`. (If using `Infallible` somehow doesn't work, we can create a custom impossible type, or wait for https://github.com/rust-lang/rust/issues/35121 to be stabilized and then use `!`.)
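One possible reading of the "custom impossible type" fallback, as a rough sketch only. Every name here (`NoSeek`, `Output`, `streaming`, `seekable`, `seeker`) is hypothetical, and it uses a two-variant wrapper rather than the exact `seeker: Option<S>` field proposed above. Because `NoSeek` has no values, the unused variant can never be constructed.

```rust
use std::io::{self, Seek, SeekFrom, Write};

/// Uninhabited stand-in meaning "this side of the wrapper is never used".
pub enum NoSeek {}

impl Write for NoSeek {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        match *self {} // no value of NoSeek exists, so this is unreachable
    }
    fn flush(&mut self) -> io::Result<()> {
        match *self {}
    }
}

impl Seek for NoSeek {
    fn seek(&mut self, _pos: SeekFrom) -> io::Result<u64> {
        match *self {}
    }
}

/// Hypothetical output wrapper: a plain stream or a seekable sink.
pub enum Output<W: Write, S: Write + Seek> {
    Stream(W),
    Seekable(S),
}

impl<W: Write> Output<W, NoSeek> {
    /// Constructor for sinks such as stdout that only implement `Write`.
    pub fn streaming(writer: W) -> Self {
        Output::Stream(writer)
    }
}

impl<S: Write + Seek> Output<NoSeek, S> {
    /// Constructor for sinks such as files that also implement `Seek`.
    pub fn seekable(writer: S) -> Self {
        Output::Seekable(writer)
    }
}

impl<W: Write, S: Write + Seek> Output<W, S> {
    /// `Some` only when header patching is possible; a writer built on this
    /// would fall back to data descriptors when it gets `None`.
    pub fn seeker(&mut self) -> Option<&mut S> {
        match self {
            Output::Stream(_) => None,
            Output::Seekable(s) => Some(s),
        }
    }
}
```

With this shape, `Output::streaming(std::io::stdout())` and `Output::seekable(file)` would both type-check, and only the second ever yields a seeker.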
In case it's useful at all, an implementation of this was submitted for `zip-old` last year here: https://github.com/zip-rs/zip-old/pull/383. It uses the same `ZipWriter` type, but wraps the inner writer with a new `MaybeSeekable` type.
I don't quite understand the current discussion but is this comment still accurate? https://github.com/ofek/pyapp/blob/fc356202f49cb79f5e92294e4ed29d33b50442b0/src/distribution.rs#L217-L225
Essentially, there is an option to embed an archive inside the binary itself but I must save to a temporary file in order to extract. Is it possible to directly extract now given a stream?
Extracting without `Seek` has been possible using the `read_zipfile_from_stream` method (https://github.com/zip-rs/zip2/blob/master/src/read.rs#L1298) since version 0.4.1 (released 2018-06-20): https://github.com/zip-rs/zip2/commit/38d16998539f96ea9641d6cab6fbf7fdb2d0b07e. It still has a few limitations, but should work for almost any unencrypted ZIP file that this crate can produce. Consider forking and finishing https://github.com/zip-rs/zip2/pull/70 if you need to read a ZIP file in streaming mode while it's being written in streaming mode.
It appears that the `W` in `ZipWriter<W>` must always be `Write + Seek`. This means that, for example, one can't write the zip to `stdout`.
It seems that `Seek` should only be required on the methods that actually make use of the functionality. As far as I understand, it should be perfectly possible to not have `Seek` if we're not doing anything exotic. In my use-case, I just want to add a set of files sequentially and then I'm done.
A poor work-around is to use a temporary file or a buffer and write to that, but that obviously wastes space/memory and makes the process non-streaming.
I'm not too privy to zip internals, so correct me if I'm wrong. I'm pretty sure that the `zip` program on Linux can write to stdout just by passing `-` as an argument. You can then send the data over the network directly without an intermediate file, so presumably it's possible for the simple case.