thejoshwolfe / yazl

yet another zip library for node
MIT License

RE: API to predict final zipfile size #71

Closed sntran closed 1 month ago

sntran commented 2 years ago

This is a follow-up to #1.

I see that we can get the final zipfile size in the callback to .end. However, I would like to know that size way before adding files, so I can pass it along the pipeline before creating the archive.

Would it be possible to add another API to return the predicted size for a list of inputs whose size is provided?

For example:

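const fs = require("fs");
const path = require("path");

// Proposed API: yazl.size does not exist today.
yazl.size(
  fs.readdirSync("my/folder", { withFileTypes: true }).map(dirEntry => {
    return {
      ...dirEntry,
      // dirEntry.name is relative to the directory, so join it before calling stat
      size: fs.statSync(path.join("my/folder", dirEntry.name)).size,
    };
  }),
  {
    compress: false,
  },
);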

Basically, it takes a list of file-like entries, with required name and size properties, and returns the final zipfile size.

It's fine to return -1 when the size can't be determined, but when it can be (such as when the size option is passed along with the stream), the size should be computed. It would be nicer for yazl to expose this number, so that the calculation matches its own method of archiving.

pklapperich commented 2 years ago

Input file size isn't sufficient to predict the size of the output file. A file that's random data or already compressed will often not compress at all and sometimes the resulting zip file is larger than the inputs. Text files usually compress a lot. The only time you'd get correct predictions is when you create a zip file with compression disabled.
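For the compression-disabled case, the arithmetic can be sketched from the fixed record sizes in the ZIP format. This is a rough illustration, not yazl's actual calculation: `estimateStoredZipSize` and its `useDataDescriptor` option are hypothetical names, and the sketch assumes no extra fields, no comments, no ZIP64 records, and UTF-8 file names. A library may lay out its archives differently, so the number it reports itself should be preferred when available.

```javascript
// Fixed record sizes from the ZIP format, excluding variable-length fields.
const LOCAL_FILE_HEADER_SIZE = 30;
const CENTRAL_DIRECTORY_HEADER_SIZE = 46;
const END_OF_CENTRAL_DIRECTORY_SIZE = 22;
const DATA_DESCRIPTOR_SIZE = 16; // signature + crc32 + two 32-bit sizes

// Hypothetical helper, not part of yazl.
// entries: [{ name: string, size: number }], all stored without compression.
// Returns the estimated archive size in bytes, or -1 if any entry is unusable.
function estimateStoredZipSize(entries, options = {}) {
  const useDataDescriptor = options.useDataDescriptor === true;
  let total = END_OF_CENTRAL_DIRECTORY_SIZE;
  for (const { name, size } of entries) {
    if (typeof name !== "string" || !Number.isInteger(size) || size < 0) {
      return -1;
    }
    const nameLength = Buffer.byteLength(name, "utf8");
    // Local header, file name, then the raw (uncompressed) file data.
    total += LOCAL_FILE_HEADER_SIZE + nameLength + size;
    if (useDataDescriptor) total += DATA_DESCRIPTOR_SIZE;
    // Each entry is listed again in the central directory.
    total += CENTRAL_DIRECTORY_HEADER_SIZE + nameLength;
  }
  return total;
}
```

Each entry costs its data plus two headers that both repeat the file name, so the overhead grows with the number of entries and the length of their names, not with the file contents.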

I'm not sure, but I think browsers will truncate the download if you tell the browser "it's 10,000 bytes" but then actually give it 10,005 bytes. I'm not sure that anything bad happens if the download terminates early, so maybe it's safe to "predict" that there's no compression; the user will just think the download is going to take a lot longer than it does, until it ends abruptly.

This package does not appear to be maintained.

thejoshwolfe commented 1 month ago

hi @sntran. sorry for the delayed response.

The use case you're describing sounds like what the finalSizeCallback is designed for already. I added a paragraph to the documentation describing a use case of emitting the Content-Length header in a web server before piping any of the archive contents.

I think the name finalSizeCallback is causing a lot of confusion. The name sounds like it would be called when everything is done, when really it's called once all the entries are queued up and the metadata is loaded. I think that was probably a poor naming choice on my part.
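A minimal sketch of that Content-Length use case might look like the following. The function name `sendUncompressedZip` and the `files` shape are hypothetical. The yazl module is passed in as a parameter (for example the result of `require("yazl")`) so the sketch does not assume how it is loaded, and `res` is assumed to be an `http.ServerResponse`. Every entry is added with `compress: false`, since the size can only be known up front when nothing is compressed.

```javascript
// Hypothetical helper, not part of yazl.
// yazl:  the loaded yazl module, e.g. require("yazl")
// res:   an http.ServerResponse (assumed)
// files: [{ realPath, metadataPath }] (hypothetical shape)
function sendUncompressedZip(yazl, res, files) {
  const zipfile = new yazl.ZipFile();
  for (const { realPath, metadataPath } of files) {
    // Without compression, each entry's stored size equals its input size.
    zipfile.addFile(realPath, metadataPath, { compress: false });
  }
  // The callback runs once all entries are queued and their metadata is
  // loaded, which is before any archive data has been piped here.
  zipfile.end((finalSize) => {
    res.setHeader("Content-Type", "application/zip");
    if (finalSize !== -1) {
      // -1 means the size could not be determined in advance.
      res.setHeader("Content-Length", String(finalSize));
    }
    zipfile.outputStream.pipe(res);
  });
}
```

Piping is deferred until the callback fires so that the headers are set before the first byte of the archive is written to the response.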