thejoshwolfe / yazl

yet another zip library for node
MIT License
329 stars 44 forks

in-place zip editing support #38

Open mingyuan-xia opened 6 years ago

mingyuan-xia commented 6 years ago

We have been using yazl and yauzl for over a year, and we very much like their stability and pure-JS-ness. However, we have quite a lot of use cases that involve editing a zip in place. For now, we do what's done in #30: create a temp file with yazl, open the original zip with yauzl, transfer all entries (and their data) to the temp file, and overwrite the original with the temp file. This has quite a few drawbacks:

  1. performance: memory, IO
  2. sometimes it is tricky to get a temp file path (original_file+'.tmp' and the global temp folder are both unreliable in some cases)
  3. quite a lot of pitfalls (on macOS, empty folders must be preserved; externalAttributes, and so on), plus a lengthy, non-reusable code snippet.
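
To make drawback 2 concrete, here is a minimal sketch of the "write a temp file, then overwrite the original" step using only Node's standard library. The helper name `replaceFileViaSiblingTemp` and the temp-name scheme are hypothetical, not part of yazl or yauzl. It places the temp file next to the original with a random suffix, which avoids name collisions and keeps the rename on the same filesystem, but it still assumes the containing directory is writable, so it does not remove the reliability concern entirely.

```javascript
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

// Hypothetical helper (not a yazl/yauzl API): build the new file beside the
// target, then rename it over the target.
// writeNewContent(tempPath) is a caller-supplied function that writes the
// complete new archive to tempPath synchronously.
function replaceFileViaSiblingTemp(targetPath, writeNewContent) {
  const dir = path.dirname(targetPath);
  // A random suffix avoids clashing with an existing "<name>.tmp" file.
  const suffix = crypto.randomBytes(6).toString('hex');
  const tempPath = path.join(dir, `.${path.basename(targetPath)}.${suffix}.tmp`);
  try {
    writeNewContent(tempPath);
    // Same directory, so the rename does not cross filesystems.
    fs.renameSync(tempPath, targetPath);
  } catch (err) {
    // Best-effort cleanup; the original file is left untouched on failure.
    try {
      fs.unlinkSync(tempPath);
    } catch (cleanupErr) {
      // The temp file may never have been created; nothing to clean up.
    }
    throw err;
  }
}
```

In the workflow described above, `writeNewContent` would be the part that pipes the yazl output to the temp path; a real version would need to be asynchronous, since yazl writes through a stream.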

We are searching for equivalents of zip -d, zip old.zip new.file, zip -ur: delete, add, and update in place. Can you share some thoughts?

thejoshwolfe commented 6 years ago

Editing existing zipfiles is a hard problem. Implementations are faced with lots of trade-offs that I've been trying to avoid. Appending a file is fairly simple, but what about deleting the first file in the archive? Some possible approaches are:

And if a file is updated in place, its compressed size will probably change, which means you have to deal with the above problem for that case too.

The most conservative and compatible way to make zipfiles is to pack everything closely to leave no unused space. That strategy means that in the worst case you're going to be recreating the whole archive anyway. On average, you're probably recreating half of the archive.
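
The "half of the archive" estimate can be checked with a small back-of-the-envelope model. This is only a sketch under stated assumptions: entries are packed back to back, deleting one forces every later entry to be moved, and the cost of rewriting the central directory is ignored. The function names and the sizes are illustrative, not taken from any library.

```javascript
// entrySizes[i] is the number of bytes entry i occupies in the archive
// (local header plus compressed data). All numbers here are made up.

// Bytes that must be moved to close the gap after deleting one entry
// from a tightly packed archive: everything stored after that entry.
function bytesToMoveAfterDelete(entrySizes, deletedIndex) {
  if (!Number.isInteger(deletedIndex) || deletedIndex < 0 || deletedIndex >= entrySizes.length) {
    throw new RangeError('deletedIndex is out of range');
  }
  let moved = 0;
  for (let i = deletedIndex + 1; i < entrySizes.length; i++) {
    moved += entrySizes[i];
  }
  return moved;
}

// Average fraction of the archive that is moved, assuming each entry is
// equally likely to be the one deleted.
function averageFractionMoved(entrySizes) {
  const total = entrySizes.reduce((sum, size) => sum + size, 0);
  let movedSum = 0;
  for (let i = 0; i < entrySizes.length; i++) {
    movedSum += bytesToMoveAfterDelete(entrySizes, i);
  }
  return movedSum / (entrySizes.length * total);
}

const sizes = [100, 100, 100, 100];
console.log(bytesToMoveAfterDelete(sizes, 0)); // 300: worst case, first entry deleted
console.log(bytesToMoveAfterDelete(sizes, 3)); // 0: best case, last entry deleted
console.log(averageFractionMoved(sizes));      // 0.375
```

For n equal-sized entries the average works out to (n - 1) / (2n), which approaches one half as the archive grows, in line with the estimate above.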

The benefits of supporting in place edits seem very small to me, and the complexity of implementing the feature is very high. I don't plan to support it.

> We are searching for equivalents of zip -d, zip old.zip new.file, zip -ur.

If those operations give a significant performance increase over the naive approach and the resulting files are readable by Archive Utility on Mac, I'll be curious to see how they did it. Maybe there's a trick that I'm not aware of that makes it easier than I think.