thejoshwolfe / yazl

yet another zip library for node
MIT License
329 stars 44 forks source link

Piping streams which come from third party servers #74

Open fkocherovsky opened 2 years ago

fkocherovsky commented 2 years ago

I use addReadStream API to create a ZIP file based on multiple streams. Works as expected when streams are locally created streams. But there is an issue when streams come from third party server and yazl even doesn't detect the issue. Current implementation takes 'first not done entry', pipes it, and on "end" event takes the next not done entry and pipes it ... When there are many entries to pipe, might pass significant time interval when some next entry/stream is being started handled. Significant means in this case that the third party server decides that some stream is to long idle, and as a result the stream is just closed by the server. And as a result Zip isn't created. that such stream is already aborted. It's what happens when we try to create Zip by yazl based on streams from our S3 server. The solution for such situation is new API which doesn't takes streams like addReadStream does, but takes a function which creates stream just before to start piping this specific stream/file. Something like the following:

ZipFile.prototype.addStreamCreator = function(creator, metadataPath, options) {
   var self = this;
   metadataPath = validateMetadataPath(metadataPath, false);
   if (options == null) options = {};
   var entry = new Entry(metadataPath, false, options);
   self.entries.push(entry);
   entry.setFileDataPumpFunction(async function() {
      creator(metadataPath).then((stream) => {
         entry.state = Entry.FILE_DATA_IN_PROGRESS;
         console.log(`Starting to pump ${metadataPath}`);
         pumpFileDataReadStream(self, entry, stream);
         //pumpEntries(self);
      });
   });
 };

BTW, this is already working and tested function. Thanks

overlookmotel commented 1 year ago

I can see 2 ways to solve this with yazl's existing interface by implementing some form of "laziness":

1. Lazily add entries

Only call zipfile.addReadStream for each entry once the previous entry is consumed (listen for close event on the previous read stream to know when this has happened).

2. Stream wrapper

Create your own ReadStream wrapper. The wrapper would lazily create the real 3rd party read stream only when the wrapper is first read from, and then pipe the data through. You'd pass the wrapper to yazl.

Hope this helps.

fkocherovsky commented 1 year ago

We already use yazl with our own code customization in our Prod servers. Actually it's new exported function. The modifications we did are described in the following pull request: https://github.com/thejoshwolfe/yazl/pull/75 Do you see this code can be officially added to yazl?

overlookmotel commented 1 year ago

As far as I can see, this package is no longer maintained. Last commit was Dec 2018. Which is a shame, because it's a brilliant piece of work! But hey... obviously we all understand the pressures on open source maintainers.

But I can't imagine there's much chance your PR will ever be merged, hence why I suggested some other ways to solve the problem.