mafintosh / tar-stream

tar-stream is a streaming tar parser and generator.
MIT License
400 stars 93 forks source link

Risk of unexpected exit if you forget to call .finalize, and how to eliminate this danger #128

Closed jedwards1211 closed 1 year ago

jedwards1211 commented 3 years ago

I had a brain-melting debugging experience yesterday because I forgot to call .finalize() on an Archiver.js zip stream, and tar-stream appears to work the same way, so I wanted to bring this to your attention: https://github.com/archiverjs/node-archiver/issues/457

I assume you and the author of Archiver.js don't realize how confusing the behavior can be in this case, because otherwise I'd expect to see a troubleshooting section in the README about this. It took me at least an hour to figure out why the process was exiting for seemingly no reason.

Without .finalize(), the stream reaches a point where it can't put any new events on the event loop, and the event loop becomes empty, causing Node to exit with code 0 and no error output (unless some unrelated async process is keeping the event loop going, in which case the stream hangs forever). This can happen in the middle of awaiting a promise that's consuming the stream (in my case an AWS S3 upload promise), which is extremely confusing because it's easy to forget that the process can just die in the middle of an async function. It would help everyone if we could prevent this from happening in the first place.

Here's an API that eliminates this danger by allowing the user to build the archive in an async function, and automatically finalizing when the returned promise resolves:

const tar = require('tar-stream')
const pack = tar.pack(async pack => {
  // pack.entry returns a promise to handle backpressure
  // add a file called my-test.txt with the content "Hello World!"
  await pack.entry({ name: 'my-test.txt' }, 'Hello World!')
  // add a file called my-stream-test.txt from a stream
  await pack.entry({ name: 'my-stream-test.txt', size: 11 }, async entry => {
    entry.write('hello')
    entry.write(' ')
    entry.write('world')
    // entry is automatically ended when returned promise resolves
  })
  // pack is automatically finalized when returned promise resolves
})
philios33 commented 2 years ago

I also ran in to this.

It is very important to make sure you pipe the pack somewhere BEFORE adding streamed entries of more than 1 read chunk in size so that the streams actually flow somewhere. I tried to create an entry before piping the pack and it never drains the entry stream. If you then have code that waits for the streaming to finish (which you need so you can do a finalize call at the right time), this will hang forever.

If you don't pipe anywhere, node may exit in a strange (no error message) way if there is nothing more to do.

mafintosh commented 1 year ago

thats how streams work for better and worse

jedwards1211 commented 1 year ago

@mafintosh if you want to see an example of how it actually doesn't have to be that way, you could give the API suggestion I wrote above a look...