mafintosh / tar-stream

tar-stream is a streaming tar parser and generator.

Incorrect handling of backpressure #119

Closed. arantes555 closed this issue 4 years ago.

arantes555 commented 4 years ago

Hello!

After updating to 2.1.2, I have noticed that my process dies without finishing: the node process simply exits, as if there were nothing left to do. This behaviour does not happen with versions <= 2.1.1, and it only seems to occur with large files. I have seen this happen on node 12 and node 13.

After looking at the difference between 2.1.1 and 2.1.2, it seems to me that the problem is that _drain is never called.
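
For reference, the pattern in question looks roughly like this (a simplified, hypothetical sketch of the suspected logic in pack.js, not the actual source):

// Inside pack.entry(header, buffer, callback), roughly:
const ok = this.push(buffer)        // push the entry's data downstream
if (!ok) this._drain = callback     // backpressure: stash the callback for later
else process.nextTick(callback)     // no backpressure: call it right away
// If nothing ever reads the stream, the stashed _drain is never invoked,
// so the callback never fires and the process has nothing left to do.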

My usage of the library is of the form:

const { promisify } = require('util')
const fs = require('fs')
const path = require('path')
const tar = require('tar-stream')
const jetpack = require('fs-jetpack')

const toPromise = (func) => new Promise((resolve, reject) => func((err, res) => err ? reject(err) : resolve(res)))

// options.input and outputPath are provided by the surrounding code
const taredStream = tar.pack()
const stat = await promisify(fs.lstat)(options.input)
const tarHeader = {
  name: path.basename(options.input),
  mode: stat.mode,
  mtime: stat.mtime,
  size: stat.size,
  uid: stat.uid,
  gid: stat.gid,
  type: 'file'
}
console.log('adding tar entry', options.input)
await toPromise(handler => taredStream.entry(tarHeader, jetpack.read(options.input, 'buffer'), handler))
console.log('entry added! Finalizing tar...')
taredStream.finalize()
console.log('Tar finalized. Writing it to disk: ', outputPath)

Once finalized, the taredStream is then drained with an .on('data') listener.

More precisely, for large files, the toPromise call never resolves.

tex0l commented 4 years ago

I confirm I can reproduce this bug, here is the full repro case:

const { promisify } = require('util')
const fs = require('fs')
const tar = require('tar-stream')
const path = require('path')

const jetpack = require('fs-jetpack')

const toPromise = (func) => new Promise((resolve, reject) => func((err, res) => err ? reject(err) : resolve(res)))

const main = async () => {
  const input = './testFile'
  await jetpack.writeAsync(input, Buffer.alloc(1024 * 1024, '1'))
  const taredStream = tar.pack()
  const stat = await promisify(fs.lstat)(input)
  const tarHeader = {
    name: path.basename(input),
    mode: stat.mode,
    mtime: stat.mtime,
    size: stat.size,
    uid: stat.uid,
    gid: stat.gid,
    type: 'file'
  }
  const buffer = jetpack.read(input, 'buffer')
  console.log('adding tar entry', input)
  await toPromise(handler => taredStream.entry(tarHeader, buffer, handler))
  console.log('entry added! Finalizing tar...')
  taredStream.finalize()
  console.log('tar finalized')
  await new Promise((resolve, reject) => {
    let temp = ''
    taredStream
      .on('data', chunk => {
        temp += chunk.toString('binary')
      })
      .on('error', reject)
      .on('end', () => resolve(temp))
  })
  console.log('finished')
}

main()

with the following package.json:

{
  "name": "test",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "node index.js"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "fs-jetpack": "2.4.0",
    "tar-stream": "2.1.2"
  }
}

I get the following log ("entry added! Finalizing tar..." never appears):

➜  test node repro.js
adding tar entry ./testFile
➜  test node --version
v12.16.3

As @arantes555 said, the bug comes from https://github.com/mafintosh/tar-stream/blob/master/pack.js#L134, which assigns the callback to _drain but never calls it.

mafintosh commented 4 years ago

Neither of these examples drains the pack stream. If you don't drain the stream, the callback won't fire.

You are using await to "block" execution before you ever reach the ondata drain at the bottom. You need to set up the drain at the beginning of the program.
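
For illustration, a minimal sketch of how the repro's main() could be reworked (reusing its tar, toPromise, tarHeader, and buffer; this rearrangement is hypothetical, not code from this thread):

const taredStream = tar.pack()

// Attach the consumer first: this is what drains the pack stream.
const collected = new Promise((resolve, reject) => {
  let temp = ''
  taredStream
    .on('data', chunk => {
      temp += chunk.toString('binary')
    })
    .on('error', reject)
    .on('end', () => resolve(temp))
})

// The drain callback can now fire, because the consumer above keeps
// emptying the stream's internal buffer.
await toPromise(handler => taredStream.entry(tarHeader, buffer, handler))
taredStream.finalize()
await collected
console.log('finished')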

mafintosh commented 4 years ago

The same goes for the first example: you need to set up draining of the stream before awaiting an .entry call.

arantes555 commented 4 years ago

@mafintosh Alright, thanks for the explanation.

However, I think this should be made very clear in the documentation. I don't think it's explicitly stated anywhere (I may be wrong).

Also, this seems to be a major breaking change in behaviour, which in my opinion deserved more than a patch release.

mafintosh commented 4 years ago

PR welcome.

Backpressure was broken before when appending single buffers (i.e. you'd run out of memory). This is normal stream behaviour.

mafintosh commented 4 years ago

I do agree that the docs are a bit sparse here (from my early node days).

The callback is a drain callback. Don't use the callback if you don't want to implement backpressure.
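
For example, a minimal sketch of the two modes (hypothetical usage, reusing taredStream, tarHeader, and buffer from the repro above):

// Option 1 - no callback: the buffer is queued internally and nothing
// waits on draining.
taredStream.entry(tarHeader, buffer)
taredStream.finalize()

// Option 2 - with a callback you opt into backpressure: it fires only
// once the stream has drained, so a consumer must already be attached.
taredStream.entry(tarHeader, buffer, err => {
  if (err) throw err
  taredStream.finalize()
})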