nodejs / readable-stream

Node-core streams for userland
https://nodejs.org/api/stream.html
Other
1.03k stars 227 forks source link

OOM in test/parallel/test-stream-pipeline #353

Closed vweevers closed 6 years ago

vweevers commented 6 years ago

On Windows 10 with node 6.8.0 x64, multiple tests in test/parallel/test-stream-pipeline.js are failing with OOM errors.

> node test\parallel\test-stream-pipeline.js
TAP version 13
ok 1 - sync run
[..]
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory

I suspect the failures are related to one another, but I do not fully understand what is happening, so let's start by diving into one of them:

https://github.com/nodejs/readable-stream/blob/9004c8120f9fea9aa5b4058c5bd220db95c4ed43/test/parallel/test-stream-pipeline.js#L183-L212

I think this test goes into an endless loop. The use of setImmediate to destroy the HTTP response, means that read on L186 will be continuously called, tick after tick. But if that's true, why doesn't it happen on other versions/platforms too?

@mcollina any idea?


Because pipeline was introduced in node 10 and the error only occurs on node 6 (I did not yet test versions between 6 and 10), I reckoned readable-stream is the proper repo for this issue - correct me if I'm wrong.

mcollina commented 6 years ago

Can you please verify that this is the case with the latest version of Node 6 on Windows? 6.8.0 is pretty old, and it might be a Windows-specific bug.

This is either a) a Windows-bug in Node 6 (which we probably won’t fix) b) a bug in the stream machinery.

Anyway, the test should not go OOM in any case. Can you send a PR to Node core that limits the numbers of pushes to 42 (random number)? Don’t push null, as the test verifies that destroy must be called.

vweevers commented 6 years ago

Can you please verify that this is the case with the latest version of Node 6 on Windows? 6.8.0 is pretty old, and it might be a Windows-specific bug.

Also happens on node 6.14.3.

This is either a) a Windows-bug in Node 6 (which we probably won’t fix) b) a bug in the stream machinery.

How can we find out? If we limit the number of pushes to 42, won't that hide a potential bug?

Don’t push null, as the test verifies that destroy must be called.

Roger. Side note: the lack of assertions, assertion messages or comments makes it hard to tell what the purpose of these tests is - which makes me hesitant to contribute.

mcollina commented 6 years ago

cc @mafintosh (added the test)

IMHO that logic in that test is faulty in Node < 10, where .push call read synchronously.

mafintosh commented 6 years ago

I'll take a look. You're probably right.

vweevers commented 6 years ago

FWIW the OOM also happens in:

vweevers commented 6 years ago

Related? This throws on node 10 (and readable-stream):

const stream = require('stream')

stream.pipeline(
  new stream.Readable({
    read (size) {
      if (this.destroyed) throw new Error('late read')
      this.push('foo')
    }
  }),
  new stream.Writable({
    write (chunk, enc, next) {
      next(new Error('stop'))
    }
  }),
  (err) => {}
)
mcollina commented 6 years ago

@vweevers what does that throw?

vweevers commented 6 years ago

Error: late read

vweevers commented 6 years ago

(Intuitively, I do not expect _read to be called after destroy, but it is)

mcollina commented 6 years ago

I think we are missing a check somewhere about this, it should not happen.

mcollina commented 6 years ago

The following test passes:

'use strict';

const common = require('../common');
const assert = require('assert');
const { Readable } = require('stream');

const rs = new Readable({
  read: common.mustCall(function() {
    this.push('hello');
  }, 3)
});

rs.on('data', () => {
  rs.destroy();
});

It is normal that multiple _read call happens after receiving the first element. The reason for this sits in the backpressure machinery. As you can see, it happens 3 times on Node 10.

The problem you are seeing on Windows is the result of some differences in libuv between Unix and Windows, and some bad interactions between the tests: the event loop is always filled with tasks to execute and it keep accepting things. I'm sending a PR against core now to fix the test.

mcollina commented 6 years ago

@vweevers https://github.com/nodejs/node/pull/22456.

mcollina commented 6 years ago

This should be fixed in the master and 3.0.3. Please let me know!

vweevers commented 6 years ago

@mcollina Confirmed, tests are passing on Windows. Thanks! 🎉