nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

regression: data truncation with stream.writev in IPC channel #24992

Closed gireeshpunathil closed 4 years ago

gireeshpunathil commented 5 years ago

I was working on a potential solution for the problem of data truncation in child standard streams upon process exit, and came across this scenario:

const cp = require('child_process')
const KB = 1024
const MB = KB * KB
if (process.argv[2] === 'child') {
  const data = Buffer.alloc(KB).fill('x').toString()
  for (var i = 0; i < MB; i++)
    console.log(data)
  // process.exit(0)
} else {
  const child = cp.spawn(process.execPath, [__filename, 'child'])
  var count = 0
  child.stdout.on('data', c => count += c.length)
  child.stdout.on('end', () =>  console.log(count))
  child.stderr.on('data', (d) =>  console.log(d.toString()))
}

The actual amount of data obtained on UNIX is much less, and varies from run to run. Windows works fine.

The behavior is well understood in the presence of process.exit, but in its absence the process is supposed to drain all the data while there are active requests in the event loop. That is not happening here.

Tracing back, I landed at d85d1205932d583f72eefc849a3e655e26e5e305, which introduced vectored writes in pipe streams. With this in place, when the channel experiences congestion, it throws UV_ENOBUFS, which is passed to the calling code unhandled:

Error: write ENOBUFS
    at exports._errnoException (util.js:1026:11)
    at Socket._writeGeneric (net.js:715:26)
    at Socket._writev (net.js:729:8)
    at doWrite (_stream_writable.js:324:12)
    at clearBuffer (_stream_writable.js:413:5)
    at onwrite (_stream_writable.js:365:7)
    at WriteWrap.afterWrite [as oncomplete] (net.js:816:12)

Further, commit 1d6b729cea909769ac0bcb3aa68f5fe567c4ffb7 inadvertently prevented this error from being visible.

Subsequently there have been a lot of changes in stream_base.cc and stream_wrap.cc around DoTryWrite and Writev, which makes it difficult to pinpoint any particular place where this can be rectified.

Looking at the Writev implementation, it does not handle the case where the total data size is more than INT_MAX. In my opinion this should be fixed, as vectored writing is an internal mechanism that should hide itself from the JS API. Also, it looks like it does not leverage libuv's write-when-you-can capability (uv__write)?
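
For a rough sense of scale, here is a back-of-the-envelope sketch for the repro above. The 4-bytes-per-character worst case is borrowed from the MAX_CHAR constant in the workaround posted later in this thread, not read out of the Writev implementation itself:

const KB = 1024
const MB = KB * KB
// console.log appends '\n', so each line in the repro is KB + 1 characters
const chars = MB * (KB + 1)                  // 1074790400 (~1.0 GB of string data)
// The raw character count is still below INT_MAX (2147483647), but a
// worst-case storage estimate of up to 4 bytes per character (the assumption
// behind MAX_CHAR in the workaround below) pushes the batch well past it.
console.log(chars, 4 * chars > 2147483647)   // 1074790400 true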

/cc @nodejs/streams @nodejs/child_process

Happy to provide any further assistance / testing.

addaleax commented 5 years ago

Do you know the root cause of ENOBUFS? The error code seems to come from libuv, not Node.js or the OS.

gireeshpunathil commented 5 years ago

@addaleax - libuv does not throw ENOBUFS for writes AFAIK (as it already has the write buffers with it), only for reads.

this is where it is thrown: https://github.com/nodejs/node/blob/a1b283c2cacccb2e6d6f3a2382d986cd4ead11b4/src/stream_base.cc#L106-L107

Here is the sequence of events that led to this scenario IIUC.

IMO, either

  • Writev should manage the complexity by writing as much as it can and returning the rest of the data back to the stream on the JS side, or
  • the JS stream should send only chunks that Writev can manage and buffer the rest from within.

gireeshpunathil commented 5 years ago

ok, I am able to confirm the above theory - if I comment out the fast path block in _stream_writable.js

https://github.com/nodejs/node/blob/a3801e96835821b71dd97b3f8513c56814bcf8fa/lib/_stream_writable.js#L504-L519

then I get the full data.

So the inference and fix path remain intact. The ability to manage chunks whose cumulative byte count exceeds INT_MAX - either on the JS side (stream.writev) or on the C++ side (StreamBase::Writev) - is what we are looking for.
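
For illustration, the JS-side variant could look something like the sketch below: cap each native writev batch at a byte budget and dispatch the remainder in later calls. BATCH_LIMIT and splitIntoBatches are illustrative names, not existing internals:

const BATCH_LIMIT = 64 * 1024 * 1024   // illustrative cap per native writev call

// Split an array of buffered chunks (strings or Buffers) into batches whose
// cumulative byte count stays under BATCH_LIMIT.
function splitIntoBatches(chunks) {
  const batches = []
  let current = []
  let bytes = 0
  for (const chunk of chunks) {
    const len = Buffer.byteLength(chunk)
    if (bytes + len > BATCH_LIMIT && current.length > 0) {
      batches.push(current)
      current = []
      bytes = 0
    }
    current.push(chunk)
    bytes += len
  }
  if (current.length > 0) batches.push(current)
  return batches
}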

mcollina commented 5 years ago

Good analysis @gireeshpunathil! I stumbled upon some bad EAGAIN behavior recently in a userland module that I maintain (https://github.com/mcollina/sonic-boom/pull/17). I know these scenarios are really hard to test and fix, and I have dug into similar variations of this problem.

this causes the stream to accumulate further requests within itself.

Backpressure/flow control should be kicking in at this point, and well-written applications should limit memory usage. Shouldn't it? (There are a lot of cases where it's not possible to apply backpressure correctly, such as with logging, console.log and TTYs).

IMO, either

  • Writev should manage the complexity by writing as much as it can and returning the rest of the data back to the stream on the JS side, or
  • the JS stream should send only chunks that Writev can manage and buffer the rest from within.

We are not in agreement. I think this is a problem related to how console.log operates and the constraints we have for stdio. In fact, this is only one example of possible data corruption that can happen in this scenario, i.e. if the user calls process.exit(0) while some data is still buffered in the stream, that data will be lost. In order to avoid this data loss, we go to great lengths to ensure stdout and stderr are synchronous when writing to TTYs and files (https://github.com/nodejs/node/blob/master/lib/internal/process/stdio.js#L168-L179). For example, if stdout is a file and we cannot write fast enough, we block the whole process to avoid missing data - and our users expect that. Unfortunately, we are not doing the same when stdout and stderr are pipes.

I propose:

  1. for all of fs and net, we keep the current behavior of erroring the stream when the mentioned condition happens. This is the safest and most performant behavior.
  2. for stdout and stderr, we should have the main thread wait/stall until the data can be flushed. Basically, there is a misbehaving user (console.log) that does not apply backpressure, and the only solution is to apply backpressure to the whole process.

addaleax commented 5 years ago

@gireeshpunathil Thanks, this makes sense!

I think I agree with you about our options, but just to make sure we're in agreement: We would either

  • transparently modify the native Writev to keep an extra queue for data exceeding INT_MAX, or
  • introduce a maximum total data size to _writev in streams (e.g. 1 GB)

Right?

@mcollina I think you’re talking about a different kind of issue; this isn’t really related to stdio specifically, or premature exits?

this causes the stream to accumulate further requests within itself.

Backpressure/flow control should be kicking in at this point and well-written applications should limit memory usage. Shouldn't it?

@mcollina The problem is that, when we provide _writev, it will try to write all buffered data in a single batch, even if that batch is huge (and in this case exceeds our own storage limits).
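
A small self-contained illustration of that batching (this is plain userland stream code, not Node internals): a Writable that implements writev receives every chunk buffered while the stream was corked in a single call.

const { Writable } = require('stream')

const w = new Writable({
  writev(chunks, cb) {
    // every buffered chunk arrives here as one batch
    console.log(`_writev got ${chunks.length} chunks in one call`)
    cb()
  }
})

w.cork()
for (let i = 0; i < 1000; i++)
  w.write('x'.repeat(1024))
w.uncork()   // flushes the whole buffer through _writev at once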

for stdout and stderr, we should have the main thread wait/stall until the data can be flushed.

@mcollina I have https://github.com/libuv/libuv/issues/390 on my backlog … once that is implemented, I think a clean solution would be to move stdio streams to a new, separate loop when Node exits, and to spin that loop until it exits just for stdio?

mcollina commented 5 years ago

transparently modify the native Writev to keep an extra queue for data exceeding INT_MAX, or

I'm -1 to this approach, as we will keep accepting data from a misbehaving source. The source should slow down; otherwise memory usage will grow without control. At some point, the process will crash due to running out of memory.

Introduce a maximum total data size to _writev in streams (e.g. 1 GB)

Then we will still error/lose data, just at a different point. It will not solve the issue. I'm -1 to this solution as well.

@mcollina I think you’re talking about a different kind of issue; this isn’t really related to stdio specifically, or premature exits?

I think it is - check the example: it's using console.log. The behavior this issue describes can only happen with a client that writes too much data without checking for backpressure. If this happens for any stream, the solution is to properly implement backpressure in the producer, not to keep increasing our memory footprint. Why would console.log be any different?

gireeshpunathil commented 5 years ago

@addaleax -

transparently modify the native Writev to keep an extra queue for data exceeding INT_MAX

agree. We just need to invoke the libuv write call multiple times instead of once, slicing the data appropriately. The POLLOUT watchers that libuv inserts would take care of draining those at appropriate times? If the data grows indefinitely and leads to OOM at that stage, that is fine.

introduce a maximum total data size to _writev in streams (e.g. 1 GB)

agree, but dispatch the rest of the data in subsequent call(s)? In other words, call the native Writev only with as much data as it can handle.

I have libuv/libuv#390 on my backlog

An approach that I was trying is here: https://github.com/gireeshpunathil/libuv/commit/e6e370264bac58609a87c3625f5abb8dcb420185

@mcollina -

mcollina commented 5 years ago

If we use a buffer instead of a string, this truncation does not happen. With buffers, the native Writev just transports as much data as it was sent. Only when the chunks contain strings does it fail. The failure reason IMO is that it cannot compose a single v8::String of that size by coalescing the fragment strings. But from the user's point of view, they did not mean to send those as a single string, so it is not their problem?

It is their problem. They are creating an application that is writing a huge amount of data to standard output without any backpressure controls. The root cause of the problem is that console.log is completely ignoring backpressure in https://github.com/nodejs/node/blob/master/lib/internal/console/constructor.js#L224: Writable is pushing back saying that it's over its high-water mark, but console is not doing anything with it, and it keeps writing without giving the stream a chance to pick up and write that data.
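
For contrast, a producer that does respect backpressure looks roughly like this (a generic sketch, not a proposed patch): it stops when write() returns false and resumes on 'drain'.

// writable: any stream.Writable (e.g. process.stdout); chunks: array of data
function writeAll(writable, chunks, done) {
  let i = 0
  function next() {
    while (i < chunks.length) {
      // false means the internal buffer is over the high-water mark:
      // stop producing and continue once the stream has drained
      if (!writable.write(chunks[i++])) {
        writable.once('drain', next)
        return
      }
    }
    done()
  }
  next()
}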

Backpressure is relevant only if we are looking at real OOM situations, which is not the case here? For example, the data in question is already stored in the JS heap without any issues, and the problem is with the transport logic; hence I called it an internal complexity.

Actually, it will end up in an OOM situation if you increase the amount of data being sent enough. Moreover, this will cause a huge memory usage spike that the user would not predict. If we are not slowing the producer down, the process is going to crash anyway.

and, this worked without any issue prior to commit d85d120, so rate limiting / managing in userland does not apply?

I think d85d120 just exposed an issue that has always been there. The buffering issue with pipes and console.log was just happening on the JS level.

and finally, the process.exit scenario is different, though that was my starting point for this finding. Also, this is not specific to console.log or standard streams; any stream that employs the writev function is affected.

My point is that for streams, this is totally normal and expected. For console.log, it's not. I think that to fix this class of bugs, we must solve the discrepancy between console.log when using files or TTYs and when there is a pipe. Somehow, if stream.write() returns false in https://github.com/nodejs/node/blob/master/lib/internal/console/constructor.js#L224, we should slow down the producer, i.e. the main JS thread. I think for console.log usage we really need a writeSync operation.

gireeshpunathil commented 5 years ago

ok, here is a version that does not use console:

$ cat noipc.js

const n = require('net')
const KB = 1024
const MB = KB * KB
n.createServer((m) => {
  var count = 0
  // attach the readers only after 2s, so the writer's data piles up meanwhile
  setTimeout(() => {
    m.on('data', c => count += c.length)
    m.on('end', () => console.log(count))
  }, 2000)
}).listen(32000, () => {
  const p = n.connect(32000, () => {
    const data = Buffer.alloc(KB).fill('x').toString()
    // ~1 GB written in a tight loop without checking write()'s return value
    for (var i = 0; i < MB; i++)
      p.write(data)
    p.end()
  })
})

$ node noipc

Error: write ENOBUFS
    at afterWriteDispatched (internal/stream_base_commons.js:125:25)
    at writevGeneric (internal/stream_base_commons.js:111:3)
    at Socket._writeGeneric (net.js:720:5)
    at Socket._writev (net.js:729:8)
    at doWrite (_stream_writable.js:413:12)
    at clearBuffer (_stream_writable.js:522:5)
    at onwrite (_stream_writable.js:470:7)
    at WriteWrap.onWriteComplete [as oncomplete] (internal/stream_base_commons.js:75:10)
Emitted 'error' event at:
    at errorOrDestroy (internal/streams/destroy.js:98:12)
    at onwriteError (_stream_writable.js:430:5)
    at onwrite (_stream_writable.js:461:5)
    at _destroy (internal/streams/destroy.js:40:7)
    at Socket._destroy (net.js:619:3)
    at Socket.destroy (internal/streams/destroy.js:32:8)
    at afterWriteDispatched (internal/stream_base_commons.js:125:17)
    at writevGeneric (internal/stream_base_commons.js:111:3)
    at Socket._writeGeneric (net.js:720:5)
    at Socket._writev (net.js:729:8)

mcollina commented 5 years ago

I do not see any issue at all with that example (https://github.com/nodejs/node/issues/24992#issuecomment-447004050), and I do not think that is a bug. An application that manipulates streams without handling backpressure is going to have major issues in any case, and this is just one of many.

I consider the data truncation with console.log a bug.


On a side note, I did solve this issue in pino https://github.com/pinojs/pino/pull/566 by using fs.writeSync(fd, str).
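
A stripped-down sketch of that approach (fd 1 is assumed to be the process's stdout, as in the pino PR linked above):

const fs = require('fs')

// Synchronous write to stdout: the call does not return until the data has
// been handed to the OS, so nothing is left buffered in a JS-level queue.
function logSync(str) {
  fs.writeSync(1, str + '\n')
}

logSync('hello')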

gireeshpunathil commented 5 years ago

@mcollina - so who / what defines a backpressure limit? Is it a string whose expanded byte length reaches INT_MAX? And who controls it? If I increase --max-old-space-size, or any other tunable for that matter, I am unable to change this behavior?

From the VM perspective, it should define how much data it can handle under which abstractions, and the limitations should be documented under the corresponding APIs. Here, _writev was not the user's choice, so any limitation affects the stream in general.

From an end-user perspective, the inability to transport 1 GB of data (that was accumulated over time due to reasons outside of the program's control) looks like a bug to me.

And we can actually fix it! I have come up with something that works: #25044. But given the sensitivity of the code area, I would love more feedback and viewpoints. Thanks!

mcollina commented 5 years ago

I think that an end-user trying to append 1 GB of data to a Stream synchronously has a bug in their code. We are not in agreement on whether Node.js should keep buffering until it crashes with an OOM or emit a proper error when a stream does not use backpressure. I do not think this disagreement is solvable with further discussion.

gireeshpunathil commented 5 years ago

Closing based on @mcollina's feedback.

I have experimented with a workaround on the JS side, as opposed to the C++ side as attempted in #25044, that detects the ENOBUFS situation proactively and falls back to the slow path (which is always capable of handling any number of data chunks). Just keeping it here for future reference.

diff --git a/lib/_stream_writable.js b/lib/_stream_writable.js
index 570c4c4..035cf25 100644
--- a/lib/_stream_writable.js
+++ b/lib/_stream_writable.js
@@ -28,6 +28,7 @@
 module.exports = Writable;
 Writable.WritableState = WritableState;

+const MAX_CHAR = 536870911; // INT_MAX / 4
 const internalUtil = require('internal/util');
 const Stream = require('stream');
 const { Buffer } = require('buffer');
@@ -500,8 +501,17 @@ function onwriteDrain(stream, state) {
 function clearBuffer(stream, state) {
   state.bufferProcessing = true;
   var entry = state.bufferedRequest;
+  var count = 0;
+  var fast = true;
+  while (entry) {
+    var chunk = entry.chunk;
+    count += state.objectMode ? 1 : chunk.length;
+    if (count >= MAX_CHAR) { fast = false; break;}
+    entry = entry.next;
+  }
+  entry = state.bufferedRequest;

-  if (stream._writev && entry && entry.next) {
+  if (fast && stream._writev && entry && entry.next) {
     // Fast case, write everything using _writev()
     var l = state.bufferedRequestCount;
     var buffer = new Array(l);
mcollina commented 5 years ago

My feedback is related to writing data to a stream in general. However, I think the console.log behavior is a bug when a pipe is used.

mcollina commented 5 years ago

Here is my take to solve this issue: https://github.com/nodejs/node/pull/25638.

jasnell commented 4 years ago

ping @mcollina and @gireeshpunathil ... any further thoughts on what we should do here?

gireeshpunathil commented 4 years ago

My assessment is that it cannot be solved without side effects. So maybe close and revisit if this becomes problematic in real-world scenarios?