sindresorhus / nano-spawn

Tiny process execution for humans — a better child_process
MIT License
86 stars 3 forks source link

Output iteration + buffering #33

Closed ehmicky closed 1 week ago

ehmicky commented 3 weeks ago

We buffer and return the subprocess stdout/stderr as result.stdout and result.stderr.

Users can also iterate over stdout/stderr. It is helpful for getting progressive output (e.g. a progress bar) or transforming it incrementally.

Also, one of the main use cases of iteration is to reduce memory consumption. However, in our case, the memory is unfortunately not reduced since we buffer result.stdout/result.stderr anyway.

The problem is: with the current API, we only know whether the subprocess output will be iterated or not after the subprocess has already started and its output is already buffering. We have to start buffering right away, else we might miss some data.

Some possible solutions:

  1. When iteration starts, stop any ongoing buffering, and return undefined for result.stdout/result.stderr.
  2. Add a boolean option.
  3. Remove the iteration feature altogether.
  4. Expose iteration as a separate top-level method.

I am leaning towards 1. because it results in the simplest API, while still keeping the iteration feature. It's somewhat quirky though.

What do you think?

sindresorhus commented 2 weeks ago

When iteration starts, stop any ongoing buffering, and return undefined for result.stdout/result.stderr.

👍 But maybe keep them empty strings? The main use-case is not iteration, and it would be annoying having to guard them for each usage when they are almost always filled in.

ehmicky commented 2 weeks ago

Yes, you're right. Also, making the type change from string to undefined depending on whether buffer.stdout is iterated is not typable in TypeScript, so it's better to turn it into an empty string. :+1:

ehmicky commented 2 weeks ago

I've been trying to implement this but this is more difficult than expected.

The problem is being able to first remove the data listener, then start the for await (const chunk of stream) loop, while being 100% sure no chunk will be dropped in-between. I've been digging through the Node.js stream codebase, and it's a little tricky. The problem is that chunks are internally buffered by Node.js and some of the data/readable event emission is done asynchronously (typically, after 1 or multiple ticks). So it's not straightforward to switch from one consumer to another without dropping any chunk.

I am trying to think of implementation solutions right now. :thinking:

One possible solution is outlined at https://github.com/sindresorhus/nano-spawn/issues/43