nodejs / tooling

Advancing Node.js as a framework for writing great tools
171 stars 15 forks

stdin.read interface #68

Closed ruyadorno closed 2 years ago

ruyadorno commented 4 years ago

I'm pretty sure I heard someone mention it in one of the meetings in the past, but I couldn't find any issue, so I'm filing this one so that we can track/discuss it.

Description

A different interface for reading from standard input, making it simpler to create scripts that just need to consume a blob of data from stdin at once.

Desired behaviors

Userland alternatives

/cc @Ethan-Arrowood

ruyadorno commented 4 years ago

I'm really not sure how feasible it is or what a useful API would look like, but I just want to make sure we have a place to follow up on the discussion. Feel free to contribute more userland alternatives and desired outcomes, and correct me if anything sounds wrong in the original description.

Ethan-Arrowood commented 4 years ago

For reference, I was creating a script that inflates the output of a git object on the command line. The idea was that I could pipe the output of cat .git/objects/<hash> to node -e 'zlib.inflateSync(process.stdin.read()).pipe(process.stdout)' for a quick inflate script as I test some code I'm writing.

The solution for my original problem was utilizing zlib.createInflate()

node -e 'process.stdin.pipe(zlib.createInflate()).pipe(process.stdout)'
Ethan-Arrowood commented 4 years ago

This isn't a pressing feature for me either, but it is an interesting use case and would benefit userland. So I'm happy to champion it if we think it is a useful addition.

coreyfarrell commented 4 years ago

Could Buffer.fromStream(stream, (error, buffer) => {}) be a reasonable API? This could be promisified to support const buffer = await fromStream(stream);.

Additional possibly related prior art: https://github.com/maxogden/concat-stream/ (lacks an isTTY check).

sam-github commented 4 years ago

Isn't the pipe solution both more succinct and safer, since it doesn't load the entire git object into memory (exploding if it's larger than available memory)?

Note that reading a stream of unknown length (stdin, for example) into a memory buffer involves double copying (at least) all the data, as well as potentially unconstrained memory allocation (streams don't have to have an end, cat /dev/zero | node -e '....').

Why would concat-stream need an isTTY check?

coreyfarrell commented 4 years ago

I agree the pipe solution is probably best for what @Ethan-Arrowood was doing, and reading an entire stream into memory is inefficient and risky. That said, I have run into situations where "just get me the complete data" has been very useful (admittedly, always in tests).

I don't think concat-stream needs an isTTY check; I was just noting it as prior art: you would have to guard against process.stdin.isTTY to use it as a way to get the entire stdin buffer.

After reading @sam-github's comment, and realizing that I've never used this sort of functionality in production, I have to ask: does this even belong in Node.js core?

Ethan-Arrowood commented 4 years ago

> After reading @sam-github's comment, and realizing that I've never used this sort of functionality in production, I have to ask: does this even belong in Node.js core?

That is a great point @coreyfarrell I doubt I'd use the script for anything other than debugging/testing in my terminal.

ruyadorno commented 4 years ago

Now that top-level await (TLA) has landed on master, combined with the async iteration support in streams, we have a more reasonable interface for consuming process.stdin in such eval one-liners:

-cat README.md | node -e "inp=''; process.stdin.on('data',c=>inp+=c).on('end',()=>process.stdout.write(inp.replace(/[a-z]/g, 'x')))"
+cat README.md | node -e "for await (const c of process.stdin) process.stdout.write(c.toString().replace(/[a-z]/g, 'x'))"

Note: Requires --experimental-top-level-await and --input-type=module flags

ruyadorno commented 3 years ago

> I'm pretty sure I heard someone mention it in one of the meetings in the past, but I couldn't find any issue, so I'm filing this one so that we can track/discuss it.

oh, it sounds like that person may have been @bengl (ref: https://twitter.com/bengl/status/1402014875870765057)

bengl commented 3 years ago

@ruyadorno

It looks like the future is something like this:

cat README.md | node -e "process.stdout.write(Buffer.concat(await process.stdin.toArray()).toString().replace(/[a-z]/g, 'x'))"

It would be nice to get rid of the Buffer.concat with some version of flat() that works interoperably between Arrays and TypedArrays.

Alternatively, some kind of async .concat() or .consume() on buffer streams would be pretty cool. On non-buffer streams it could give an array of all the objects.

If there's a worry that the promise would never resolve if the stream doesn't end: Consider that we already have this effect synchronously with [...iteratorThatNeverEnds].

ruyadorno commented 2 years ago

oh thanks @bengl! Following your PR I learned about the Stream consumers API, which might be what I was looking for when I first opened this issue:

const { text } = require('stream/consumers')
await text(process.stdin)