Closed RomainMuller closed 3 months ago
The stack trace of the error points at the following code:
https://github.com/nodejs/node/blob/c1da528bc25c9cc5a8240a7b4f136f5968f6e113/lib/fs.js#L590-L592
I basically expect the windows binding
returns different codes for Socket
than for ReadStream
and in turn causes the difference in behavior.
You will get much better results with:
/// script.js
const fs = require('fs');
console.error('Node: ' + process.execPath);
console.error('Version: ' + process.version);
console.error('Platform: ' + process.platform);
console.error('process.stdin: ' + process.stdin.constructor.name);
const buffer = Buffer.alloc(4_096);
let read;
do {
read = fs.readFileSync('/dev/stdin', buffer, 0, buffer.length, null);
if (read != 0) {
console.log('STDIN: ' + buffer.slice(0, read).toString('utf-8'));
}
} while (read != 0);
console.log('EOF was (normally) reached.');
Don't fiddle with the 0, 1 and 2 file descriptors. They are open in non-blocking mode and libuv doesn't expect this when you are passing a file descriptor.
Hey @mmomtchev,
read = fs.readFileSync('/dev/stdin', buffer, 0, buffer.length, null);
This is interesting and all... But /dev/stdin
is not a thing on Windows, and I cannot require WSL... 🤷🏻♂️
Or does fs
have some magic tricks to make it look like this is a file that exists?
No, you are right, I can't think of any easy way to synchronously read from stdin on Windows
There are npm modules or you could go the async way
There is an awful solution here http://corpus.hubwiz.com/2/node.js/3430939.html that will simulate blocking access, repeating the call on EAGAIN
and breaking the loop on EOF
But the real answer is that Node is built for async I/O
We already have the ugly code to loop on EAGAIN
...
Unfortunately, the use-case I have does not lend itself to async
exchange here... As there are ordering guarantees I must maintain.
The real thing I'm concerned with here, is how fs.readSync
throws an EOF error... When it is documented to never do this and instead return 0
.
It says that the descriptor you are passing must be opened in blocking mode.
It says that the descriptor you are passing must be opened in blocking mode.
Where though?
I'm looking at: https://nodejs.org/api/fs.html#fs_fs_readsync_fd_buffer_offset_length_position, and the only place on this page where a requirement for fd
to be in blocking (or non-blocking for that matters) mode is on createReadStream
and createWriteStream
, none of which I am using...
In any case, in light of your confirming that we are doing things likely to trip libuv
in weird and awkward ways, I've been implementing a more comprehensive (dirty) workaround in aws/jsii#2238 that also catches the EOF
error; which would buy us time to figure out if/how we can proceed without needing to do synchronous I/O on our STD{IN,OUT,ERR}
...
@RomainMuller in fact, the stdin is not opened in non-blocking mode on Windows, that is the case only on Linux and OSX, so maybe there is a solution to your problem
On Windows libuv does not properly handle pipes when operating in file mode. I wonder if it is supposed to, because normally, there is a separate pipe mode, but it is a trivial change. In any case, this issue should be taken to https://github.com/libuv/libuv It happens because the writing process exits and closes the anonymous pipe. Feel free to open it there, referencing this issue.
This is everything that is needed to make it work:
--- a/deps/uv/src/win/fs.c
+++ b/deps/uv/src/win/fs.c
@@ -918,7 +918,7 @@ void fs__read(uv_fs_t* req) {
SET_REQ_RESULT(req, bytes);
} else {
error = GetLastError();
- if (error == ERROR_HANDLE_EOF) {
+ if (error == ERROR_HANDLE_EOF || error == ERROR_BROKEN_PIPE) {
SET_REQ_RESULT(req, bytes);
} else {
SET_REQ_WIN32_ERROR(req, error);
But then again, you will end up with Windows-specific JS code, which is not the Node way
@ronag is this supposed to work?
Sorry, sync api's is not my area of expertise.
It is not specific to synchronous mode. This is getting stranger by the minute.
On Linux this leaves stdin
in non-blocking mode:
const fs = require('fs');
console.error('process.stdin: ' + process.stdin.constructor.name);
(async () => {
let read;
read = fs.createReadStream(null, { fd: 0 })
for await (const r of read)
console.log(r)
console.log('EOF was (normally) reached.')
})();
but this leaves it in blocking mode:
const fs = require('fs');
(async () => {
let read;
read = fs.createReadStream(null, { fd: 0 })
for await (const r of read)
console.log(r)
console.log('EOF was (normally) reached.')
})();
The difference is accessing process.stdin.constructor.name
. Adding a simple console.error
doesn't change anything.
Easiest way to check if a file descriptor is in blocking or non-blocking mode is cat /proc/<pid>/fdinfo/<fd>
- look at flags
, the fourth digit from the right to the left will be 4 for non-blocking mode
@RomainMuller, please disregard what I said, it seems that Node has a problem with handling consistently stdin
across different situations and different platforms
@joyeecheung, the type of stdin
is reassigned in the process.stdin
getter??
If users are not supposed to be playing with 0, 1 and 2 file descriptors anymore, it should probably be mentioned, and maybe even add a warning
I have seen tons of examples with these, there is a widely used AsyncResource
tutorial which does writeSync(1, ...)
So there are three separate issues
stdin
be reassigned in the process.stdin
getter - meaning getting the value of process.stdin
has side effectsstdin
be a non-blocking tty on Linux - meaning that the 0, 1 and 2 fds become unusable for the userstdin
on Windows when used as a file, does not accept EPIPE
as a valid EOF
leading to @RomainMuller's original issueThe 2nd bullet point there has two tracking issues for it already (https://github.com/libuv/libuv/issues/2962 and https://github.com/libuv/libuv/issues/2062)
There are some examples I think have the same reason. Windows 7 via bash from Git for Windows, Node v13.14.0 works as expected:
$ echo test | node.exe -p 'require("fs").readFileSync(0)'
$ echo test | node.exe -e 'require("fs").readFile(0, console.log)'
doesn't works:
$ curl -s http://ifconfig.me/all.json | node.exe -p 'require("fs").readFileSync(0)'
<...>
Error: EOF: end of file, read
at Object.readSync (fs.js:578:3)
at tryReadSync (fs.js:353:20)
at Object.readFileSync (fs.js:390:19)
at [eval]:1:15
at Script.runInThisContext (vm.js:131:20)
at Object.runInThisContext (vm.js:297:38)
at Object.<anonymous> ([eval]-wrapper:10:26)
at Module._compile (internal/modules/cjs/loader.js:1118:30)
at evalScript (internal/process/execution.js:94:25)
at internal/main/eval_string.js:23:3 {
errno: -4095,
syscall: 'read',
code: 'EOF'
}
$ curl -s http://ifconfig.me/all.json | node.exe -e 'require("fs").readFile(0, console.log)'
<...>
[Error: EOF: end of file, read] {
errno: -4095,
code: 'EOF',
syscall: 'read'
}
In all cases constructor name of stdin is Socket
@joyeecheung @ronag @mcollina
There are three separate issues here:
The first and most important, is that stdio seems to change type based on an auto-detection code. I think that this is a design decision by @joyeecheung . If this is indeed the case, than the users need clarification on the 0, 1 and 2 file descriptors. As, depending on the type of the stdio (pipe, tty, file...), these can switch to non-blocking mode, they should not be used anymore as they are not compatible with the file operations of libuv. This should be made clear, especially since I found at least one popular tutorial on AsyncResource
which suggests that people make use of writeFileSync(1, ...)
in order to avoid the asynchronous operation of console.log
. Currently using the 0, 1 and 2 file descriptors will lead to completely random results, depending on the stdio's type and OS. This is not limited to writeFileSync
, all synchronous and asynchronous file operations that bypass the ReadStream
on these descriptors are affected.
Changing this requires rewriting lots of code.
The second one is that fact that the stdio reassignment takes places in the getter of process.stdin
/ process.stdout
. I think that this was an oversight when trying to lazy-load the ReadStream
. So user code will behave in a radically different manner if one adds a console.log(process.stdin.constructor.name)
as @RomainMuller did.
Fixing this is trivial.
The status of the third issue - that a pipe stdin
on Windows when used as a file, does not accept EPIPE
as a valid EOF
leading to @RomainMuller's original issue - depends on the decisions for the first two
The first and most important, is that stdio seems to change type based on an auto-detection code. I think that this is a design decision by @joyeecheung . If this is indeed the case, than the users need clarification on the 0, 1 and 2 file descriptors. As, depending on the type of the stdio (pipe, tty, file...), these can switch to non-blocking mode, they should not be used anymore as they are not compatible with the file operations of libuv. This should be made clear, especially since I found at least one popular tutorial on AsyncResource which suggests that people make use of writeFileSync(1, ...) in order to avoid the asynchronous operation of console.log. Currently using the 0, 1 and 2 file descriptors will lead to completely random results, depending on the stdio's type and OS. This is not limited to writeFileSync, all synchronous and asynchronous file operations that bypass the ReadStream on these descriptors are affected. Changing this requires rewriting lots of code.
This is not really a design decision but a necessity. STDIO can be of different types and the operations on the file descriptors MUST match those cases. I stumbled upon this quite a lot myself. Check out https://www.npmjs.com/package/sonic-boom. This handles most of the edge cases related to use fd
and writeFileSync
. Note that what looks like random results are in fact not random at all.
https://github.com/nodejs/node/blob/master/lib/internal/bootstrap/switches/is_main_thread.js contains all the logic that create a different implementation depending on the type of the file descriptors.
Note that you can find a lot of deeply incorrect information on blog posts about Node.js.
The second one is that fact that the stdio reassignment takes places in the getter of process.stdin / process.stdout. I think that this was an oversight when trying to lazy-load the ReadStream. So user code will behave in a radically different manner if one adds a console.log(process.stdin.constructor.name) as @RomainMuller did. Fixing this is trivial.
Can you please articulate this? A PR would be welcomed. It's important that we lazy-load things to speed up the boostrap of Node.js.
The status of the third issue - that a pipe stdin on Windows when used as a file, does not accept EPIPE as a valid EOF leading to @RomainMuller's original issue - depends on the decisions for the first two
This looks like a bug. PR?
@mcollina
Can you please articulate this? A PR would be welcomed. It's important that we lazy-load things to speed up the boostrap of Node.js.
For example, currently, on Linux when stdin
is a tty (ie without redirection), fd 0, 1 and 2 are blocking and support readFileSync(0, ...)
and readFile(0, ...)
. As soon as one accesses process.stdin
- for example with console.log(process.stdin.constructor.name)
, they switch to non-blocking mode and readFile(0, ...)
fails with EAGAIN
as non-blocking file descriptors are not supported by libuv. I don't think this is by design? Unless the design decision is that these descriptors are not to be used by the user code anymore which should be made clear? Maybe this is the best solution if these FDs are going go be handled behind the scenes, switching blocking mode on and off in a transparent way.
This looks like a bug. PR?
If the user is allowed to use FDs 0, 1 and 2, than yes, maybe. The problem is that when stdin
is a pipe, it is handled as a Socket
and libuv doesn't like it when one uses the file operations on it - this is the libuv issue I created - https://github.com/libuv/libuv/issues/3043 - but its status is not very clear as it is not clear whether this should be allowed or not
@mcollina , on Linux readFileSync('/dev/stdin')
works perfectly in all cases, but there is no Windows equivalent. The readStream
works perfectly in all cases too. Maybe the best decision is to simply disallow usage of 0, 1 and 2 instead of adding a huge amount of complexity - it already bad enough.
For example, currently, on Linux when stdin is a tty (ie without redirection), fd 0, 1 and 2 are blocking and support readFileSync(0, ...) and readFile(0, ...). As soon as one accesses process.stdin - for example with console.log(process.stdin.constructor.name), they switch to non-blocking mode and readFile(0, ...) fails with EAGAIN as non-blocking file descriptors are not supported by libuv.
It's totally possible to handle EAGAIN
with synchronous code. See https://github.com/mcollina/sonic-boom/blob/master/index.js#L94-L105 for more details. Note that the sleep operation that is performed is https://github.com/davidmarkclements/atomic-sleep/blob/b8149d3ca276c84a54fa8fa1478f9cc79aabc15a/index.js#L18.
@mcollina , on Linux readFileSync('/dev/stdin') works perfectly in all cases, but there is no Windows equivalent. The readStream works perfectly in all cases too. Maybe the best decision is to simply disallow usage of 0, 1 and 2 instead of adding a huge amount of complexity - it already bad enough.
I'll be -1 to that.
Ok, there is a solution to the non-blocking problem - I find it awful since it is essentially a way around the fundamentals of libuv and Node, but never mind, it is possible and it seems to be what people do - there is another example of this in my comments above
In this case the only thing that needs solving is the pipe closing on Windows - if this is to be allowed - and in this case this is a libuv problem which should support reading from pipes in file mode - https://github.com/libuv/libuv/issues/3043
@mcollina Just to make it clear - the official position on that one is that FD 0, 1 and 2 can be both blocking or non-blocking depending on the situation and the OS, and it is up to the user code to handle this?
Just to make it clear - the official position on that one is that FD 0, 1 and 2 can be both blocking or non-blocking depending on the situation and the OS, and it is up to the user code to handle this?
Yes.
I don't think this is by design? Unless the design decision is that these descriptors are not to be used by the user code anymore which should be made clear? Maybe this is the best solution if these FDs are going go be handled behind the scenes, switching blocking mode on and off in a transparent way.
I don't remember what the original behavior was but it wasn't by design. I vaguely remember (could be wrong) that we used to turn them non-blocking from the start and allowing them to be blocking until you access process.stdin
and friends should be just an artifact of lazy-loading them.
@joyeecheung, the current solution when it comes to FDs 0, 1 and 2 is absolutely horrible - ie they tend to transparently switch from blocking to non-blocking, requiring the user to reimplement the async read loop in his code by eventually microsleeping... However me too (and I didn't take @mcollina word for granted, I spent some time reading/testing) I came to the conclusion that there is simply no other solution. I still think that Node should not incite people to write horrible code, so this should not be official :smile: , but jokes apart, people are using these FDs, and most of the time they don't even realize something very weird is going on.
They only viable long-term solution I see is that libuv supports reading from regular files in non-blocking mode - something that would be totally useless except to give Node (and the user code) an uniform abstraction layer.
When it comes to the lazy-loading - at this point it doesn't change anything to not do it - the user code will still have to expect these FDs to be in either mode.
I will only try pushing a PR to libuv for the EOF
problem on Windows - now that we all agree that reading from a pipe in file mode should be supported.
If the confusing part is that the FD switches from blocking to non-blocking once you access process.stdin
etc. (due to lazy loading), I guess we could just do something early and make sure that the mode is consistently non-blocking for users (which is probably the old behavior).
Yes, it is exactly what made that issue so confusing for me. Changing this however this will have a detrimental performance effect on process creation for everyone and it won't really accomplish anything - the user will still need to support both blocking and non-blocking FDs
Changing it to non-blocking is often likely going to be problematic for any native C libraries that nodejs interacts with, as well as any programs that are execv from nodejs. Thus, I don't recommend doing that. It's also potentially quite problematic in the (very infrequent) case where some other external library calls dup2
to replace the stdio handles with something else.
Me too, as an old-school UNIX guy, I was initially appalled by the design decision to have a non-blocking stdio - it is something that is known to break things. But how do you implement pipes in libuv? They are currently built around net.Socket
and net.Socket
is always in non-blocking mode. And at this point it is not even about blocking or non-blocking - it is about it being sometimes blocking and sometimes non-blocking - which is the major problem. But having a consistent stdio requires that libuv fully supports both files, pipes and ttys in one of those modes - which is not the case. You are libuv, would you consider a PR that adds a completely useless support for blocking I/O on pipes/sockets/ttys (through threads) or non-blocking supports for files (that actually blocks and also requires threads)? And the only reason for doing it would be to have a consistent stdio in Node across all platforms? I am not sure it is worth the effort.
By the way, while researching this, I stumbled upon an effort by DJ Bernstein to standardize a new POSIX file API - that will have separate read_nonblocking
/ write_nonblocking
calls instead of a flag attached to the descriptor. There was even an attempt to implement it in Linux (readv2
/ writev2
calls) but the PR has been frozen for years.
the following patch appears to work (tested with piped curl)
fs.readSync = function(origReadSync) {
return function newReadSync() {
try {
return origReadSync.apply(this, arguments);
} catch (e) {
if (e.code === 'EOF')
return 0;
else
throw e;
}
};
}(fs.readSync);
@bughit it is a remarkably ugly solution, so I think it will be very appropriate to include it in the read loop with the microsleep 😄 @RomainMuller there is a work around for your issue, this is libuv, so the final fix won't happen before Node 16
Hi @RomainMuller, I wanted to look into this issue, but was not able to reproduce (tried 18.18.2, 20.10.0 and 22.0.0-pre). Do you think this can be closed now?
Since this is no longer reproducible, and there has been no answer since the last comment, I'll close this. If you still experience the same issue, please reopen this issue or open a new one.
fs
What steps will reproduce the bug?
Given the following node script:
Will behave differently based on how it is invoked:
How often does it reproduce? Is there a required condition?
This issue happens 100% of the time when
process.stdin
is aSocket
instance.What is the expected behavior?
The expected and documented behavior is that
fs.readSync
returns0
when EOF is reached.What do you see instead?
Instead, an
Error: EOF
is thrown, which is unexpected.Additional information
This problem was surfaced as users are starting to see failures on Windows at https://github.com/aws/aws-cdk/issues/11314.
We have additionally seen issues where in similar conditions (
process.stdin
is a socket/pipe), attempting to performfs.readSync
on file descriptor0
(aka STDIN), occasionally results in unexpectedEAGAIN
errors being thrown, which might be related. That was highlighted in https://github.com/aws/aws-cdk/issues/5187 and we are currently working with a pretty ugly workaround that would be nice if removed (but that other issue is a lot more difficult to steadily reproduce, as it does not always trigger, and may be caused by side-effects of another operation).