ORESoftware opened this issue 1 year ago
I noticed that v16 and v18 experience ECONNRESET (errno -54) but node v20 doesn't, with my code, which I am happy to share.
Do you have a minimal test case, no external dependencies?
Yes, I do. Here's what I discovered: I have a node client and a node server. If I keep the node server at 16 or 18 and upgrade only the node client to 20, it works fine, so the server doesn't seem to be affected by the version. That's good news. The node client uses the request library to make about 100 concurrent requests at a time, 50,000 total; the server uses express. There is a very clear change when the client runs 18 vs 20: 20 is a marked improvement. It will still ECONNRESET under load, but it's much more tolerant/performant.
I'm also encountering errors with Next.js 14.
Perfectly accurate but unhelpful advice from another thread about ECONNRESET:
ECONNRESET means the peer closed the connection (i.e., a fact of life you simply need to deal with) and should be handled by adding 'error' event listeners in the right places.
The real issue here isn't the ECONNRESET itself. It's that the error object does not contain enough information to identify the "right places" to add the needed listeners.
For example, this is the entire call stack for the one that periodically restarts my service:
Error: read ECONNRESET
at TLSWrap.onStreamRead (node:internal/stream_base_commons:217:20)
at TLSWrap.callbackTrampoline (node:internal/async_hooks:130:17)
I suspect those worthless call stacks are not a nodejs problem, but rather a JavaScript runtime problem. But whether or not that's the case, could nodejs do anything to make these call stacks useful?
For example, provide default error listeners that just log information about which listener caught the error before terminating the process. That would give us enough information to actually use the advice that I quoted above.
With no stack trace in the exception, I have no idea where to catch this exception in my code.
Hi! Can someone provide an example snippet to reproduce this without any dependencies?
With no stack trace in the exception, I have no idea where to catch this exception in my code.
Turn it around: if you don't add an error listener, how is node supposed to know where you forgot to do that? It's not omniscient.
I suppose that - in theory - node could warn if your 'connection' listener didn't add an error listener to the connection/request/response object when it returns. What I mean is this:
const net = require("net");

net.createServer(function connection(c) {
  // oops, forgot to `c.on("error", cb)` - node warns when connection() returns
});
I wish anyone wanting to pursue that godspeed dealing with the inevitable ecosystem fallout.
I already checked all of my code to make sure every 'connection' listener immediately adds an 'error' listener. I'm continuing to blindly change things to see if anything fixes the issue, but I'm not actually sure whether the mistake is in my code or in a dependency/library that my code uses. I'm also looking at websockets/ws to see if it might be in there, but I can't open an issue if I have no idea where, or if, the mistake exists.
From my perspective, if the choice is between crashing my server application at startup with information about where the mistake is, versus crashing it at runtime and disconnecting all clients for a few minutes while the highly stateful application restarts, I would clearly prefer crashing at startup. It's better to assume that an error will eventually occur; the missing handler will eventually cause a crash anyway, so it might as well happen immediately, with useful information about where the mistake is.
Anyway, I just tried upgrading from Node 18 to 22 as some comments on this issue suggest, and I will see whether that makes the application stop occasionally crashing. Thanks for taking the time to respond to my previous comment.
Version
18 vs 20
Platform
macOS
Subsystem
No response
What steps will reproduce the bug?
On Node 18, I get this a lot (with no errors logged):
On Node v20, I don't seem to see the above error (-54).
How often does it reproduce? Is there a required condition?
Perhaps when the server gets overwhelmed?
What is the expected behavior? Why is that the expected behavior?
I expect not to get an error that has no stack trace in the console (no clear cause).
What do you see instead?
N/A
Additional information
N/A