davidje13 opened 5 years ago
See discussion in https://github.com/websockets/ws/issues/1635. I think it's an ephemeral port exhaustion / connections kept in CLOSE_WAIT state issue on macOS.
Not sure if there is anything actionable to do.
@lpinca I disagree that it's connected to your other issue, for several reasons:

- this only manifests if the request is the first to reach the server (or perhaps is the first to be sent to a particular port)
- this only manifests when using `Connection: Upgrade`
Interestingly, since I installed the "macOS Mojave 10.14.6 Supplemental Update 2" after posting this issue, I am unable to replicate it (though I can still replicate lpinca's issue). That might be a coincidence, but since it was trivial to replicate before I'm inclined to believe it may have been patched at an OS level. I will post again if it reappears.
> this only manifests if the request is the first to reach the server (or perhaps is the first to be sent to a particular port)

How did you verify this? In the tests you shared, a lot of servers and clients are started until the issue happens, or did I miss something?
> this only manifests when using `Connection: Upgrade`

Are you 100% sure about this? If there is a bug in the Node.js upgrade mechanism, why did the issue disappear with the latest macOS update, and why does it not manifest on Linux or Windows?
@lpinca I verified that it only applies to the first request using a modified (not posted) script, which creates a single server then fires requests repeatedly to the same server. This was part of my original investigation, before I had narrowed down the issue. I observed your issue in that situation, but not mine.
And I'm not 100% sure about `Connection: Upgrade` being involved, but I am 98% sure. From some quick testing before, I was always able to reliably reproduce the issue with the `Connection` header set, but if I removed all headers the issue did not appear, at least in the few quick runs I performed (given how frequently the issue appeared, a few clean runs is enough to give me confidence).
I have no idea whether this issue manifests on Linux or Windows, and I can't be sure that it's fixed after the update (it just appears to be gone for now, but maybe that's due to some other state which I'm unaware of; another change which has happened is that I recently installed Redis, for example). It could also be related to how long the machine is switched on.
I tried to investigate the Node.js upgrade mechanism but quickly got lost in the `ClientRequest` code, so I can't currently offer any insight into why it might trigger issues in particular cases. It seems likely that it is due to an OS bug being triggered by a particular request from Node, but it could also be something not being properly cleaned up by Node, or a listener race condition. All speculation, though.
This is on an M1 Mac Mini running macOS 11.2.3 and Node.js 16.1.0. I'm not sure it's `Upgrade` requests; is it an IPv6/IPv4 problem?
It seems I've run into this same issue. I was getting intermittent timeouts in a test suite and wasn't able to spot any obvious race conditions. My project is a simple web tunnel server, like localtunnel, for piping public HTTP/WebSocket traffic on a remote host through open TCP connections to a local host.
I tried setting my server's host to `'127.0.0.1'` as suggested in the first comment. That works: fast connections and no failures during 8000 iterations of start server -> request Upgrade -> assert on response -> stop server, rerun a couple (2 or 3) times.
I also tried explicitly setting the server's host to `0.0.0.0` (IPv4, all interfaces) and that worked too.
By default, my `http.createServer().listen(cb)` uses `::` (IPv6, all interfaces) as the server host.
I suspected it was an IPv6 issue. Configuring `http.request()` to use `family: 6` also made the issue go away. 💡
I tested Upgrade requests with bare TCP sockets as suggested above. The problem was much less frequent, but still happened. `net.connect()` uses IPv4 by default on my system. Switching to `family: 6` seemed to eliminate the problem again.
Last, I tried sending normal `GET` requests without the `Upgrade` header. I left the IP families mismatched. Much less frequent issues again, but it still failed. Again, I configured `family: 6` for `http.request()`. Worked fine.
I have no idea why translating from IPv4 to IPv6 would cause requests to fail or make `Upgrade`s so unreliable, but if the default IP families all match on Windows/Linux, that may explain the lack of reproducibility on those systems. 🤷‍♀️
@davidje13, is it possible that your issue was coming up due to a mismatch in IP family? Did you confirm that your server was defaulting to `0.0.0.0` and not `::`?
@thebearingedge it's possible, even likely, that it was using `::` rather than `0.0.0.0`; I didn't think to check IPv4 vs IPv6, and I see no reason why it wouldn't use IPv6 on the machine I was using.

By the way, I don't know that this necessarily fails to reproduce on Linux/Windows, just that nobody has reported seeing it on those systems yet.
Connecting to a newly started `http.Server` with an `http.ClientRequest` using `Connection: Upgrade` occasionally (~0.2% probability) hangs indefinitely. The request never reaches the server (no `connection` event is fired), but strangely the same issue does not appear when using a raw `net.Socket` on the client and sending equivalent data. The issue also does not appear if the server is bound to `127.0.0.1` rather than the default `0.0.0.0`.

The following code replicates the issue reliably; typically it will freeze after ~500 iterations:
Changes which make the issue disappear:

- changing the `server.listen` call to `server.listen(0, '127.0.0.1', resolve)`
- changing the `http.ClientRequest` block to a plain request (and adding `server.on('request', (req, res) => res.end());`)
Changes which do not make the issue disappear:

- waiting between `server.listen` and the request (tested with up to 1 second)

When the issue appears, even the server-side `connection` event does not fire.

This was found while debugging a suite of tests with occasional failures. Initially reported to https://github.com/websockets/ws/issues/1635