Closed ns476 closed 1 year ago
Interesting. Enabling logs might help debug. Though, I don't have access to a Mac.
I've got a Mac and had some free time this morning.
I ran @ns476's program on the following platforms with the following tests:
👨🏻‍🔬 | apache bench | k6 |
---|---|---|
MacOS (arm64) | β | β |
Linux (container) | β | β |
Windows (windows 10) | β | β |
I found similar results. MacOS stalled out in the 16k~ request range, while Windows and Linux finished the test without issue. I also found that re-using connections (i.e. enabling keep-alive) resulted in a successful test on all platforms.
Next, I threw together an equivalently basic web server with dotnet 7 (the mvc controllers api template) and node express js. All of my test results were the same. MacOS was stalling out without the ability to re-use its connections.
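To make the keep-alive difference concrete, here is a minimal sketch (not the program from the issue) that runs a throwaway local HTTP server and records which client-side ports get used. The handler, the `client_ports` helper and the request count are all made up for illustration; the point is only that a reused connection occupies one ephemeral port, while a fresh connection per request takes a new port each time.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    # HTTP/1.1 plus a Content-Length header lets the server keep connections open.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the output quiet


def client_ports(requests, reuse):
    """Return the set of local (ephemeral) ports the client used."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    ports = set()
    conn = None
    try:
        for _ in range(requests):
            if conn is None:
                conn = http.client.HTTPConnection(host, port)
                conn.connect()
            ports.add(conn.sock.getsockname()[1])
            conn.request("GET", "/")
            conn.getresponse().read()
            if not reuse:
                # Simulate a client without keep-alive: one connection per request.
                conn.close()
                conn = None
    finally:
        if conn is not None:
            conn.close()
        server.shutdown()
        server.server_close()
    return ports


print("with keep-alive:   ", len(client_ports(5, reuse=True)), "port(s)")
print("without keep-alive:", len(client_ports(5, reuse=False)), "port(s)")
```

With keep-alive the client should report a single port no matter how many requests it sends. Without it, the number of ports used grows with the number of requests.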
In all scenarios without reusable connections, you can even:
My impression from this testing is that this is an issue with the network stack on Mac, outside of hyper.
That magic 16k~ number is no coincidence, and neither is the fact that after exactly 30 seconds the test completed another 16k~ requests before stalling again.
On macOS the default ephemeral port range is 49152 to 65535, for a total of 16384 ports.^1
When a TCP connection is closed from the server side, the port doesn't immediately available to be used because the connection will first transits into TIME_WAIT state ... By default, MacOS have a msl time of 15 seconds. Hence, according to the specs, the connection will have to wait around 30 seconds before it can transits into CLOSED state.^2
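The numbers above can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch with made-up function names. It assumes every closed connection holds its port for the full 2 x MSL and that no port is reused early.

```python
def ephemeral_port_count(first: int, last: int) -> int:
    """Number of ports in an inclusive range."""
    return last - first + 1


def time_wait_seconds(msl_seconds: float) -> float:
    """TIME_WAIT lasts twice the maximum segment lifetime (MSL)."""
    return 2 * msl_seconds


def sustained_connections_per_second(first: int, last: int, msl_seconds: float) -> float:
    """Rough ceiling on new connections per second once every port has been used once."""
    return ephemeral_port_count(first, last) / time_wait_seconds(msl_seconds)


ports = ephemeral_port_count(49152, 65535)  # macOS default range quoted above
wait = time_wait_seconds(15)                # macOS default MSL quoted above
rate = sustained_connections_per_second(49152, 65535, 15)

print(f"{ports} ports, {wait:.0f} s in TIME_WAIT, about {rate:.0f} new connections/s sustained")
```

Under these assumptions a benchmark that opens connections faster than roughly 546 per second will eventually run out of ports and stall until the oldest TIME_WAIT entries expire, which matches the 16k~ bursts every 30 seconds.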
While waiting for the Apache Bench to finish, you can witness the TIME_WAIT requests (16k of them) with this command:
netstat -p tcp -n | grep TIME_WAIT
Interestingly, while running the test on Linux and using an equivalent check:
netstat -antu | grep TIME_WAIT
You can see that there are just as many connections in a TIME_WAIT state, but Linux doesn't have an issue determining that the request is actually finished and forgoes the default 60 second timer (double the length that MacOS makes you wait, regardless of the request's state).
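If you'd rather get a count than eyeball the grep output, something like the following sketch could tally connection states from either netstat invocation. It assumes the state is the last column of each TCP row, which is what both commands print by default. The sample text is hand-written for illustration (port 3000 is an arbitrary choice), not captured output.

```python
import subprocess
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Tally the final column (the connection state) of every TCP row."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows start with tcp/tcp4/tcp6/tcp46 and have six columns;
        # header lines and UDP rows are skipped.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[-1]] += 1
    return states


def run_netstat(args):
    """Run netstat with the given flags, e.g. ["-p", "tcp", "-n"] or ["-antu"]."""
    result = subprocess.run(["netstat", *args], capture_output=True, text=True, check=True)
    return result.stdout


# Hand-written sample in the macOS layout, for illustration only.
SAMPLE = """Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  127.0.0.1.3000         127.0.0.1.49152        TIME_WAIT
tcp4       0      0  127.0.0.1.3000         127.0.0.1.49153        TIME_WAIT
tcp4       0      0  127.0.0.1.3000         127.0.0.1.49154        ESTABLISHED
udp4       0      0  *.5353                 *.*
"""

print(count_tcp_states(SAMPLE))
```

On a real machine you would call `count_tcp_states(run_netstat(["-p", "tcp", "-n"]))` on Mac, or pass `["-antu"]` on Linux, while the benchmark is running.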
You can manually edit your system's TCP behavior on Mac like this^2:
sudo sysctl net.inet.tcp.msl=1000
This chews through 16k requests every 2 seconds or so (although it is not recommended to edit your system in this way).
You can also increase your available port range for TCP connections like this^3:
$ sudo sysctl -w net.inet.ip.portrange.first=32768
net.inet.ip.portrange.first: 49152 -> 32768
This allows you to get through 32k~ at a time before you stall out.
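To compare the two tweaks, here is a rough model of how long a benchmark would spend stalled under each setting. It is only a sketch. It assumes requests themselves take no time, that a full batch of ports is exhausted before any are released, and that `net.inet.tcp.msl` is in milliseconds (so 1000 gives a 2 second TIME_WAIT, consistent with the "every 2 seconds" observation). The total of 100,000 connections and the function name are made up for illustration.

```python
import math


def min_stall_seconds(total_connections: int, port_count: int, time_wait_seconds: float) -> float:
    """Lower bound on time spent waiting for ports to leave TIME_WAIT."""
    batches = math.ceil(total_connections / port_count)
    # The first batch runs immediately; each later batch waits one TIME_WAIT period.
    return max(batches - 1, 0) * time_wait_seconds


TOTAL = 100_000  # hypothetical benchmark size

# (available ports, TIME_WAIT length in seconds)
scenarios = {
    "defaults": (65535 - 49152 + 1, 30.0),
    "net.inet.tcp.msl=1000": (65535 - 49152 + 1, 2.0),
    "net.inet.ip.portrange.first=32768": (65535 - 32768 + 1, 30.0),
}

for name, (port_count, wait) in scenarios.items():
    stalled = min_stall_seconds(TOTAL, port_count, wait)
    print(f"{name}: {port_count} ports per batch, at least {stalled:.0f} s stalled")
```

In this model, shortening the MSL helps far more than doubling the port range, because the wait per batch shrinks by 15x while the batch size only grows by 2x.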
Extra reading^4
This isn't hyper's issue, but I'm not entirely convinced that it has no agency in the matter. What it could do exactly I'm not sure, but it doesn't seem to be a problem that any of the other frameworks I tested address out-of-the-box either^5.
Superb research and write-up, thank you! In this case, I'm going to close as not a problem with hyper. If there's a simple thing that is found to work in the future, perhaps we can do that.
Version Hyper 0.14.25 / Tokio 1.26.0
Platform MacOS 13.2.1
Description The following minimal server hangs eventually on MacOS when I make requests with keepalives disabled:
I can trigger the issue with ApacheBench:
It also occurs with k6 if I disable keepalives in the server with .http1_keepalive(false). Exactly the same program works fine on Linux, so I am fairly confident this is MacOS specific.