Open cuklev opened 3 years ago
The server can't respond with a 429 in that case, since it cannot accept the new connection at all. I'd strongly advise bumping the FD limit; 1024 is far too low for a busy server.
Well, it is not necessarily a busy server. It could be just someone trying to abuse it. In my case, I was surprised that my server process exited. I feel like bumping the FD limit is only a temporary solution.
@cuklev Could you please provide the client-side code you're invoking?
while :; do curl -s http://localhost:3003 > /dev/null & done
in bash.
It seems that you're running out of sockets/FDs. `ulimit -Sn` will show the current soft limit on FDs.
The application cannot allocate more than `ulimit -Sn` sockets and simply refuses to respond, since you're forcing it to wait 3 seconds for every single query. Warp throws an error because it cannot allocate more.
I do not know what the best strategy is for tolerating socket exhaustion here. Maybe add an allocation counter, a threshold and/or a queue, and once the threshold is reached, switch strategy: schedule responses into the queue and process them separately.
For now, you could set soft/hard limits per user/application at the system level, based on the expected/predicted RPS from clients or the proxy.
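As a rough sketch of how such a limit could be sized: the number of connections in flight is roughly the request rate multiplied by how long each connection is held open, plus some headroom for the listening socket, log files and the like. The helper name `required_fds` and the figures (500 RPS, 64 spare descriptors) are illustrative assumptions; only the 3-second hold time comes from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical sizing helper; the numbers below are assumptions for
# illustration, not measurements from the server in this thread.

# FDs needed ~= requests per second * seconds each connection stays open,
# plus headroom for the listening socket, stdio, log files, etc.
required_fds() {
  local rps=$1 hold_seconds=$2 headroom=$3
  echo $(( rps * hold_seconds + headroom ))
}

soft=$(ulimit -Sn)   # current soft limit for this shell and its children
hard=$(ulimit -Hn)   # ceiling the soft limit can be raised to without root
need=$(required_fds 500 3 64)

echo "soft=$soft hard=$hard need=$need"

# Raise the soft limit only if it is too low and the hard limit allows it.
if [ "$soft" != "unlimited" ] && [ "$soft" -lt "$need" ]; then
  if [ "$hard" = "unlimited" ] || [ "$hard" -ge "$need" ]; then
    ulimit -Sn "$need"
  else
    echo "hard limit $hard is below $need; raise it system-wide first" >&2
  fi
fi
```

The new soft limit only applies to this shell and the processes started from it, so the server would have to be launched from the same shell (or have the limit set by whatever service manager starts it).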
Yes, increasing the FD limit will improve the situation, but it will not solve it. Warp should definitely catch that error and not exit.
I tested the same setup but with nginx in the middle, using `proxy_pass` to the Haskell server. In that case, my application never crashes. Nginx responds with 500 for half of the requests.
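One way to put a number on "half of the requests" is to tally the status codes that come back. The following is a minimal sketch: `probe` and `tally_codes` are hypothetical helper names, and the URL and the concurrency of 50 in the usage line are assumptions, since the nginx port is not given in this thread. It relies on curl's `-w '%{http_code}'` write-out variable, which prints `000` when no HTTP response was received.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for counting HTTP status codes under load.

# Read one status code per line on stdin, print "<code> <count>" per code.
tally_codes() {
  sort | uniq -c | awk '{ print $2, $1 }'
}

# Send <n> requests to <url>, <parallel> at a time, printing one status
# code per request.
probe() {
  local url=$1 n=$2 parallel=$3
  seq "$n" | xargs -P "$parallel" -I{} \
    curl -s -o /dev/null -w '%{http_code}\n' "$url"
}

# Usage (URL is an assumption; point it at the nginx listener):
#   probe http://localhost:8080 1000 50 | tally_codes
```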
Consider the `curl` case for simplicity.
[...] `3003` (1 FD).
[...] `curl`s.
[...] `curl` defaults, each "client" will wait for `accept` from the server for up to 60 seconds and for `connect` for up to 300 seconds.
[...] `curl`.
[...] `Network.Socket.accept: resource exhausted (No file descriptors available)`.
Given the current `warp` implementation, there should be an appropriate design fix for connections that leak while being accepted. I am currently investigating the leaking side of the story.
Let's return to nginx.
With `nginx` there are a lot of variables that should be taken into account:
NGINX + Warp + `/etc/sysctl.conf` should be configured extremely carefully; there should be no contradictions among any of the possible combinations of the parameters mentioned above.
E.g. decreasing `proxy_read_timeout` and `proxy_send_timeout` on the NGINX side could fix warp availability in one particular use case.
Another example is to remove `keepalive` from your upstream configuration. That could also help in a different use case.
I think it should be possible to not let the application crash, and just print to stdout/stderr that no file descriptors were available, and just continue with the loop?
The `Network.Socket` error is just an `IOError` with `OtherError` and a string, so it should be easy, although pretty frail, so let's hope `Network.Socket` doesn't change its exception's syntax 🙃
Might be related to #603
Here is a sample code:
When I run something like
while :; do curl -s http://localhost:3003 > /dev/null & done
the Haskell program receives `Network.Socket.accept: resource exhausted (Too many open files)` and then exits successfully after all connections close. It always happens after printing 1011 for me. This is because each accepted connection is a new open file and there is a limit on open files per process. On my system this limit seems to be 1024 (it can be seen or changed with `ulimit -Sn`).
I am not sure how this should be solved. Should warp not accept connections when too many have already been opened? Should accepting be allowed to fail and be retried after that? Should the server respond with something like 429 Too Many Requests?
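To watch the descriptor count approach the limit while the loop runs, something like the following can help. It is a Linux-only sketch that relies on `/proc/<pid>/fd` holding one entry per open descriptor; `count_fds` is a hypothetical helper name. The gap between 1011 and 1024 is presumably made up of descriptors the process already holds (stdio, the listening socket, whatever the runtime opens), which this count would include.

```shell
#!/usr/bin/env bash
# Linux-only sketch: /proc/<pid>/fd contains one entry per open descriptor.
# count_fds is a hypothetical helper name, not part of any existing tool.
count_fds() {
  local pid=$1
  ls "/proc/$pid/fd" | wc -l
}

# Demo on the current shell; in practice pass the server's PID instead of $$.
echo "open FDs: $(count_fds $$), soft limit: $(ulimit -Sn)"
```

Running it repeatedly against the server's PID (for example under `watch`) should show the count climbing toward the soft limit just before the `accept` error appears.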