Open kylemacfarlane opened 6 years ago
Bonus: this seems to have also fixed #1602 (which may be the same as #1623).
websocket_send
. At first glance that seems like the exact opposite problem of receiving multiple small requests but it's easy to see how it can become the same.A large response that takes a long time to send is more likely to overlap with other requests or responses. Thus again it's probably not heavy load per se but multiple messages being received or sent at almost the same time which causes all these bugs.
I noticed in #1602 I said it only happens behind nginx. That's not true for this bug (#1716) which happens with and without nginx.
In above (failed) tests did you run uWSGI with --http
, --http-socket
or --socket
?
In the tests I was hitting uWSGI directly so it would most likely have been --http.
In production it's behind nginx and uses --socket.
The problem appears in both places when using websocket_recv_nb.
How did you run the tests? Could you share a testing script?
The problem is at least partially repeatable with the demos in the tests folder.
Using this demo: https://github.com/unbit/uwsgi/blob/master/tests/websockets_chat_async.py
First you need to update it for current versions of uwsgi (change websocket_handshake) and then add a setInterval to repeatedly send the message: https://gist.github.com/kylemacfarlane/841efeec92d63d2396dc0cab0d2a4e85
Note that for this demo I had to set the interval to 5ms / 200rps whereas in my real project 50ms / 20rps is enough. Probably something to do with message sizes and/or network latency.
Run it with:
uwsgi --shared-socket 0.0.0.0:443, --https =0,cert,key --http-websockets --wsgi-file websockets_chat_async.py --async 100 --ugreen
Access the page and type something in the message box so that it starts sending. Within a few minutes (perhaps even seconds) the websocket connection will drop with Bad file descriptor
either from wait_fd_read or uwsgi_response_write_body_do.
I am interested in gevent version so I run server app:
uwsgi --http :12345 --http-websockets --wsgi-file websockets_chat.py --gevent-monkey-patch --gevent 100
and in Chrome browser in console I run:
setInterval(invia, 5)
I don't observe any errors after some minutes of running this test (over 10000 websocket messages so far)...
I run this version of uWSGI:
*** Starting uWSGI 2.0.17.1 (64bit) on [Mon Nov 19 10:16:02 2018] ***
compiled with version: 5.4.0 20160609 on 06 September 2018 13:40:59
os: Linux-4.15.0-39-generic #42-Ubuntu SMP Tue Oct 23 15:48:01 UTC 2018
gevent.select on stable is the hardest method to make fail. In the tests above I noted that it takes 5-15 minutes at 20rps over the real network so in a simple test on the same computer you'd probably have to leave it running for ages.
If you put a uWSGI websocket server using
websocket_recv_nb
as shown intests/websockets_chat.py
under heavy load then a couple errors start appearing. On the other hand usingwebsocket_recv
insidegevent.spawn
and applying gevent monkeypatching as shown intests/websockets_chat_2.py
appears to work fine.First of all I don't think it's caused by load per se. I think the problem is due to messages arriving at the same time which just happens more frequently under heavy load. To recreate the issue I hit the server with two messages at the same time every 100ms.
First Symptom
You will encounter "delayed" messages as reported in #1241. To solve this you can simply iterate over websocket_recv_nb until it returns None.
Second Symptom
You will start getting this error when reading from the websocket:
This was reported in #1533. How can there be no PONG response when the client and server are talking at 20 messages per second? Even aggressively reading from the websocket as much as possible doesn't solve the problem.
Third Symptom
You will start getting this error when reading from Redis:
This was reported on StackOverflow and the uWSGI maillist:
Other Errors
If you start playing around there are other errors that can happen. For example if you suppress the OSError from Redis then next time you try to read the websocket you will get an fd error from that as well. But the above are the three errors you will most commonly encounter if you copy the examples in the tests directory: https://github.com/unbit/uwsgi/tree/master/tests
Testing
I experimented with various settings and the following are the outcomes. I didn't try asyncio because the docs say it's experimental.
uwsgi.wait_fd_read
based on tests/websockets_chat_async.py--async 100 --ugreen
.
uwsgi.async_sleep
instead of waiting for any fds--async 100 --ugreen
.
gevent.select
based on tests/websockets_chat.py--gevent 100 --gevent-monkey-patch
.
uwsgi.wait_fd_read
based on tests/websockets_chat_async.py--async 100 --ugreen
.
gevent.select
based on tests/websockets_chat.py--gevent 100 --gevent-monkey-patch
KeyError: 'env'
SystemError: <built-in function uwsgi_gevent_request> returned a result with an error set
.
gevent.spawn
andrequest_context
loosely based on tests/websockets_chat_2.pyuwsgi.websocket_recv_nb
andredis.get_message
combined withgevent.sleep
--gevent 100 --gevent-monkey-patch
.
gevent.spawn
andrequest_context
loosely based on tests/websockets_chat_2.pyuwsgi.websocket_recv
andredis.listen
and hope that gevent monkeypatching prevents any actual blocks--gevent 100 --gevent-monkey-patch
.
gevent.spawn
andrequest_context
uwsgi.websocket_recv
+gevent monkeypatch
with non-blockingredis.get_message
+gevent.sleep
uwsgi.websocket_recv_nb
+gevent.sleep
with "blocking"redis.listen
+gevent monkeypatch
websocket_recv_nb
.