Closed 06fb27e5-30a6-4cf7-bb9f-8783afbf7693 closed 2 years ago
This is about a potential DOS vector that can get an asyncio server serving SSL connections to enter a busy loop and hang. To recover the server (python process) needs to be restarted.
See downstream report at https://github.com/spesmilo/electrumx/issues/92
ElectrumX is server software that serves JSON-RPC requests either directly over plaintext TCP or over SSL/TLS (no HTTP involved). The server code is written using asyncio. The most popular client is also written using python+asyncio. However, there are other server implementations (e.g. in Rust), and other clients (e.g. in javascript, scala, java). The servers are part of a federated network and peer with each other.
In the last 1+ year, server operators have started reporting that the python process (of ElectrumX) sporadically hangs: it uses 100% CPU and stops serving clients or accepting new connections. The only way we could find to recover is restarting the server (the python process); waiting for even days did not help, python was stuck. The hang seemed to mostly happen on busy servers that serve several thousands of concurrent sessions, and even on those only happened e.g. once a month. So the hang was difficult to reproduce.
Nevertheless we observed that the hang only happens if it is the asyncio server that handles SSL, i.e. if the server operator put nginx in front of the python process and handles SSL termination in nginx, they would be unaffected. One server operator whose server at one point hanged multiple times a day confirmed this, and reported that nginx subsequently started logging lines like this:
2021/01/11 02:28:30 [crit] 21#21: *437620 SSL_shutdown() failed (SSL: error:14094123:SSL routines:ssl3_read_bytes:application data after close notify) while proxying connection, client: REDACTED, server: 0.0.0.0:50002, upstream: "127.0.0.1:50213", bytes from/to client:81/205, bytes from/to upstream:205/0
Over these last months, the hang has been reproduced on many different python versions by different people, e.g. 3.7.1, 3.7.5, 3.8.5, 3.9.1, 3.9.4.
A few days ago, many servers hanged almost simultaneously, and when restarted, they would soon hang again at most a few hours later. Presumably someone either flooded the network with buggy clients, or it might have been an attack. Anyway, this allowed me to look into this better. I set up gdb and waited. This was on ubuntu 20.04 with python 3.8.5 and openssl=1.1.1f-1ubuntu2.3.
It seems the event loop thread is stuck in a busy loop. The deepest common stack frame looks to be at https://github.com/python/cpython/blob/v3.8.5/Lib/asyncio/sslproto.py#L675
(gdb) py-bt
Traceback (most recent call first):
File "/usr/lib/python3.8/asyncio/sslproto.py", line 535, in feed_appdata
File "/usr/lib/python3.8/asyncio/sslproto.py", line 675, in _process_write_backlog
ssldata, offset = self._sslpipe.feed_appdata(data, offset)
File "/usr/lib/python3.8/asyncio/sslproto.py", line 599, in _write_appdata
self._process_write_backlog()
File "/usr/lib/python3.8/asyncio/sslproto.py", line 387, in write
self._ssl_protocol._write_appdata(data)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/rawsocket.py", line 118, in write
self._asyncio_transport.write(framed_message)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 153, in _send_message
await self.transport.write(message)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 496, in _throttled_request
await self._send_message(message)
<built-in method run of Context object at remote 0x7f09e30eb6c0>
File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "/usr/lib/python3.8/asyncio/base_events.py", line 2627, in _run_once
--Type <RET> for more, q to quit, c to continue without paging--
File "/usr/lib/python3.8/asyncio/base_events.py", line 826, in run_forever
None, getaddr_func, host, port, family, type, proto, flags)
File "/usr/lib/python3.8/asyncio/base_events.py", line 603, in run_until_complete
self.run_forever()
File "/usr/lib/python3.8/asyncio/runners.py", line 299, in run
File "/usr/local/bin/electrumx_server", line 35, in main
asyncio.run(controller.run())
File "/usr/local/bin/electrumx_server", line 43, in <module>
main()
(gdb) py-list
530 except (SystemExit, KeyboardInterrupt):
531 raise
532 except BaseException as e:
533 self._fatal_error(e, 'SSL error in data received')
534 return
>535
536 for chunk in ssldata:
537 self._transport.write(chunk)
538
539 for chunk in appdata:
540 if chunk:
(gdb) py-locals
self = <_SSLPipe(_context=<SSLContext at remote 0x7f0a30dc00c0>, _server_side=True, _server_hostname=None, _state='WRAPPED', _incoming=<_ssl.MemoryBIO at remote 0x7f09cc0925b0>, _outgoing=<_ssl.MemoryBIO at remote 0x7f09b919c430>, _sslobj=<SSLObject(_sslobj=<_ssl._SSLSocket at remote 0x7f09cc8c1630>) at remote 0x7f09db428910>, _need_ssldata=False, _handshake_cb=<method at remote 0x7f09c0177a00>, _shutdown_cb=None) at remote 0x7f09db43f160>
data = b'{"jsonrpc": "2.0", "result": null, "id": 18}\n'
offset = 0
ssldata = [b'\x17\x03\x03\x00\x13<\x8fh\x93\xd3\xd7M\xb9\xbd\xb6\xad\x08\x1bZ\x857\x0c\xe4\xac']
view = <memoryview at remote 0x7f09d295be80>
exc_errno = 5
(gdb) cont
Continuing.
^C
Thread 1 "electrumx_serve" received signal SIGINT, Interrupt.
memory_getbuf (self=0x7f09b69a3880, view=0x7ffcb6829200, flags=0)
at ../Objects/memoryobject.c:1441
1441 ../Objects/memoryobject.c: No such file or directory.
(gdb) py-bt
Traceback (most recent call first):
File "/usr/lib/python3.8/ssl.py", line 897, in write
return self._sslobj.write(data)
File "/usr/lib/python3.8/asyncio/sslproto.py", line 262, in feed_appdata
offset += self._sslobj.write(view[offset:])
File "/usr/lib/python3.8/asyncio/sslproto.py", line 675, in _process_write_backlog
ssldata, offset = self._sslpipe.feed_appdata(data, offset)
File "/usr/lib/python3.8/asyncio/sslproto.py", line 599, in _write_appdata
self._process_write_backlog()
File "/usr/lib/python3.8/asyncio/sslproto.py", line 387, in write
self._ssl_protocol._write_appdata(data)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/rawsocket.py", line 118, in write
self._asyncio_transport.write(framed_message)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 153, in _send_message
await self.transport.write(message)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 496, in _throttled_request
await self._send_message(message)
<built-in method run of Context object at remote 0x7f09e30eb6c0>
File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run
--Type <RET> for more, q to quit, c to continue without paging--
self._context.run(self._callback, *self._args)
File "/usr/lib/python3.8/asyncio/base_events.py", line 2627, in _run_once
File "/usr/lib/python3.8/asyncio/base_events.py", line 826, in run_forever
None, getaddr_func, host, port, family, type, proto, flags)
File "/usr/lib/python3.8/asyncio/base_events.py", line 603, in run_until_complete
self.run_forever()
File "/usr/lib/python3.8/asyncio/runners.py", line 299, in run
File "/usr/local/bin/electrumx_server", line 35, in main
asyncio.run(controller.run())
File "/usr/local/bin/electrumx_server", line 43, in <module>
main()
(gdb)
(gdb) cont
Continuing.
^C
(gdb) py-bt
Traceback (most recent call first):
<built-in method len of module object at remote 0x7f0a34f8e0e0>
File "/usr/lib/python3.8/asyncio/sslproto.py", line 535, in feed_appdata
File "/usr/lib/python3.8/asyncio/sslproto.py", line 675, in _process_write_backlog
ssldata, offset = self._sslpipe.feed_appdata(data, offset)
File "/usr/lib/python3.8/asyncio/sslproto.py", line 599, in _write_appdata
self._process_write_backlog()
File "/usr/lib/python3.8/asyncio/sslproto.py", line 387, in write
self._ssl_protocol._write_appdata(data)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/rawsocket.py", line 118, in write
self._asyncio_transport.write(framed_message)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 153, in _send_message
await self.transport.write(message)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 496, in _throttled_request
await self._send_message(message)
<built-in method run of Context object at remote 0x7f09e30eb6c0>
File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
--Type <RET> for more, q to quit, c to continue without paging--
File "/usr/lib/python3.8/asyncio/base_events.py", line 2627, in _run_once
File "/usr/lib/python3.8/asyncio/base_events.py", line 826, in run_forever
None, getaddr_func, host, port, family, type, proto, flags)
File "/usr/lib/python3.8/asyncio/base_events.py", line 603, in run_until_complete
self.run_forever()
File "/usr/lib/python3.8/asyncio/runners.py", line 299, in run
File "/usr/local/bin/electrumx_server", line 35, in main
asyncio.run(controller.run())
File "/usr/local/bin/electrumx_server", line 43, in <module>
main()
main()
(gdb) cont
Continuing.
^C
Thread 1 "electrumx_serve" received signal SIGINT, Interrupt.
PySlice_New (start=0, stop=None, step=0x0) at ../Objects/sliceobject.c:123
123 ../Objects/sliceobject.c: No such file or directory.
(gdb)
Continuing.
^C
Thread 1 "electrumx_serve" received signal SIGINT, Interrupt.
0x00000000005d4390 in long_richcompare (
self=<SSLErrorNumber(_value_=5, _name_='SSL_ERROR_SYSCALL', __objclass__=<EnumMeta(_generate_next_value_=<function at remote 0x7f0a34e119d0>, __doc__='An enumeration.', __module__='ssl', _member_names_=['SSL_ERROR_SSL', 'SSL_ERROR_WANT_READ', 'SSL_ERROR_WANT_WRITE', 'SSL_ERROR_WANT_X509_LOOKUP', 'SSL_ERROR_SYSCALL', 'SSL_ERROR_ZERO_RETURN', 'SSL_ERROR_WANT_CONNECT', 'SSL_ERROR_EOF', 'SSL_ERROR_INVALID_ERROR_CODE'], _member_map_={'SSL_ERROR_SSL': <SSLErrorNumber(_value_=1, _name_='SSL_ERROR_SSL', __objclass__=<...>) at remote 0x7f0a3487fe00>, 'SSL_ERROR_WANT_READ': <SSLErrorNumber(_value_=2, _name_='SSL_ERROR_WANT_READ', __objclass__=<...>) at remote 0x7f0a3487fe80>, 'SSL_ERROR_WANT_WRITE': <SSLErrorNumber(_value_=3, _name_='SSL_ERROR_WANT_WRITE', __objclass__=<...>) at remote 0x7f0a3487ff00>, 'SSL_ERROR_WANT_X509_LOOKUP': <SSLErrorNumber(_value_=4, _name_='SSL_ERROR_WANT_X509_LOOKUP', __objclass__=<...>) at remote 0x7f0a3487ff80>, 'SSL_ERROR_SYSCALL': <...>, 'SSL_ERROR_ZERO_RETURN': <SSLErrorNumber(_value_=6, _nam...(truncated), other=5, op=2) at ../Objects/longobject.c:3083
3083 ../Objects/longobject.c: No such file or directory.
(gdb) py-bt
Traceback (most recent call first):
File "/usr/lib/python3.8/asyncio/sslproto.py", line 526, in feed_appdata
return
File "/usr/lib/python3.8/asyncio/sslproto.py", line 675, in _process_write_backlog
ssldata, offset = self._sslpipe.feed_appdata(data, offset)
File "/usr/lib/python3.8/asyncio/sslproto.py", line 599, in _write_appdata
self._process_write_backlog()
File "/usr/lib/python3.8/asyncio/sslproto.py", line 387, in write
self._ssl_protocol._write_appdata(data)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/rawsocket.py", line 118, in write
self._asyncio_transport.write(framed_message)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 153, in _send_message
await self.transport.write(message)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 496, in _throttled_request
await self._send_message(message)
<built-in method run of Context object at remote 0x7f09e30eb6c0>
File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "/usr/lib/python3.8/asyncio/base_events.py", line 2627, in _run_once
--Type <RET> for more, q to quit, c to continue without paging--
(gdb)
(gdb) cont
Continuing.
^C
Thread 1 "electrumx_serve" received signal SIGINT, Interrupt.
0x00000000004f7fa0 in builtin_len (module=<optimized out>, obj=<optimized out>)
at ../Objects/longobject.c:61
61 ../Objects/longobject.c: No such file or directory.
(gdb) py-bt
Traceback (most recent call first):
<built-in method len of module object at remote 0x7f0a34f8e0e0>
File "/usr/lib/python3.8/asyncio/sslproto.py", line 535, in feed_appdata
File "/usr/lib/python3.8/asyncio/sslproto.py", line 675, in _process_write_backlog
ssldata, offset = self._sslpipe.feed_appdata(data, offset)
File "/usr/lib/python3.8/asyncio/sslproto.py", line 599, in _write_appdata
self._process_write_backlog()
File "/usr/lib/python3.8/asyncio/sslproto.py", line 387, in write
self._ssl_protocol._write_appdata(data)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/rawsocket.py", line 118, in write
self._asyncio_transport.write(framed_message)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 153, in _send_message
await self.transport.write(message)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 496, in _throttled_request
await self._send_message(message)
<built-in method run of Context object at remote 0x7f09e30eb6c0>
File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
--Type <RET> for more, q to quit, c to continue without paging--
(gdb) cont
Continuing.
^C
Thread 1 "electrumx_serve" received signal SIGINT, Interrupt.
PyType_IsSubtype (b=0x907020 <_PyNone_Type>, a=0x90a860 <PyUnicode_Type>)
at ../Objects/typeobject.c:1370
1370 ../Objects/typeobject.c: No such file or directory.
(gdb) py-bt
Traceback (most recent call first):
File "/usr/lib/python3.8/asyncio/sslproto.py", line 268, in feed_appdata
if exc.reason == 'PROTOCOL_IS_SHUTDOWN':
File "/usr/lib/python3.8/asyncio/sslproto.py", line 675, in _process_write_backlog
ssldata, offset = self._sslpipe.feed_appdata(data, offset)
File "/usr/lib/python3.8/asyncio/sslproto.py", line 599, in _write_appdata
self._process_write_backlog()
File "/usr/lib/python3.8/asyncio/sslproto.py", line 387, in write
self._ssl_protocol._write_appdata(data)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/rawsocket.py", line 118, in write
self._asyncio_transport.write(framed_message)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 153, in _send_message
await self.transport.write(message)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 496, in _throttled_request
await self._send_message(message)
<built-in method run of Context object at remote 0x7f09e30eb6c0>
File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "/usr/lib/python3.8/asyncio/base_events.py", line 2627, in _run_once
--Type <RET> for more, q to quit, c to continue without paging--
(gdb) cont
Continuing.
^C
Thread 1 "electrumx_serve" received signal SIGINT, Interrupt.
__GI___errno_location () at errno-loc.c:25
25 errno-loc.c: No such file or directory.
(gdb) py-bt
Traceback (most recent call first):
File "/usr/lib/python3.8/ssl.py", line 897, in write
return self._sslobj.write(data)
File "/usr/lib/python3.8/asyncio/sslproto.py", line 262, in feed_appdata
offset += self._sslobj.write(view[offset:])
File "/usr/lib/python3.8/asyncio/sslproto.py", line 675, in _process_write_backlog
ssldata, offset = self._sslpipe.feed_appdata(data, offset)
File "/usr/lib/python3.8/asyncio/sslproto.py", line 599, in _write_appdata
self._process_write_backlog()
File "/usr/lib/python3.8/asyncio/sslproto.py", line 387, in write
self._ssl_protocol._write_appdata(data)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/rawsocket.py", line 118, in write
self._asyncio_transport.write(framed_message)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 153, in _send_message
await self.transport.write(message)
File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 496, in _throttled_request
await self._send_message(message)
<built-in method run of Context object at remote 0x7f09e30eb6c0>
File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run
Note that around line https://github.com/python/cpython/blob/v3.8.5/Lib/asyncio/sslproto.py#L262,
exc_errno = 5 corresponds to ssl.SSL_ERROR_SYSCALL
.
Considering the gdb backtraces above, it seems to me the execution is stuck in that while loop, always getting ssl.SSL_ERROR_SYSCALL
and suppressing it.
Note that this code is not async, so the whole event loop is blocked.
I have patched my python interpreter to not suppress that exception but instead re-raise it, and recompiled; and the hangs have gone away; I can no longer reproduce them.
I used the following patch on top of 3.9.4 (but note the code in this file has not changed for years):
Lib/asyncio/sslproto.py | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/Lib/asyncio/sslproto.py b/Lib/asyncio/sslproto.py
index cad25b2653..f67898a777 100644
--- a/Lib/asyncio/sslproto.py
+++ b/Lib/asyncio/sslproto.py
@@ -267,8 +267,7 @@ class _SSLPipe(object):
if exc.reason == 'PROTOCOL_IS_SHUTDOWN':
exc_errno = exc.errno = ssl.SSL_ERROR_WANT_READ
if exc_errno not in (ssl.SSL_ERROR_WANT_READ,
- ssl.SSL_ERROR_WANT_WRITE,
- ssl.SSL_ERROR_SYSCALL):
+ ssl.SSL_ERROR_WANT_WRITE):
raise
self._need_ssldata = (exc_errno == ssl.SSL_ERROR_WANT_READ)
Frankly I don't understand why the ssl.SSL_ERROR_SYSCALL
error is special cased in the try-except, hence this might break something else. The git blame or the python docs were of no help either (to understand that exception). However with this patch my server seems to be working correctly and is no longer getting stuck.
Unfortunately I don't have code that triggers the hang.
Could you check Python 3.11? It has a new asyncio SSL implementation rewritten from scratch.
Closing as 3.11 has rewritten ssl implementation and this is out of date now. Please create a new issue if this issue still exists on supported python versions along with a reproducer.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = None closed_at = None created_at =
labels = ['type-security', '3.7', '3.8', '3.9', 'expert-asyncio']
title = 'asyncio SSL server can be DOSed, event loop gets blocked: busy loops and uses 100% CPU'
updated_at =
user = 'https://github.com/SomberNight'
```
bugs.python.org fields:
```python
activity =
actor = 'asvetlov'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['asyncio']
creation =
creator = 'ghost43'
dependencies = []
files = []
hgrepos = []
issue_num = 44036
keywords = []
message_count = 2.0
messages = ['392923', '415997']
nosy_count = 5.0
nosy_names = ['christian.heimes', 'asvetlov', 'yselivanov', 'kyuupichan', 'ghost43']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'security'
url = 'https://bugs.python.org/issue44036'
versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']
```