cipig closed this issue 2 years ago
Any idea about specific commits? Also, to be clear, does it also happen with current HEAD (7e53936f34d2f1c979c64f91d15d6745fb8103c2)?
no idea, but it must be one of the last 10 commits, after Jan 28, i.e. after https://github.com/spesmilo/electrumx/commit/3f12f288a442d4c3429de0a813c6355a0132633f. i am using the latest master branch, but i am not 100% sure that all servers were already restarted with this version, some may be missing the last 3 commits... do you think the memleak might have been introduced after Jan 28, but already fixed in the last 3 commits? i will restart everything now to be sure all servers are on latest master
well this commit https://github.com/spesmilo/electrumx/commit/c5d1e802e7a7e98db36fbad79954b3a46b9e03f3 might be leaking memory, but then I would have thought that the next one (https://github.com/spesmilo/electrumx/commit/b10d03caf639c52a97322d92098a4243ac6091fb) fixed that
I doubt the last 3 commits are relevant :/
all servers are now running versions that include both of those commits, and i am sure the memleak is still there... a server running latest master was restarted yesterday and the electrumx server already uses 12GB RAM (used to be 5)
I don't think it is a commit causing the memory leak. My server is impacted by the memory leak since the DDOS against it started. I think the memory leak is caused by the large amount of attempted connections by the botnet.
i run a couple dozen different coins; some have a very low number of connections, but they are all affected
peer discovery is also off on all coins, PEER_DISCOVERY = self
i could try to go back to an older version, before the python 3.8 change... if the problem is not solved i will have to, since this version will crash my server sooner or later with OOM (automatic restart of all coins every week will not be enough to compensate for the memleak)
the drawback is that i would then have the random crashes back (which were solved in the last commits), so i will wait a couple more days before reverting to an older version
the situation is the same
coins that were using 5GB RAM before are now using 30GB, till restarted once a week
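For anyone who wants numbers rather than charts, a minimal sketch of measuring the growth rate on Linux is shown here. It assumes the standard `VmRSS` field in `/proc/<pid>/status`; the function names are made up for illustration and are not part of electrumx.

```python
import re
from typing import Optional, Sequence, Tuple


def parse_vm_rss_kib(status_text: str) -> Optional[int]:
    """Return the VmRSS value (KiB) found in /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid: int) -> Optional[int]:
    """Read the resident set size of a running process; None if unavailable."""
    try:
        with open(f"/proc/{pid}/status", encoding="ascii") as f:
            return parse_vm_rss_kib(f.read())
    except OSError:
        return None


def growth_per_hour_kib(samples: Sequence[Tuple[float, int]]) -> float:
    """Average RSS growth in KiB/hour between the first and last sample.

    Each sample is (unix_timestamp_seconds, rss_kib).
    """
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (r1 - r0) * 3600 / (t1 - t0)
```

Sampling `read_rss_kib(pid)` every few minutes and feeding the pairs to `growth_per_hour_kib` gives a single figure to compare between commits. Going from 5GB to 30GB over a week works out to roughly 150MB per hour.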
will downgrade to https://github.com/spesmilo/electrumx/commit/0256e97715f16730be7293d5f8d893bb476315dc and see what happens
My servers started showing this same leakage when I updated from ubuntu 18.04 (python 3.7) to 20.04 (python 3.8). I suspect Python 3.8 is somehow missing a fix from 3.7.
@shsmith do you mean that with same electrumx commit some python versions have a memleak but others not?
Yes, that is what I think happened. After the OS+python upgrade the leak started. I did the upgrade on two servers on 2/19/2022. I may have done a git pull on the same day, but I think the leak was triggered by the python version change rather than any recent git commit.
the last 2 "hills" are from https://github.com/spesmilo/electrumx/commit/0256e97715f16730be7293d5f8d893bb476315dc
looks like the leak is not that big... unfortunately this version crashes on websockets requests and needs to be restarted, so the chart is not for an entire week.
bigger picture
the last 3 hills are with https://github.com/spesmilo/electrumx/commit/0256e97715f16730be7293d5f8d893bb476315dc
guess it's mainly related to OS+python 3.9... it all started with that here too... the funny thing is that i did the OS update to get the newer python version so i could run the latest electrumx (previous debian had python 3.7)
could it be related to some python modules? i would suspect aiorpcx, that was the "big" update... i have 0.22.1, the latest one
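Since the suspects so far are the Python version and the installed modules, a small sketch like the following could be run on each server to compare environments. It only uses the standard library (`importlib.metadata`, available from Python 3.8); the default package list is an assumption based on what this thread mentions, so adjust it as needed.

```python
import platform
from importlib import metadata


def installed_version(package: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("aiorpcX", "uvloop", "websockets")) -> dict:
    """Collect the Python version and the versions of the given packages."""
    report = {"python": platform.python_version()}
    for name in packages:
        report[name] = installed_version(name)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Diffing the output from a leaking server against a non-leaking one shows quickly whether they differ only in the Python version or also in a module such as aiorpcX.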
this is my config for this particular coin
COIN = Komodo
DB_DIRECTORY = /electrumdb/KMD
DAEMON_URL = http://xxxxx:xxxxxx@127.0.0.1:7771/
SERVICES = tcp://:10001,rpc://:8001,ssl://:20001,wss://:30001
EVENT_LOOP_POLICY = uvloop
PEER_DISCOVERY = self
MAX_SESSIONS = 5000
INITIAL_CONCURRENT = 100
COST_SOFT_LIMIT = 0
COST_HARD_LIMIT = 0
SSL_CERTFILE = /home/electrum/server.crt
SSL_KEYFILE = /home/electrum/server.key
MAX_SEND = 2000000
same as before the leak
i install with pip3 install .[uvloop,rapidjson], so including rapidjson
switched back to latest commit and the memory usage rises faster than with https://github.com/spesmilo/electrumx/commit/0256e97715f16730be7293d5f8d893bb476315dc
last hill is latest commit, the 3 before that are https://github.com/spesmilo/electrumx/commit/0256e97715f16730be7293d5f8d893bb476315dc, all the other ones are also latest commit
switching now back to ee2c78e590a2794e1353acf7080d6eee2e71f60b (before the aiorpcx update) to see what happens
https://github.com/spesmilo/electrumx/commit/ee2c78e590a2794e1353acf7080d6eee2e71f60b does not have the memleak... that is aiorpcX-0.18.7 with the same python 3.9 on the same server with debian 11
Same on Ubuntu server 20.04. Leak is gone.
I've found the leak, it's in aiorpcx. See https://github.com/kyuupichan/aiorpcX/issues/46
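For readers chasing a similar leak, one generic way to narrow it down to a source line is the standard library's `tracemalloc`. The sketch below is only an illustration of that technique, not a description of how this particular leak was found; `top_growth` is a made-up helper name.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return up to `limit` source lines whose allocated size grew the most.

    `before` and `after` are tracemalloc.Snapshot objects.
    """
    stats = after.compare_to(before, "lineno")  # sorted by largest change first
    return [s for s in stats if s.size_diff > 0][:limit]


if __name__ == "__main__":
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    # Stand-in for objects that a leaking server would keep alive.
    hoard = [bytes(1024) for _ in range(1000)]
    after = tracemalloc.take_snapshot()
    for stat in top_growth(before, after):
        print(stat)
```

In a long-running server the two snapshots would be taken some hours apart. Lines that keep growing between snapshots are the candidates worth reading.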
Please test with current master when you have time.
thanks for the fix, i updated and restarted one server
saw this in log once, soon after start:
Apr 28 13:50:10 electrum3 electrumx_server[926731]: Task exception was never retrieved
Apr 28 13:50:10 electrum3 electrumx_server[926731]: future: <Task finished name='Task-202' coro=<SessionBase._process_messages() done, defined at /usr/local/lib/python3.9/dist-packages/aiorpcx/session.py:217> exception=ConnectionClosedOK(Close(code=1000, reason=''), Close(code=1000, reason=''), True)>
Apr 28 13:50:10 electrum3 electrumx_server[926731]: Traceback (most recent call last):
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 219, in _process_messages
Apr 28 13:50:10 electrum3 electrumx_server[926731]: await self._process_messages_loop(recv_message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 449, in _process_messages_loop
Apr 28 13:50:10 electrum3 electrumx_server[926731]: message = await recv_message()
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/aiorpcx/websocket.py", line 63, in recv_message
Apr 28 13:50:10 electrum3 electrumx_server[926731]: message = await self.websocket.recv()
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 553, in recv
Apr 28 13:50:10 electrum3 electrumx_server[926731]: await self.ensure_open()
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 930, in ensure_open
Apr 28 13:50:10 electrum3 electrumx_server[926731]: raise self.connection_closed_exc()
Apr 28 13:50:10 electrum3 electrumx_server[926731]: websockets.exceptions.ConnectionClosedOK: received 1000 (OK); then sent 1000 (OK)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: Task exception was never retrieved
Apr 28 13:50:10 electrum3 electrumx_server[926731]: future: <Task finished name='Task-1166' coro=<RPCSession._throttled_request() done, defined at /usr/local/lib/python3.9/dist-packages/aiorpcx/session.py:472> exception=ConnectionClosedOK(Close(code=1000, reason=''), Close(code=1000, reason=''), True)>
Apr 28 13:50:10 electrum3 electrumx_server[926731]: Traceback (most recent call last):
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 501, in _throttled_request
Apr 28 13:50:10 electrum3 electrumx_server[926731]: await self._send_message(message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 153, in _send_message
Apr 28 13:50:10 electrum3 electrumx_server[926731]: await self.transport.write(message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/aiorpcx/websocket.py", line 83, in write
Apr 28 13:50:10 electrum3 electrumx_server[926731]: await self.websocket.send(framed_message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 620, in send
Apr 28 13:50:10 electrum3 electrumx_server[926731]: await self.ensure_open()
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 921, in ensure_open
Apr 28 13:50:10 electrum3 electrumx_server[926731]: raise self.connection_closed_exc()
Apr 28 13:50:10 electrum3 electrumx_server[926731]: websockets.exceptions.ConnectionClosedOK: received 1000 (OK); then sent 1000 (OK)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: Task exception was never retrieved
Apr 28 13:50:10 electrum3 electrumx_server[926731]: future: <Task finished name='Task-1168' coro=<RPCSession._throttled_request() done, defined at /usr/local/lib/python3.9/dist-packages/aiorpcx/session.py:472> exception=ConnectionClosedOK(Close(code=1000, reason=''), Close(code=1000, reason=''), True)>
Apr 28 13:50:10 electrum3 electrumx_server[926731]: Traceback (most recent call last):
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 501, in _throttled_request
Apr 28 13:50:10 electrum3 electrumx_server[926731]: await self._send_message(message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 153, in _send_message
Apr 28 13:50:10 electrum3 electrumx_server[926731]: await self.transport.write(message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/aiorpcx/websocket.py", line 83, in write
Apr 28 13:50:10 electrum3 electrumx_server[926731]: await self.websocket.send(framed_message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 620, in send
Apr 28 13:50:10 electrum3 electrumx_server[926731]: await self.ensure_open()
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 921, in ensure_open
Apr 28 13:50:10 electrum3 electrumx_server[926731]: raise self.connection_closed_exc()
Apr 28 13:50:10 electrum3 electrumx_server[926731]: websockets.exceptions.ConnectionClosedOK: received 1000 (OK); then sent 1000 (OK)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: Task exception was never retrieved
Apr 28 13:50:10 electrum3 electrumx_server[926731]: future: <Task finished name='Task-1171' coro=<RPCSession._throttled_request() done, defined at /usr/local/lib/python3.9/dist-packages/aiorpcx/session.py:472> exception=ConnectionClosedOK(Close(code=1000, reason=''), Close(code=1000, reason=''), True)>
Apr 28 13:50:10 electrum3 electrumx_server[926731]: Traceback (most recent call last):
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 501, in _throttled_request
Apr 28 13:50:10 electrum3 electrumx_server[926731]: await self._send_message(message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 153, in _send_message
Apr 28 13:50:10 electrum3 electrumx_server[926731]: await self.transport.write(message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/aiorpcx/websocket.py", line 83, in write
Apr 28 13:50:10 electrum3 electrumx_server[926731]: await self.websocket.send(framed_message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 620, in send
Apr 28 13:50:10 electrum3 electrumx_server[926731]: await self.ensure_open()
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 921, in ensure_open
Apr 28 13:50:10 electrum3 electrumx_server[926731]: raise self.connection_closed_exc()
Apr 28 13:50:10 electrum3 electrumx_server[926731]: websockets.exceptions.ConnectionClosedOK: received 1000 (OK); then sent 1000 (OK)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: Task exception was never retrieved
Apr 28 13:50:10 electrum3 electrumx_server[926731]: future: <Task finished name='Task-1172' coro=<RPCSession._throttled_request() done, defined at /usr/local/lib/python3.9/dist-packages/aiorpcx/session.py:472> exception=ConnectionClosedOK(Close(code=1000, reason=''), Close(code=1000, reason=''), True)>
Apr 28 13:50:10 electrum3 electrumx_server[926731]: Traceback (most recent call last):
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 501, in _throttled_request
Apr 28 13:50:10 electrum3 electrumx_server[926731]: await self._send_message(message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 153, in _send_message
Apr 28 13:50:10 electrum3 electrumx_server[926731]: await self.transport.write(message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/aiorpcx/websocket.py", line 83, in write
Apr 28 13:50:10 electrum3 electrumx_server[926731]: await self.websocket.send(framed_message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 620, in send
Apr 28 13:50:10 electrum3 electrumx_server[926731]: await self.ensure_open()
Apr 28 13:50:10 electrum3 electrumx_server[926731]: File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 921, in ensure_open
Apr 28 13:50:10 electrum3 electrumx_server[926731]: raise self.connection_closed_exc()
Apr 28 13:50:10 electrum3 electrumx_server[926731]: websockets.exceptions.ConnectionClosedOK: received 1000 (OK); then sent 1000 (OK)
but the server continues to run. in a couple of hours we will know if the memleak is gone
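A note on the log above: close code 1000 is a normal WebSocket closure, and asyncio prints "Task exception was never retrieved" when a task ends with an exception that nobody reads before the task is garbage collected. The sketch below shows that mechanism in general and one way to silence it with a done-callback. It is not how aiorpcX handles this; `CleanClose` is a hypothetical stand-in for `websockets.exceptions.ConnectionClosedOK`, and all other names are invented for the example.

```python
import asyncio
import logging

logger = logging.getLogger("session-sketch")


class CleanClose(Exception):
    """Hypothetical stand-in for websockets.exceptions.ConnectionClosedOK."""


def consume_task_result(task: asyncio.Task) -> None:
    """Done-callback that retrieves the task's exception so asyncio does not warn."""
    if task.cancelled():
        return
    exc = task.exception()  # reading it marks the exception as retrieved
    if exc is None:
        return
    if isinstance(exc, CleanClose):
        logger.debug("peer closed the connection normally: %r", exc)
    else:
        logger.error("task failed", exc_info=exc)


async def send_after_close() -> None:
    """Simulate writing to a connection the peer has already closed."""
    raise CleanClose("received 1000 (OK); then sent 1000 (OK)")


async def main() -> bool:
    task = asyncio.create_task(send_after_close())
    task.add_done_callback(consume_task_result)
    await asyncio.wait([task])  # waits for completion without re-raising
    return task.done()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With the callback attached, the clean close is logged at debug level and the warning is not emitted; without it, the same task would produce a message like the ones in the log.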
the memleak is fixed, thanks
this is the RAM usage for my electrum server (i compact the DB and restart once a week):
for a couple of weeks now it has been using much more memory than it did before