spesmilo / electrumx

Alternative implementation of spesmilo/electrum-server
MIT License

memory leak in recent code #172

Closed cipig closed 2 years ago

cipig commented 2 years ago

This is the RAM usage for my electrum server (I compact the DB and restart once a week): [image] For the last couple of weeks it has been using much more memory than it did before.
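For anyone who wants numbers rather than a chart, here is a minimal sketch of sampling a process's resident memory on Linux. It assumes a `/proc` filesystem is available; the helper names `parse_vmrss_kib` and `read_rss_kib` are made up for this example and are not part of electrumx.

```python
import os
from typing import Optional


def parse_vmrss_kib(status_text: str) -> Optional[int]:
    """Extract the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:      5120 kB".
            return int(line.split()[1])
    return None


def read_rss_kib(pid: int) -> Optional[int]:
    """Return the resident set size of `pid` in KiB, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/status") as status_file:
            return parse_vmrss_kib(status_file.read())
    except OSError:
        # No /proc (non-Linux), or the process no longer exists.
        return None


# Sample this process once; a real monitor would log this periodically
# for the server's PID and plot the values over time.
print("RSS of this process:", read_rss_kib(os.getpid()), "KiB")
```

Logging this value every few minutes for the server's PID would give the same kind of "hills" as the charts in this thread.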

SomberNight commented 2 years ago

Any idea about specific commits? Also, to be clear, does it also happen with current HEAD (7e53936f34d2f1c979c64f91d15d6745fb8103c2)?

cipig commented 2 years ago

No idea, but it must be one of the last 10 commits, after Jan 28, after https://github.com/spesmilo/electrumx/commit/3f12f288a442d4c3429de0a813c6355a0132633f. I am using the latest master branch, but I am not 100% sure that all servers were already restarted with this version; some may be missing the last 3 commits... Do you think the memleak might have been introduced after Jan 28, but already fixed in the last 3 commits? I will restart everything now to be sure all servers are on latest master.

SomberNight commented 2 years ago

well this commit https://github.com/spesmilo/electrumx/commit/c5d1e802e7a7e98db36fbad79954b3a46b9e03f3 might be leaking memory, but then I would have thought that the next one (https://github.com/spesmilo/electrumx/commit/b10d03caf639c52a97322d92098a4243ac6091fb) fixed that

I doubt the last 3 commits are relevant :/

cipig commented 2 years ago

In the meantime, all servers are running versions that have at least both of those commits. I am sure the memleak is still there... A server running latest master was restarted yesterday and the electrumx server already uses 12GB RAM (used to be 5).

TrickyRaccoons commented 2 years ago

I don't think it is a commit causing the memory leak. My server is impacted by the memory leak since the DDOS against it started. I think the memory leak is caused by the large amount of attempted connections by the botnet.

cipig commented 2 years ago

I run a couple dozen different coins; some have a very low number of connections, but they are all affected. Peer discovery is also off on all coins (PEER_DISCOVERY = self). I could try to go back to an older version, before the python 3.8 change... If the problem is not solved I will have to, since this version will crash my server sooner or later with OOM (the automatic restart of all coins every week will not be enough to compensate for the memleak). The drawback is that I would then have the random crashes back (which were solved in the last commits), so I will wait a couple more days before reverting to an older version.

cipig commented 2 years ago

The situation is the same: [image] Coins that were using 5GB RAM before are now using 30GB, until they are restarted once a week. I will downgrade to https://github.com/spesmilo/electrumx/commit/0256e97715f16730be7293d5f8d893bb476315dc and see what happens.

shsmith commented 2 years ago

My servers started showing this same leakage when I updated from ubuntu 18.04 (python 3.7) to 20.04 (python 3.8). I suspect Python 3.8 is somehow missing a fix from 3.7.

SomberNight commented 2 years ago

@shsmith do you mean that with same electrumx commit some python versions have a memleak but others not?

shsmith commented 2 years ago

Yes, that is what I think happened. After the OS+python upgrade the leak started. I did the upgrade on two servers on 2/19/2022. I may have done a git pull on the same day, but I think the leak was triggered by the python version change rather than any recent git commit.

cipig commented 2 years ago

[image] The last 2 "hills" are from https://github.com/spesmilo/electrumx/commit/0256e97715f16730be7293d5f8d893bb476315dc. Looks like the leak is not that big... Unfortunately this version crashes on websockets requests and needs to be restarted, so the chart is not for an entire week.

cipig commented 2 years ago

Bigger picture: [image] The last 3 hills are with https://github.com/spesmilo/electrumx/commit/0256e97715f16730be7293d5f8d893bb476315dc. I guess it's mainly related to OS+python 3.9... It all started with that here too... The funny thing is that I did the OS update to get the newer python version to be able to run latest electrumx (previous debian had python 3.7). Could it be related to some python modules? I would suspect aiorpcx, that was the "big" update... I have 0.22.1, the latest one. This is my config for this particular coin:

COIN = Komodo
DB_DIRECTORY = /electrumdb/KMD
DAEMON_URL = http://xxxxx:xxxxxx@127.0.0.1:7771/
SERVICES = tcp://:10001,rpc://:8001,ssl://:20001,wss://:30001
EVENT_LOOP_POLICY = uvloop
PEER_DISCOVERY = self
MAX_SESSIONS = 5000
INITIAL_CONCURRENT = 100
COST_SOFT_LIMIT = 0
COST_HARD_LIMIT = 0
SSL_CERTFILE = /home/electrum/server.crt
SSL_KEYFILE = /home/electrum/server.key
MAX_SEND = 2000000

It is the same as before the leak.

I install with pip3 install .[uvloop,rapidjson], so rapidjson is included.

cipig commented 2 years ago

Switched back to the latest commit, and the memory usage rises faster than with https://github.com/spesmilo/electrumx/commit/0256e97715f16730be7293d5f8d893bb476315dc: [image] The last hill is the latest commit, the 3 before that are https://github.com/spesmilo/electrumx/commit/0256e97715f16730be7293d5f8d893bb476315dc, and all the other ones are also the latest commit. Switching back now to ee2c78e590a2794e1353acf7080d6eee2e71f60b (before the aiorpcx update) to see what happens.

cipig commented 2 years ago

[image] ee2c78e590a2794e1353acf7080d6eee2e71f60b does not have the memleak... That is aiorpcX-0.18.7 with the same python 3.9 on the same server with debian 11.
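When comparing runs like this, it helps to record exactly which Python and which library versions each server has. The following is a small sketch using only the standard library (`importlib.metadata`, Python 3.8+); the function names and the list of distributions to check are assumptions chosen for this example.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(dist_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Collect the interpreter version plus the versions of selected packages."""
    report: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in dist_names:
        report[name] = installed_version(name)
    return report


# The package list is illustrative; adjust it to what the server uses.
report = environment_report(["aiorpcX", "uvloop", "websockets"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Saving this output next to each memory chart makes it easier to tell whether a change in behaviour follows the electrumx commit, the Python version, or a dependency such as aiorpcX.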

shsmith commented 2 years ago

https://github.com/spesmilo/electrumx/commit/ee2c78e590a2794e1353acf7080d6eee2e71f60b does not have the memleak...

Same on Ubuntu server 20.04. Leak is gone.

[image: electrumx_mem-day (1)]

SomberNight commented 2 years ago

I've found the leak, it's in aiorpcx. See https://github.com/kyuupichan/aiorpcX/issues/46
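This thread does not describe how the leak was tracked down. As a general technique only, one common way to narrow down a leak in a Python process is to compare two `tracemalloc` snapshots and look at which source lines grew the most. The sketch below is a generic illustration, not the method used here; the `leaky` list is a hypothetical stand-in for whatever object keeps accumulating.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the source lines whose allocated size changed the most."""
    # compare_to() sorts by absolute size difference, largest first.
    return after.compare_to(before, "lineno")[:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical stand-in for a leak: objects that are kept alive forever.
leaky = [bytearray(1024) for _ in range(1000)]

after = tracemalloc.take_snapshot()
growth = top_growth(before, after)
for stat in growth:
    print(stat)

tracemalloc.stop()
```

In a long-running server the two snapshots would be taken hours apart, and the lines at the top of the list point at the code that allocated the memory still being held.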

SomberNight commented 2 years ago

Please test with current master when you have time.

cipig commented 2 years ago

Thanks for the fix. I updated and restarted one server.

saw this in log once, soon after start:

Apr 28 13:50:10 electrum3 electrumx_server[926731]: Task exception was never retrieved
Apr 28 13:50:10 electrum3 electrumx_server[926731]: future: <Task finished name='Task-202' coro=<SessionBase._process_messages() done, defined at /usr/local/lib/python3.9/dist-packages/aiorpcx/session.py:217> exception=ConnectionClosedOK(Close(code=1000, reason=''), Close(code=1000, reason=''), True)>
Apr 28 13:50:10 electrum3 electrumx_server[926731]: Traceback (most recent call last):
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 219, in _process_messages
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     await self._process_messages_loop(recv_message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 449, in _process_messages_loop
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     message = await recv_message()
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/aiorpcx/websocket.py", line 63, in recv_message
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     message = await self.websocket.recv()
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 553, in recv
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     await self.ensure_open()
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 930, in ensure_open
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     raise self.connection_closed_exc()
Apr 28 13:50:10 electrum3 electrumx_server[926731]: websockets.exceptions.ConnectionClosedOK: received 1000 (OK); then sent 1000 (OK)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: Task exception was never retrieved
Apr 28 13:50:10 electrum3 electrumx_server[926731]: future: <Task finished name='Task-1166' coro=<RPCSession._throttled_request() done, defined at /usr/local/lib/python3.9/dist-packages/aiorpcx/session.py:472> exception=ConnectionClosedOK(Close(code=1000, reason=''), Close(code=1000, reason=''), True)>
Apr 28 13:50:10 electrum3 electrumx_server[926731]: Traceback (most recent call last):
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 501, in _throttled_request
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     await self._send_message(message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 153, in _send_message
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     await self.transport.write(message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/aiorpcx/websocket.py", line 83, in write
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     await self.websocket.send(framed_message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 620, in send
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     await self.ensure_open()
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 921, in ensure_open
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     raise self.connection_closed_exc()
Apr 28 13:50:10 electrum3 electrumx_server[926731]: websockets.exceptions.ConnectionClosedOK: received 1000 (OK); then sent 1000 (OK)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: Task exception was never retrieved
Apr 28 13:50:10 electrum3 electrumx_server[926731]: future: <Task finished name='Task-1168' coro=<RPCSession._throttled_request() done, defined at /usr/local/lib/python3.9/dist-packages/aiorpcx/session.py:472> exception=ConnectionClosedOK(Close(code=1000, reason=''), Close(code=1000, reason=''), True)>
Apr 28 13:50:10 electrum3 electrumx_server[926731]: Traceback (most recent call last):
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 501, in _throttled_request
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     await self._send_message(message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 153, in _send_message
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     await self.transport.write(message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/aiorpcx/websocket.py", line 83, in write
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     await self.websocket.send(framed_message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 620, in send
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     await self.ensure_open()
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 921, in ensure_open
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     raise self.connection_closed_exc()
Apr 28 13:50:10 electrum3 electrumx_server[926731]: websockets.exceptions.ConnectionClosedOK: received 1000 (OK); then sent 1000 (OK)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: Task exception was never retrieved
Apr 28 13:50:10 electrum3 electrumx_server[926731]: future: <Task finished name='Task-1171' coro=<RPCSession._throttled_request() done, defined at /usr/local/lib/python3.9/dist-packages/aiorpcx/session.py:472> exception=ConnectionClosedOK(Close(code=1000, reason=''), Close(code=1000, reason=''), True)>
Apr 28 13:50:10 electrum3 electrumx_server[926731]: Traceback (most recent call last):
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 501, in _throttled_request
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     await self._send_message(message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 153, in _send_message
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     await self.transport.write(message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/aiorpcx/websocket.py", line 83, in write
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     await self.websocket.send(framed_message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 620, in send
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     await self.ensure_open()
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 921, in ensure_open
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     raise self.connection_closed_exc()
Apr 28 13:50:10 electrum3 electrumx_server[926731]: websockets.exceptions.ConnectionClosedOK: received 1000 (OK); then sent 1000 (OK)
Apr 28 13:50:10 electrum3 electrumx_server[926731]: Task exception was never retrieved
Apr 28 13:50:10 electrum3 electrumx_server[926731]: future: <Task finished name='Task-1172' coro=<RPCSession._throttled_request() done, defined at /usr/local/lib/python3.9/dist-packages/aiorpcx/session.py:472> exception=ConnectionClosedOK(Close(code=1000, reason=''), Close(code=1000, reason=''), True)>
Apr 28 13:50:10 electrum3 electrumx_server[926731]: Traceback (most recent call last):
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 501, in _throttled_request
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     await self._send_message(message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/aiorpcx/session.py", line 153, in _send_message
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     await self.transport.write(message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/aiorpcx/websocket.py", line 83, in write
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     await self.websocket.send(framed_message)
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 620, in send
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     await self.ensure_open()
Apr 28 13:50:10 electrum3 electrumx_server[926731]:   File "/usr/local/lib/python3.9/dist-packages/websockets/legacy/protocol.py", line 921, in ensure_open
Apr 28 13:50:10 electrum3 electrumx_server[926731]:     raise self.connection_closed_exc()
Apr 28 13:50:10 electrum3 electrumx_server[926731]: websockets.exceptions.ConnectionClosedOK: received 1000 (OK); then sent 1000 (OK)

But the server continues to run. In a couple of hours we will know if the memleak is gone.
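For context on the "Task exception was never retrieved" lines: asyncio logs that message when a Task finished with an exception and nothing ever awaited it or called `exception()` on it. The sketch below shows the general mechanism with a done callback that retrieves the exception. `ConnectionClosedDemo` is a hypothetical stand-in for `websockets.exceptions.ConnectionClosedOK`, and this is not a description of how aiorpcX handles it.

```python
import asyncio


class ConnectionClosedDemo(Exception):
    """Hypothetical stand-in for a 'connection closed normally' exception."""


async def recv_loop():
    # Simulates a receive loop that ends because the peer closed the socket.
    raise ConnectionClosedDemo("peer closed the connection")


def consume_result(task, seen):
    """Done callback: retrieve the exception so asyncio does not log it later."""
    if task.cancelled():
        return
    exc = task.exception()  # calling this marks the exception as retrieved
    if exc is not None:
        seen.append(exc)


async def main():
    seen = []
    task = asyncio.create_task(recv_loop())
    task.add_done_callback(lambda t: consume_result(t, seen))
    # wait() returns once the task is done and does not re-raise its exception.
    await asyncio.wait([task])
    return seen


seen = asyncio.run(main())
print("retrieved:", [type(exc).__name__ for exc in seen])
```

Without the callback (and without awaiting the task), the same program would emit a "Task exception was never retrieved" message when the task object is garbage collected, which matches the shape of the log above.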

cipig commented 2 years ago

the memleak is fixed, thanks