kyuupichan / aiorpcX

Generic async RPC implementation, including JSON-RPC
MIT License
27 stars 23 forks source link

memory leak in long-lived `TaskGroup` (and hence `SessionBase`) #46

Closed SomberNight closed 2 years ago

SomberNight commented 2 years ago

RPCSession(SessionBase) has a long-lived taskgroup: https://github.com/kyuupichan/aiorpcX/blob/e55950fed903b63e82b87c7b6ca93ab50e18ce08/aiorpcx/session.py#L127 Every time the session receives a message, a new task is spawned on the group: https://github.com/kyuupichan/aiorpcX/blob/e55950fed903b63e82b87c7b6ca93ab50e18ce08/aiorpcx/session.py#L470 This task is added to and stored in session._group.tasks as long as the taskgroup exists (so as long as the session exists): https://github.com/kyuupichan/aiorpcX/blob/e55950fed903b63e82b87c7b6ca93ab50e18ce08/aiorpcx/curio.py#L154

This tasks set is never cleared while the session is alive. It was added in 0.20.0 (https://github.com/kyuupichan/aiorpcX/commit/0266c1de5817134e264941e06ad00d0cf02a942e). Prior to that, only the pending tasks were stored (but not the completed ones) - which makes much more sense.

Hence, in practice, long-lived sessions cause a "memory leak" in ElectrumX. A server of mine typically has some sessions that are open for weeks, with millions of messages received for some -- or it had in the past at least as this server now runs OOM in a few days with recent aiorpcx.

kyuupichan commented 2 years ago

This is partly the reason for the async for... within the task group that I replied to your previous issue with. I will take a look, but not immediately. In ElectrumX head in my repo, I don't think this causes any issue. I have had instances run for 1 or 2 months+ with that code.

SomberNight commented 2 years ago

The async for is unrelated. Tasks are never removed from TaskGroup.tasks.

The leak in ElectrumX only happens if you have sessions that send a lot of messages and are never closed. The memory usage scales linearly with the number of messages ever received in still-open sessions.

kyuupichan commented 2 years ago

I believe async for removes them because their finalization status is effectively "consumed" by doing so.

SomberNight commented 2 years ago

The TaskGroup in SessionBase is using the async for approach, as all the code is encapsulated inside this library. Downstream ElectrumX usage does not affect that. https://github.com/kyuupichan/aiorpcX/blob/e55950fed903b63e82b87c7b6ca93ab50e18ce08/aiorpcx/session.py#L224-L230 That is, the group leaking the most memory is encapsulated inside aiorpcx, using your suggested syntax and usage.

I believe async for removes them because their finalization status is effectively "consumed" by doing so.

I don't understand what you mean. TaskGroup.tasks is not even iterated on by the async for ...: __aiter__ and __anext__ don't refer to it.

kyuupichan commented 2 years ago

Hmm, yes, I'll take a look, it might take a week or so though. Thanks.