yaq-project / yaq-python

Repository for yaq core python packages.
https://yaq.fyi
GNU Lesser General Public License v3.0
5 stars 5 forks source link

`isDaemon`: memory usage from client connections persists after connection is lost #75

Closed ddkohler closed 9 months ago

ddkohler commented 10 months ago

I am observing this behavior on yaqd-lightcon-topas4-motor, but I believe this behavior should be observed in all daemons so I am posting it here.

When I make a client connection to a lightcon-topas4-motor (c=yaqc.Client(...)), I can see the memory usage of the daemon go up (~100 kB). When I close the daemon, the memory remains in use by the daemon. Each connection increases the memory usage by ~100 kB.

This may not seem like a big issue, but it can snowball quickly. e.g. when yaqd-attune is a background process and fails to initialize its dependents (this can happen when the lightcon server is down) it will retry the connection. It reconnects to every lightcon daemon every ~2 seconds, so after hours the connections take up GBs of memory for the daemon.

ksunden commented 10 months ago

My best guess from a quick look at the code is that we never actually kill off this task:

https://github.com/yaq-project/yaq-python/blob/2735e4e47c33d1b0094ad992a0c36038833fb99c/yaqd-core/yaqd_core/_protocol.py#L30-L31

Which is preventing things from being garbage collected, etc.

I think that it may be as simple as doing self.task = ...; self._daemon._tasks.append(self.task)

And then calling self.task.cancel() in the connection_lost method. (and ejecting it from the daemon's list of tasks... in fact, maybe just never adding it the first place, if it is held onto and cancelled separately... not sure... the daemons task list is held so that tasks get cancelled when shutdown is called.)

ddkohler commented 9 months ago

I can confirm this behavior is reproduced by running yaqd-fakes hardware. When client connections are made and closed, I can see clear monotonic usage of daemon memory (~50kB/connection for is_sensor) as number of connections increases--negligible memory change for decreasing connections.

@ksunden adding in your task cancellation to Protocol:

    def connection_lost(self, exc):
        peername = self.transport.get_extra_info("peername")
        self.logger.info(f"Connection lost from {peername} to {self._daemon.name}")
        self._daemon._connection_lost(peername)
        asyncio.get_event_loop().create_task(self.cancel_and_confirm())

    async def cancel_and_confirm(self):
        self.task.cancel()
        while not self.task.cancelled():
            await asyncio.sleep(0.1)
        self.logger.info("cancelled")

I also removed the task from the daemons list like you suggested.

Tasks do get cancelled, but I don't see significant changes in memory usage behavior.

ddkohler commented 9 months ago

My interpretation of the code modification was incorrect--I didn't account for how memory usage can fluctuate. If I execute a lot of connections (~1000s), I do see the memory usage contract eventually. Using yaqd-fakes, I have not seen daemon memory usage get past 10s of MBs after ~10000 connections. WIthout the task cancellation, the same connections bring the daemon memory to ~800 MB usage. This is greatly improved performance.

ddkohler commented 9 months ago

closed via #77