yaq-project / yaqd-attune


[INCIDENT] all memory somehow gets used #37

Closed: ddkohler closed this issue 11 months ago

ddkohler commented 1 year ago

Every two weeks or so, our lab computers reach ~100% memory usage and become extremely laggy and unstable (applications crash or cannot open). I have observed this on the ps and fs tables, and I have heard that something similar happens on the Waldo system.

This happens without most services running (docker, most daemons), but attune and attune-delay will be running. Task Manager shows a single python process taking up a large chunk of memory (~0.5-4 GB). The memory used by all processes listed in Task Manager adds up to well short of the total computer memory usage (~30 GB).

I am attributing this to attune, but I might well be off. Just documenting the incident for now. I can imagine this being a yaqd-control/nssm issue as well. I will try running attune daemons in the foreground to have more information.

In all cases, the issue is resolved by restarting.

ksunden commented 1 year ago

By restarting what?

ddkohler commented 1 year ago

Sorry, crucial detail. The problem is fixed by restarting the computer.

When the daemons are closed, I can verify in Task Manager that the python process closes, but the computer memory usage remains mostly unchanged and the computer remains laggy. Again, the reported memory usage greatly exceeds the sum of contributions from individual processes.
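To put a number on the gap between the total and the per-process sum, something like the following could be run during an incident. This is a minimal sketch, not something used in this thread: `unattributed_bytes` and `sample_with_psutil` are hypothetical helper names, and the live sample assumes the third-party `psutil` package is installed. Summing resident set sizes can double count shared pages, so the result is only a rough indicator.

```python
from typing import Iterable, Optional


def unattributed_bytes(total_used: int, per_process_rss: Iterable[Optional[int]]) -> int:
    """Return memory in use that no listed process accounts for.

    Entries that are None (processes we were not allowed to inspect) are skipped.
    """
    attributed = sum(rss for rss in per_process_rss if rss is not None)
    return total_used - attributed


def sample_with_psutil() -> int:
    """Take one live sample. Assumes the third-party psutil package is installed."""
    import psutil  # assumption: not part of the standard library

    total_used = psutil.virtual_memory().used
    rss_values = []
    for proc in psutil.process_iter(["memory_info"]):
        info = proc.info["memory_info"]  # None if access was denied
        rss_values.append(info.rss if info is not None else None)
    return unattributed_bytes(total_used, rss_values)


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Illustrative numbers only, loosely based on the figures quoted above.
    gap = unattributed_bytes(30 * GIB, [4 * GIB, None, 1 * GIB])
    print(f"unattributed: {gap / GIB:.1f} GiB")
```

A large positive gap would point away from any single user process and toward memory held elsewhere (kernel, drivers, or a virtualization layer), while a small gap would suggest the visible processes do account for the usage.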

ksunden commented 1 year ago

I am very skeptical that this is attributable to yaqd-attune. I'd be more inclined to think that the docker stuff is eating memory, and would be interested to know whether just restarting docker helps. (Docker may also partly hide its memory usage from Windows, at least at the level of individual user processes, while still showing up in the total.)

That is my gut instinct/first reaction, but certainly not proven.

untzag commented 12 months ago

Hi all, was this fixed by https://github.com/yaq-project/yaq-python/pull/77?

ddkohler commented 11 months ago

@untzag I am not certain yet, but there is evidence that it was. The fs computer just had a repeat incident, and the high memory usage was relieved by restarting the daemons. This behavior is consistent with a daemon memory leak. We have updated yaqd-core and will wait to see if the problem repeats itself over the next week or two.
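For anyone who wants to confirm a leak of this kind directly rather than waiting for a repeat, one option is Python's standard-library `tracemalloc` module. The sketch below is illustrative only: `top_growth` is a hypothetical helper, and the growing list stands in for whatever a leaking daemon would be holding on to. It is not the diagnosis that was used for this issue.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the source lines whose allocated size grew most between two snapshots."""
    stats = after.compare_to(before, "lineno")
    return [s for s in stats if s.size_diff > 0][:limit]


if __name__ == "__main__":
    tracemalloc.start()
    before = tracemalloc.take_snapshot()

    # Stand-in for a leak: a list that only ever grows.
    leaked = []
    for _ in range(1000):
        leaked.append(bytearray(1024))

    after = tracemalloc.take_snapshot()
    for stat in top_growth(before, after):
        print(stat)
    tracemalloc.stop()
```

In a long-running daemon, the two snapshots would be taken minutes or hours apart; a line that keeps appearing at the top of the growth list across repeated comparisons is a likely leak site.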

ddkohler commented 11 months ago

I am not really observing memory issues anymore, so I am going to call this issue resolved by yaq-project/yaq-python#77 🥳