Thank you @imbuedHope for the code to reproduce this issue. The code looks correct for gathering the on-instrument statistics.

I modified this code to include `gc.collect()` just before the `time.sleep(0.1)` call, to make sure that the garbage collector was behaving.
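For reference, here is a minimal sketch of the kind of loop under test. It is not the reporter's exact script; the statistics-gathering step is left as a placeholder because that code is not shown in this issue.

```python
#!/usr/bin/env python3
# Sketch only: approximates the reproduction loop with gc.collect() added
# immediately before the sleep. The statistics-gathering step is a placeholder.
import gc
import time

import joulescope

if __name__ == "__main__":
    scopes = []
    try:
        while True:
            scopes, added, removed = joulescope.scan_for_changes(
                name='Joulescope', devices=scopes, config='auto')
            # ... gather on-instrument statistics from each device here ...
            gc.collect()      # added to rule out delayed garbage collection
            time.sleep(0.1)
    except KeyboardInterrupt:
        pass
```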
I first ran this code on Windows with a single Joulescope attached. Win 10 x64 stabilized at 48,828 kB of RAM before dropping and stabilizing at 19,960 kB.

However, under Ubuntu x64, memory usage continued to grow. I used your approach of printing the date and process size, but using:

```
date && ps -p 151963 -o %cpu,size,cmd
```
Here are the results:
| Time | Memory (kB) |
|---|---|
| 9:10:45 | 170340 |
| 9:11:58 | 173828 |
| 9:13:23 | 177800 |
| 9:16:18 | 185812 |
| 9:17:41 | 189852 |
| 9:20:44 | 198176 |
| 9:24:22 | 208412 |
| 9:30:48 | 226748 |
This could be exponential, but we'll need more data to be sure. Let's use mprof.
```
pip3 install -U memory_profiler
```

And then:

```
mprof run isolate.py
```

To display the results:

```
mprof plot --slope mprofile_{your_timestamp}.dat
```
The resulting plot shows a nice, linear leak of 44 kB/s on average. Over 24 hours (44 kB/s × 86,400 s), that amounts to roughly 3.8 GB.
So, I confirm that the statistics data collection has a problem on Linux. I will continue to investigate.
I did more testing. Here is the mprof plot from Win 10 x64:

And here is the mprof plot from macOS 11 x64:
I further isolated the memory leak. It is not occurring in `statistics()`, but rather in `scan`. Here is an updated example:
```python
#!/usr/bin/env python3
# Repeatedly rescan for Joulescope device changes; no statistics are gathered.
import gc
import time

import joulescope

if __name__ == "__main__":
    scopes = []
    try:
        while True:
            scopes, added, removed = joulescope.scan_for_changes(
                name='Joulescope',
                devices=scopes,
                config='auto',
            )
            gc.collect()
            time.sleep(0.1)
    except KeyboardInterrupt:
        pass
```
When run on Ubuntu x64, mprof still captures the leak.

Since the same Python code does not leak on Windows or macOS, the leak is likely not in Python itself. The same ctypes-based libusb interface code runs on macOS and Linux, so the problem is likely not in the Joulescope driver's use of libusb either. Therefore, the most likely location for this issue is within the Ubuntu libusb implementation. Ugh.
In the meantime, the easiest workaround is to scan less often, especially since nothing should be changing. If scanning once a minute is sufficient, you reduce the leak rate by a factor of 600 compared to scanning every 0.1 seconds.
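For example, here is a sketch of that reduced-frequency approach. The loop structure and the 60-second rescan interval are illustrative, not part of the original report.

```python
#!/usr/bin/env python3
# Sketch: keep the main loop period at 0.1 s, but only rescan once per minute.
import time

import joulescope

if __name__ == "__main__":
    scopes = []
    last_scan = None
    while True:
        now = time.monotonic()
        if last_scan is None or now - last_scan >= 60.0:  # interval is illustrative
            scopes, added, removed = joulescope.scan_for_changes(
                name='Joulescope', devices=scopes, config='auto')
            last_scan = now
        # ... gather statistics from the open devices here ...
        time.sleep(0.1)
```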
A possible solution is to only scan when required. Check out `joulescope.usb.DeviceNotify` and the libusb backend implementation; a sketch follows below.
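Here is a rough sketch of that event-driven approach. It assumes `DeviceNotify` accepts a callback that is invoked on device insertion and removal and exposes a `close()` method; verify the exact interface against the backend implementation before relying on this.

```python
#!/usr/bin/env python3
# Sketch only: rescan only when the OS reports a USB device change.
# The DeviceNotify callback signature and close() method are assumptions;
# check the joulescope.usb backend implementation for the real interface.
import threading
import time

import joulescope
from joulescope.usb import DeviceNotify

if __name__ == "__main__":
    changed = threading.Event()
    changed.set()  # force an initial scan

    def on_device_change(inserted, info):  # signature assumed, see note above
        changed.set()

    notify = DeviceNotify(on_device_change)
    scopes = []
    try:
        while True:
            if changed.is_set():
                changed.clear()
                scopes, added, removed = joulescope.scan_for_changes(
                    name='Joulescope', devices=scopes, config='auto')
            # ... gather statistics from the open devices here ...
            time.sleep(0.1)
    except KeyboardInterrupt:
        notify.close()  # assumed cleanup method
```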
The joulescope_driver backend (the pyjoulescope v1 backend) uses hotplug events only rather than repeated scanning. Even if there is still a problem in the Ubuntu libusb implementation, the leak should be much, much smaller. Also, reducing the frequency of scan was an effective enough workaround.
I've been hunting for a memory leak that came to my attention after running my code for over 24 hours, and I got stuck after isolating it to the joulescope module. I can consistently reproduce the behavior. I've been testing with 9 Joulescopes (plugged into a Raspberry Pi 4 with 4 GB of RAM); it may be much less apparent with fewer devices.
The code that I managed to reproduce the behavior with is as follows.
After starting the process, I tracked %mem usage over time at a few random intervals with `ps`, with the following results.

In the code I was originally trying to debug, the last output from the script was as follows; we had 10 Joulescopes connected to it when this happened. (This output is from `journalctl`, since the script was launched with `systemd`.)

I'm not entirely sure how to isolate the leak further or what the underlying cause is. Any help resolving this would be greatly appreciated.