Closed: wasimsandhu closed this issue 2 weeks ago
Whoa interesting. If you change all local variables to global variables (remove underscores), do you still see the leak?
We previously had a leak with locals, which I thought I fixed but maybe I missed an edge case.
Another thing to try if you don't mind: can you remove the last line (`mo.ui.table(_df)`) and check if that gets rid of the leak?
Tried both your suggestions (first changing to global variables, then removing the last line). Unfortunately, the memory leak is still reproducible.
Will look into it. Weird request, but one last try: can you remove the line that reads data into a dataframe?
I tried to reproduce it but couldn't — RAM fluctuated, but it didn't grow unbounded. I would only consider this a leak if re-running the cell repeatedly can drive memory usage arbitrarily high (can you?).
The operating system doesn't always reclaim memory used by a Python process, even after Python's garbage collector runs. So after a large allocation is "freed" (the variable goes out of scope, or you call an explicit `del`), the process's memory usage may still be higher than it originally was. When I experimented with code similar to yours just now, this is the kind of behavior I saw.
Yes, the memory usage continues to increase. After rerunning 30 times, memory usage climbs from 1 GB to 2.5 GB and does not come back down. I even added an explicit `del` and a garbage-collection call, to no avail.
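For anyone trying to reproduce the numbers quoted here, one way to track the process's memory between reruns without any third-party packages is the standard-library `resource` module (Unix only). A small sketch:

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Peak resident set size of the current process, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux, bytes on macOS
    if sys.platform == "darwin":
        peak /= 1024
    return peak / 1024


print(f"peak RSS: {peak_rss_mb():.1f} MB")
```

Note `ru_maxrss` is a high-water mark, so it never decreases; to watch memory fluctuate back down (as described later in this thread), a tool reporting current RSS, such as `psutil.Process().memory_info().rss`, is more appropriate.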
This has made my work somewhat tedious, because this file is one of many I am loading in the notebook, and I am constantly having to restart the kernel because of the memory usage.
As to your second note, I guess the best way to test this would be to try reproducing similar behavior in a Jupyter notebook? I don't have time at the moment, but thanks for looking into this issue so promptly!
Actually, you might be right. Maybe there is a ceiling to how high the memory can climb. When loading a much larger dataset (around 15 GB), I noticed that my memory usage spikes up pretty high after rerunning but then drops back down after a while.
Feel free to close if you can't reproduce.
Describe the bug
When I rerun a cell that opens a large file (in this case around 500 MB), I'm noticing that the memory usage is increasing. These screenshots are taken after each rerun.
Environment
Code to reproduce