sappjw opened this issue 1 month ago
That does look like a well-formed issue, thanks @sappjw.
I had a quick (~10 minute) look through the code, as I'm not familiar with it. It looks like we maintain a `REF_COUNTS` dict which increments and which, I think, is supposed to decrement when the file is closed.

To the extent you want to explore more: it would be interesting to see whether that count is incrementing but not decrementing as you open the file on each loop iteration. And then why it's not decrementing — does the file need to have `.close` called explicitly rather than just going out of scope?
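A quick way to check that (a sketch only: `REF_COUNTS` lives in `xarray.backends.file_manager`, an internal module whose layout may change, and the file opened here is hypothetical):

```python
import xarray as xr

# Internal, unstable import path -- an assumption about the current layout.
from xarray.backends.file_manager import REF_COUNTS

for i in range(5):
    ds = xr.open_dataset("example.nc")  # hypothetical file
    ds.close()
    del ds
    # If references are being released, entries should drop back out
    # of REF_COUNTS instead of accumulating across iterations.
    print(i, dict(REF_COUNTS))
```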
Thanks. Inserting a `ds.close()` before the `del ds` doesn't change the results. (I did have to update the code above slightly to read `file_obj.getvalue()` instead of `file_obj.getbuffer()`.)
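For context on that change: `BytesIO.getbuffer()` returns a live `memoryview` into the buffer, which pins the `BytesIO` while the view is exported, whereas `getvalue()` returns an independent `bytes` copy. A minimal illustration:

```python
import io

buf = io.BytesIO(b"abc")
view = buf.getbuffer()  # memoryview sharing the underlying buffer
data = buf.getvalue()   # independent bytes copy

try:
    buf.close()  # refuses while the memoryview is still exported
except BufferError as err:
    print(err)

view.release()  # drop the export; now the buffer can be closed
buf.close()
```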
OK. It would be interesting to see `REF_COUNTS`; feel free to explore further into what might be holding the reference...
If I put a `del self._manager` in the `close()` method of `memtest_DataStore`, the memory growth goes away. Do you think this is a good way to handle the issue? I tried to model this off of the scipy backend, which doesn't free `_manager`.
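For concreteness, here is a sketch of that workaround. The class and attribute names follow the report; subclassing `AbstractDataStore` and holding a `CachingFileManager` are assumptions about how the custom backend is wired up:

```python
from xarray.backends import CachingFileManager
from xarray.backends.common import AbstractDataStore


class memtest_DataStore(AbstractDataStore):
    def __init__(self, manager: CachingFileManager):
        self._manager = manager

    def close(self):
        self._manager.close()
        # Workaround described above: drop the attribute so the manager
        # (and whatever it still references) can be garbage-collected.
        del self._manager
```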
Very possibly, but I'm unfortunately a long way from an expert on this code, hence my rather basic debugging so far.
Others will hopefully know more. If that suggestion helps and you can put together a small PR plus a test, it will very likely get good traction.
What happened?
I wrote a custom backend. I'm using it to open a file, operate on the data, remove most of it from the `Dataset` using `.isel`, open the next file, concatenate, and repeat. I noticed that the memory used by the system grew significantly over time even though the size of the `Dataset` was approximately the same. I was able to reproduce the problem without most of this complexity.

I repeatedly created a dummy `Dataset` with 25 `Variable`s and observed the number of objects with objgraph after each object creation. I see `Variable` instances continually increasing, even though I have `del`'d the `Dataset` after creating it. I think this suggests that something in xarray is not releasing the `Dataset`.

I picked a random `Variable` that was not released and printed the reference chain graph.

What did you expect to happen?
I expected the memory used for the `Dataset` to be released and garbage-collected. I expected the memory in use to plateau instead of growing.
Minimal Complete Verifiable Example
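The original reproduction script is not included here. As a stand-in, the sketch below only illustrates the measurement technique described above, using a plain in-memory `Dataset` rather than the reporter's custom backend; shapes and names are made up:

```python
import numpy as np
import objgraph
import xarray as xr


def make_dataset() -> xr.Dataset:
    # Dummy Dataset with 25 variables, as in the description above.
    return xr.Dataset(
        {f"var{i}": ("x", np.random.rand(1000)) for i in range(25)}
    )


for i in range(10):
    ds = make_dataset()
    del ds
    # Count live Variable instances after each iteration; in the
    # report this number kept climbing instead of staying flat.
    print(i, objgraph.count("Variable"))
```

The reference chain graph mentioned above can be produced with `objgraph.show_backrefs` on one of the leaked objects.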
Anything else we need to know?
This crashes the Binder notebook instance since it uses so much memory.
Environment