dhimmel closed this issue 6 years ago.
In https://github.com/greenelab/hetmech/pull/142/commits/484f36caa67909d21f6db41c9d68479181e25e06 from https://github.com/greenelab/hetmech/pull/142, I stopped the computation partway through the memory leak, with Python consuming 48.9 GB of memory. Comparing `tracemalloc` snapshots using the `lineno` statistic, we get the following top two statistics (the rest are small :fish:):
```
/home/dhimmel/anaconda3/envs/hetmech/lib/python3.6/site-packages/pandas/core/indexes/multi.py:2688: size=28.7 GiB (+28.7 GiB), count=1994546 (+1994546), average=15.1 KiB
/home/dhimmel/anaconda3/envs/hetmech/lib/python3.6/site-packages/pandas/core/indexes/multi.py:2683: size=28.7 GiB (+28.7 GiB), count=1992955 (+1992955), average=15.1 KiB
```
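A comparison like this can be produced with the standard-library `tracemalloc` module. The sketch below assumes a simple before/after workflow (the notebook code that produced this output isn't shown here); `allocate_some_memory` and `top_growth` are hypothetical names, and only the `tracemalloc` calls carry over to real use.

```python
import tracemalloc


def allocate_some_memory(retained):
    # Hypothetical stand-in for the leaking computation: each call allocates
    # a list and keeps a reference to it, so the memory is never freed.
    retained.append([0] * 100_000)


def top_growth(n_calls=10, limit=2):
    """Return the `limit` source lines whose allocations grew the most."""
    tracemalloc.start()
    retained = []
    before = tracemalloc.take_snapshot()
    for _ in range(n_calls):
        allocate_some_memory(retained)
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Group allocations by file and line number; results come back sorted
    # with the largest absolute size change first.
    stats = after.compare_to(before, "lineno")
    return stats[:limit]


if __name__ == "__main__":
    for stat in top_growth():
        # Prints "file:line: size=... (+...), count=... (+...), average=..."
        print(stat)
```

The printed `StatisticDiff` lines use the same `size=... (+...), count=... (+...), average=...` format as the output shown here, so the largest growers appear at the top.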
Hence, it looks like many pandas `MultiIndex` instances are getting created (and presumably never destroyed), causing the leak. Here is where the line numbers point:
```python
# pandas/core/indexes/multi.py:2683
slabels = slabels[slabels != -1]
# pandas/core/indexes/multi.py:2688
olabels = olabels[olabels != -1]
```
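I'm still not sure how to interpret this. Since `tracemalloc` attributes each memory block to the line that allocated it, these are presumably the lines where the leaked memory is allocated, but that alone doesn't explain why it is never freed.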
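One way to narrow this down (a suggestion, not something tried in this thread) is to group allocations by their full call stack instead of a single line, so the calling code that reaches `multi.py` also shows up. The sketch below keeps 25 frames per allocation via `tracemalloc.start(25)` and uses the `"traceback"` key of `Snapshot.compare_to`; `inner` and `outer` are hypothetical stand-ins for the pandas internals and the code calling them.

```python
import tracemalloc

# Record up to 25 frames per allocation so callers outside the library are kept.
tracemalloc.start(25)


def inner(retained):
    # Hypothetical stand-in for library code that allocates (like multi.py:2683).
    retained.append(bytearray(1_000_000))


def outer(retained):
    # Hypothetical stand-in for the project code that calls into the library.
    inner(retained)


retained = []
before = tracemalloc.take_snapshot()
for _ in range(5):
    outer(retained)
after = tracemalloc.take_snapshot()

# "traceback" groups allocations by their whole call stack, not just one line.
top = after.compare_to(before, "traceback")[0]
print(top)
for line in top.traceback.format():
    print(line)
tracemalloc.stop()
```

With a real snapshot, the formatted traceback would list every frame from the notebook down to `multi.py`, which should reveal which hetmech call keeps creating these arrays.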
Update: opened https://github.com/pandas-dev/pandas/issues/23047
In https://github.com/greenelab/hetmech/pull/140 / https://github.com/greenelab/hetmech/pull/140/commits/b882476bc4f1807033843e22392dacfdeaaec598, we specified computing degree-grouped permutation stats for 200 permuted hetnets. However, the computation died on the 99th iteration without any error message, so I suspected the process was killed for excessive memory consumption. I reran the bulk notebook while monitoring memory usage, and within a day or two the process was consuming 50 GB of RAM and counting.
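Because the process died silently, it can help to log memory after every permutation so that a record survives an out-of-memory kill. A minimal sketch, assuming a Unix system (the standard-library `resource` module is unavailable on Windows); `run_permutations`, `compute_one`, and `peak_rss_gib` are hypothetical names, not hetmech functions.

```python
import resource
import sys


def peak_rss_gib():
    """Peak resident set size of this process in GiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS but in kilobytes on Linux.
    if sys.platform != "darwin":
        peak *= 1024
    return peak / 2**30


def run_permutations(n_permutations, compute_one, log_path):
    """Call compute_one(i) for each permutation, logging peak memory after each."""
    with open(log_path, "a") as log:
        for i in range(n_permutations):
            compute_one(i)
            log.write(f"{i}\t{peak_rss_gib():.2f}\n")
            # Flush so the log is on disk even if the process is killed later.
            log.flush()
```

Peak RSS never decreases, so a steady climb across iterations (rather than a plateau near the cache limit) points to memory that is retained between permutations.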
Our cache size for path-count matrices is set to 16 GB, so maximum memory usage shouldn't exceed 20 GB (allowing a generous 4 GB for the other objects that must be stored). Hence, it seems that either garbage collection is not working as expected, or we are not properly clearing references to discarded files.
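To tell these two explanations apart, one can check whether a discarded object is actually reclaimed. The sketch below uses a weak reference, which does not keep its target alive, and calls `gc.collect()` so that objects held only by reference cycles are freed too. `PathCountMatrix` and `is_freed` are hypothetical names for illustration, not hetmech code.

```python
import gc
import weakref


class PathCountMatrix:
    """Hypothetical stand-in for a cached path-count matrix."""

    def __init__(self, size):
        self.data = bytearray(size)


def is_freed(make_obj, keep_reference):
    """Return True if the object is reclaimed once our own reference is dropped."""
    registry = []
    obj = make_obj()
    probe = weakref.ref(obj)  # a weak reference does not keep obj alive
    if keep_reference:
        # Simulates a forgotten strong reference, e.g. in a list or cache dict.
        registry.append(obj)
    del obj
    gc.collect()  # also reclaims objects kept alive only by reference cycles
    return probe() is None


if __name__ == "__main__":
    print(is_freed(lambda: PathCountMatrix(10**6), keep_reference=False))  # True
    print(is_freed(lambda: PathCountMatrix(10**6), keep_reference=True))   # False
```

If an object survives `del` plus `gc.collect()`, some strong reference still points to it; `gc.get_referrers(obj)` can then help locate the holder.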
I stopped the notebook with the growing leak, leaving its objects in memory, and then ran:
Running these commands caused memory consumption to drop:
Still not sure what to make of this clue.