yeatmanlab / pyAFQ

Automated Fiber Quantification ... in Python
http://yeatmanlab.github.io/pyAFQ/
BSD 2-Clause "Simplified" License

Memory demands of segmentation #1033

Open arokem opened 1 year ago

arokem commented 1 year ago

Several users have noticed that segmentation is quite memory-demanding. I suspect this has to do with the initialization of multiple _SlsBeingRecognized objects on these lines:

https://github.com/yeatmanlab/pyAFQ/blob/master/AFQ/segmentation.py#L569-L572

A couple of thoughts about potential solutions:

  1. Pass in a copy of the streamlines instead of the tg.streamlines variable itself. My worry is a potential memory leak from passing these streamlines by reference: Python may be keeping that reference alive across the repeated initializations.
  2. Spill to disk. In particular, maybe the variables defined on these lines: https://github.com/yeatmanlab/pyAFQ/blob/master/AFQ/segmentation.py#L52C1-L54 could use memory maps instead of being held fully in memory?
  3. Explicit garbage collection at the end of each bundle. This takes a little bit of time, but might help force memory to be cleared, since Python is not very proactive about running its own garbage collection.
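
To make options 2 and 3 above more concrete, here is a rough sketch rather than a patch against segmentation.py. The helper names (`make_spilled_array`, `process_bundles`), the array shape, and the bundle names are all hypothetical. The only real APIs assumed are `numpy.memmap` and `gc.collect`.

```python
import gc
import os
import tempfile

import numpy as np


def make_spilled_array(shape, dtype=np.float32, directory=None):
    """Allocate an array backed by a temporary file on disk instead of RAM."""
    fd, path = tempfile.mkstemp(suffix=".dat", dir=directory)
    os.close(fd)  # np.memmap reopens the file by path
    arr = np.memmap(path, dtype=dtype, mode="w+", shape=shape)
    return arr, path


def process_bundles(bundle_names, n_streamlines, n_points=100):
    """Hypothetical per-bundle loop: scratch space on disk, gc after each bundle."""
    results = {}
    for name in bundle_names:
        scratch, path = make_spilled_array((n_streamlines, n_points, 3))
        scratch[:] = 1.0  # stand-in for the real per-bundle computation
        results[name] = float(scratch.sum())
        scratch.flush()
        del scratch      # drop our reference to the memory map
        gc.collect()     # option 3: force a collection at the end of each bundle
        os.remove(path)  # clean up the backing file
    return results
```

Whether this helps in practice depends on how the real arrays are accessed. Memory maps trade RAM for disk I/O, so heavily random access patterns could get slower.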

Happy to hear other thoughts/ideas for tackling this.

36000 commented 6 months ago

I haven't noticed this being a big issue anymore. I wonder if it is an issue with our parallelization setup, which is no longer the default (#1030) but could still use fixing if it is the problem.
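
One way to check this would be to measure peak memory for a serial run and compare it against what users report. Below is a minimal sketch using the standard-library `tracemalloc` module. `peak_memory` and `fake_segment_bundle` are hypothetical names, and the latter is only a stand-in for the real segmentation call.

```python
import tracemalloc


def peak_memory(func, *args, **kwargs):
    """Run func and return (result, peak bytes traced while it ran)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def fake_segment_bundle(n):
    # Hypothetical stand-in for segmenting one bundle
    data = [float(i) for i in range(n)]
    return len(data)
```

Note that `tracemalloc` only traces allocations made in the current process, so worker processes in a parallel run would have to be measured separately.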