seung-lab / connected-components-3d

Connected components on discrete and continuous multilabel 3D & 2D images. Handles 26, 18, and 6 connected variants; periodic boundaries (4, 8, & 6)
GNU Lesser General Public License v3.0
356 stars, 42 forks

Massive memory Leak #102

Closed: modaresimr closed this issue 1 year ago

modaresimr commented 1 year ago

Unfortunately, this fantastic package has a massive memory leak.

I suggest wrapping the call as follows to avoid it:


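```python
import concurrent.futures

import cc3d


def connected_components(binary_array, return_N=True):
    # Run cc3d in a short-lived worker process. When the pool shuts down, the
    # worker exits and the OS reclaims everything it allocated.
    # With the "spawn" start method (Windows, macOS), call this from code
    # under an `if __name__ == "__main__":` guard.
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        future = executor.submit(
            cc3d.connected_components, binary_array, return_N=return_N
        )
        ret = future.result()  # (labels, N) if return_N else labels
    return ret
```
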
william-silversmith commented 1 year ago

Hi modaresimr,

Can you show me the code that produces the memory leak? What OS, Python version, and cc3d version are you running?

Thanks for reporting,
Will

modaresimr commented 1 year ago

I am calling it around 1000 times on 3D int arrays of size 512x512x689.

After three calls to cc3d, memory usage grows to 20 GB and is never released! After extensive debugging, I found that switching to the code I posted above fixes it.

I will try to extract part of my code to share soon.

william-silversmith commented 1 year ago

That would be very helpful (along with details such as OS, Python version, and cc3d version). cc3d should be using < 2.2 GB assuming one image in memory.
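A rough back-of-envelope check of that figure for one 512x512x689 volume. The dtypes are assumptions, not stated in the thread: an int64 input array and uint32 output labels. Any internal working memory cc3d allocates is ignored.

```python
# Back-of-envelope memory estimate for one volume (assumed dtypes, see above).
import numpy as np

shape = (512, 512, 689)
voxels = int(np.prod(shape))

input_bytes = voxels * np.dtype(np.int64).itemsize    # assumed int64 input
output_bytes = voxels * np.dtype(np.uint32).itemsize  # assumed uint32 labels
total_gb = (input_bytes + output_bytes) / 1e9

print(f"{voxels:,} voxels, {total_gb:.2f} GB")  # 180,617,216 voxels, 2.17 GB
```

Under these assumptions the total comes to about 2.17 GB, consistent with the < 2.2 GB figure. A different input dtype or label width changes the corresponding term.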

One thing you could try: cast your array to an unsigned integer type. It shouldn't make a difference, but cc3d was manually tested mainly on unsigned arrays (though signed integer arrays are included in automated testing).
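If you want to try the cast, here is a minimal sketch. `to_unsigned` is a hypothetical helper, not a cc3d function. It rejects negative values first, then uses NumPy's `np.min_scalar_type` to pick the narrowest unsigned dtype that holds the largest label.

```python
import numpy as np


def to_unsigned(labels):
    """Cast a non-negative integer array to the smallest unsigned dtype that fits.

    Hypothetical helper for illustration; it is not part of cc3d.
    """
    if labels.size and labels.min() < 0:
        raise ValueError("negative labels cannot be cast to an unsigned type")
    max_value = int(labels.max()) if labels.size else 0
    dtype = np.min_scalar_type(max_value)  # e.g. uint8, uint16, uint32, uint64
    # astype returns a new array whenever the dtype actually changes.
    return labels.astype(dtype, copy=False)
```

Note that the cast itself allocates a new array of the target dtype, so in a memory-constrained loop it is one more large allocation per volume.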

modaresimr commented 1 year ago

Thank you for your support. I found that this problem comes from Python memory management: when you create many large arrays and then delete them, memory becomes fragmented and new arrays cannot be allocated in the freed space. However, if we run the computation in a separate process, we do not face this issue, because the OS reclaims all of that process's memory when it exits.

More info on this issue on: https://stackoverflow.com/a/9617718/834117

william-silversmith commented 1 year ago

Thanks for looking into this more! If there's a way for you to reuse the arrays (e.g. if they're all the same size), that might help a lot. I'm going to close this issue since it's not directly related to cc3d but is a general Python issue. Please reopen if you think you need more help!
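For volumes that all share one shape and dtype, one way to reuse memory is to allocate a single buffer and read each volume into it in place. This is a minimal sketch under a hypothetical file layout: each volume is assumed to be stored as a raw C-ordered file, such as one written by `ndarray.tofile()`. `iter_volumes` is an illustrative name, not something from the thread.

```python
import numpy as np


def iter_volumes(paths, shape, dtype=np.uint32):
    """Yield each raw volume in turn, refilling one preallocated buffer.

    Hypothetical helper: assumes every file holds exactly `shape` elements
    of `dtype` in C order (for example, written with ndarray.tofile()).
    """
    buf = np.empty(shape, dtype=dtype)    # allocated once for all files
    raw = buf.reshape(-1).view(np.uint8)  # flat byte view of the same memory
    for path in paths:
        with open(path, "rb") as f:
            n = f.readinto(raw)           # fills buf in place, no new array
        if n != buf.nbytes:
            raise ValueError(f"{path}: expected {buf.nbytes} bytes, read {n}")
        # The same object is yielded every time: finish with it (or copy it)
        # before advancing the iterator.
        yield buf
```

This only avoids re-allocating the input. Whatever the downstream processing allocates per volume is unaffected.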

modaresimr commented 1 year ago

Thanks for your suggestion. However, numpy creates a new array for every operation, so we cannot reuse them :(

william-silversmith commented 1 year ago

It depends on which operations you're doing. For example, the += operator avoids creating a copy. Many numpy functions also accept an out= parameter (check the documentation for each one), e.g. np.multiply(a, b, out=a).
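Here is a minimal sketch of both idioms on same-shaped float arrays. `combine_and_threshold` is a hypothetical function. It stores its intermediate in `a` and its result in a preallocated `mask` instead of allocating new full-size arrays.

```python
import numpy as np


def combine_and_threshold(a, b, mask, threshold=0.5):
    """Compute ((a + b) * b) > threshold, reusing a and mask as storage.

    `a` is overwritten; `mask` must be a preallocated bool array shaped like `a`.
    """
    a += b                              # augmented assignment writes into a's memory
    np.multiply(a, b, out=a)            # same idea with an explicit out= argument
    np.greater(a, threshold, out=mask)  # comparison result goes into the existing mask
    return mask


# Example: the buffers are allocated once and can be refilled for each volume.
rng = np.random.default_rng(0)
a = rng.random((64, 64, 64))
b = rng.random((64, 64, 64))
mask = np.empty(a.shape, dtype=bool)
combine_and_threshold(a, b, mask)
```

In-place operations follow NumPy's casting rules, so `a += b` with an integer `a` and a float `b` raises an error instead of silently upcasting.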