soft-matter / trackpy

Python particle tracking toolkit
http://soft-matter.github.io/trackpy

Incredibly high RAM usage perhaps caused by large masks #405

Open tuckersylvia opened 8 years ago

tuckersylvia commented 8 years ago

Hi there, I have been trying to use trackpy for some time now and have run into an issue with extremely high RAM usage (tens of gigabytes) while trying to locate features/particles, even in a single frame. The frames are generally 4K (4288x2848 pixels) but can be cropped to be smaller (~3000x2000 px). I have done preprocessing and segmentation so that I basically have a binary image with only the particles of interest shown in white. From some testing in a notebook, the issue seems to be related to the best-guess diameter argument passed to tp.locate. I pass large diameters because the particles of interest start as a point source with no size and then grow to ~400 px in diameter. There are usually only a few particles (0-5 per frame), and they do not all stay in frame for the full duration of the image sequence.

```
diameter = 151  # store optimal diameter parameter in a variable
%memit features = tp.locate(testframe, diameter, preprocess=False, percentile=0, engine='numba')  # first frame

peak memory: 3328.61 MiB, increment: 3074.68 MiB
```

As you can see, I have preprocessing and thresholding turned off and try to use numba. When run with diameter=151, %memit reports ~3.5 gigabytes of memory used by the execution of that line. If I change the diameter to something like 251 pixels, then a single call to tp.locate will use all 32 gigabytes of my system RAM and start creeping into swap.

Is there something about the algorithm I am fundamentally missing? I assumed passing in binary images would make this easy to track. I am impressed with the package and especially like the handling of trajectory data in a DataFrame for convenient manipulation, plotting, etc. I have attached a sample image for reference (ed_6_side_083_undistorted_cropped_masked).

Please let me know whether this is purely user error on my part or whether memory usage this high is considered normal for large particles in large images. Thanks for this wonderful project.

danielballan commented 8 years ago

Most users (including, I think, all of the core developers) use much smaller diameters (of order 10). I have incidentally seen weird things happen with large diameters, but I have never investigated. Offhand I'm not sure why the algorithm would have to scale quite that badly with the mask size; it seems like something we should be able to fix.

nkeim commented 8 years ago

Hi! Agreed—I don't think we've ever tested such large masks, and this is probably a bug. I've edited your title slightly to help everyone keep tabs on this issue.

Without knowing the details of your application: if you have already been able to reduce your image to binary blobs, you might want to look at the region-measuring algorithms in skimage. See this example. They are very efficient. If you were to use this approach, you would need to put all of the resulting centroid coordinates into a list or array, and then create a DataFrame with x, y, and frame columns (plus others if you wish). That is all that trackpy's linking code requires.
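
A minimal sketch of that approach, assuming a sequence of already-binarized frames (the function name and column choices here are illustrative, not part of trackpy's API):

```python
import pandas as pd
from skimage import measure

def centroids_from_binary(frames):
    """Return one row per detected blob: x, y, and frame number."""
    rows = []
    for frame_no, frame in enumerate(frames):
        labels = measure.label(frame > 0)          # connected-component labelling
        for region in measure.regionprops(labels):
            y, x = region.centroid                 # regionprops returns (row, col)
            rows.append({'x': x, 'y': y, 'frame': frame_no})
    return pd.DataFrame(rows, columns=['x', 'y', 'frame'])
```

The resulting DataFrame can be handed straight to trackpy's linking code.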

nkeim commented 8 years ago

Come to think of it, François Boulogne has already written a nice example of this method in action: http://soft-matter.github.io/trackpy/v0.3.2/tutorial/custom-feature-detection.html

This isn't to say this bug doesn't need fixing, but that you should consider this approach because it might work now, and work faster.

tuckersylvia commented 8 years ago

Thanks for the swift responses. I actually have a contour-features class I wrote a while ago to try to do the matching myself before I found trackpy, so maybe I can use that or the above-mentioned method to feed into the trackpy linker. From the above, it seems like tp.link only needs x, y, and frame number to work properly? I will consult the API docs.
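
For reference, a hedged example of that handoff, assuming `features` is a DataFrame with 'x', 'y', and 'frame' columns such as the skimage sketch above produces; the search_range and memory values are placeholders to tune for the data:

```python
import trackpy as tp

# search_range: maximum displacement between frames; memory: how many frames a
# particle may vanish and still be relinked. Both are placeholder values.
linked = tp.link(features, search_range=50, memory=3)  # adds a 'particle' column
```

Older trackpy releases expose the same functionality as tp.link_df.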

On another note, would it be helpful for me to more fully profile a simple script that just loads a frame and attempts to locate features, to see where the memory is blowing up?
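
A hedged sketch of such a script, using only the standard library's cProfile (the file path and parameters are placeholders; this profiles time rather than memory, which is also what the Spyder run reported later in this thread, while memory can be tracked with memory_profiler's %memit as in the original post):

```python
import cProfile
import pstats

from skimage import io
import trackpy as tp

frame = io.imread('binary_frame.png')   # placeholder path to one binary frame

profiler = cProfile.Profile()
profiler.enable()
tp.locate(frame, 151, preprocess=False, percentile=0, engine='numba')
profiler.disable()

# Show the 20 most expensive calls by cumulative time
pstats.Stats(profiler).sort_stats('cumulative').print_stats(20)
```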

caspervdw commented 8 years ago

I'd be curious to see a profiler result! I imagine that local_maxima (which is part of locate) is coming up with a large number of initial coordinates that are too close to each other, due to the flattened feature shapes. This happens when a local maximum has surrounding pixels with equal intensity values. These are filtered out only after the refine step.

tuckersylvia commented 8 years ago

Here is the output of Spyder's profiler running a simple script on one binary image. I tried to save the output to a file, but it is in a binary format; the screenshot says it all, I think: most of the time is spent in the nd_image.min_or_max_filter function.

[Screenshot: trackpy-profile]

Hope that is useful for the curious.

caspervdw commented 8 years ago

In that case, the current master is probably much faster for you, as it has an improved local_maxima.

caspervdw commented 7 years ago

Looking at this issue again, I think I understand why the RAM usage is so high. We have code that eliminates duplicate maxima after find_maxima. These occur in the case of flat peaks, with multiple pixels having the maximum intensity. This is fine for small features in greyscale images, but less so for binary images: a binary feature of radius R has on the order of R^2 equally bright pixels, producing on the order of R^2 overlapping candidate features that are only filtered out right after find_maxima.

We need a warning for binary images, especially when a large diameter is used. I suspect that the grey-dilation technique we use is not suitable for your images.
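
To illustrate the flat-peak effect described above, here is a small sketch of the grey-dilation idea (not trackpy's actual implementation; the helper function, blob sizes, and window size are made up for demonstration):

```python
import numpy as np
from scipy import ndimage

def count_candidate_maxima(image, size):
    # A pixel is a candidate maximum if it equals the maximum of its
    # neighbourhood, i.e. it survives a grey dilation of the given size.
    dilated = ndimage.grey_dilation(image, size=(size, size))
    return int(np.sum((image == dilated) & (image > 0)))

yy, xx = np.mgrid[-100:101, -100:101]

# Binary disk of radius 50: every plateau pixel ties for the maximum,
# so the candidate count scales with the blob area (~R^2).
binary_blob = (xx**2 + yy**2 <= 50**2).astype(float)
print(count_candidate_maxima(binary_blob, 15))    # thousands of candidates

# Smooth Gaussian blob of similar extent: a single, unambiguous maximum.
gaussian_blob = np.exp(-(xx**2 + yy**2) / (2 * 25.0**2))
print(count_candidate_maxima(gaussian_blob, 15))  # 1
```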