Closed · esheldon closed this issue 2 years ago
The test I first quoted above was running on current master, but the jobs that got killed were using the version in the DM stack, so this seems to have been around for a while.
It is also worth noting that when running other codes on the same images, I see flat memory usage.
Code to reproduce
https://gist.github.com/esheldon/2f5e32155e37ddb201aa623b57e49328
Interesting, I wonder if this is related to autograd. We know that it has huge memory issues and I wouldn't be surprised if there is a memory leak in there somewhere.
The other obvious culprits are the cached variables that occur during blending, especially the monotonicity operator. We should probably add a `clear` option to `scarlet.cache.Cache` to allow the user to clear the cache after each blend. @esheldon, would you mind adding the line `scarlet.cache.Cache._cache = {}` to your code after you have finished with each blend? If the primary problem is a caching issue, then this should fix it. Otherwise it's likely autograd, in which case I hope to have a new module in the next few weeks that doesn't use autograd and gives sizeable savings in performance and memory usage.
It might even be advisable to add a `clear_cache` parameter, defaulting to `True`, in `blend.fit` to automatically clear the cache once the fit has finished, saving the user the hassle. Making it an argument allows advanced users to keep the cache if they are doing multiple iterations on the same blend, but I imagine that most users won't want the cached variables hanging around after they've finished optimizing.
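As a sketch of what that `clear_cache` parameter might look like (the `Blend` and `Cache` classes below are minimal stand-ins so the example is self-contained, not scarlet's actual implementation; scarlet's real cache does live in the class-level dict `scarlet.cache.Cache._cache`):

```python
# Sketch of the proposed clear_cache parameter. Blend and Cache here are
# minimal stand-ins; scarlet's real classes differ.

class Cache:
    _cache = {}  # mirrors scarlet.cache.Cache._cache

class Blend:
    def fit(self, max_iter=200, clear_cache=True):
        # The real fit would run the optimizer, which populates the cache
        # with operators (e.g. the monotonicity operator) as a side effect.
        Cache._cache[("monotonicity", (25, 25))] = object()  # placeholder entry
        if clear_cache:
            # Drop cached variables once the fit is done, so memory does
            # not accumulate when many blends run in a single process.
            Cache._cache = {}

Blend().fit()                   # default: cache is cleared afterwards
assert Cache._cache == {}
Blend().fit(clear_cache=False)  # advanced use: keep the cache around
assert len(Cache._cache) == 1
```

Defaulting to `True` matches the expectation that most users are done with the cached variables once optimization finishes.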
Clearing that cache results in much slower memory growth. I'm now seeing a small growth rate of ~10 MB per 5,000 trials.
Ok, thanks for the update. I'll try to push an update tomorrow to make this the default behavior as stated in my comment above.
Maybe a context manager:

```python
with scarlet.cache():
    # run deblender in a way that benefits from a cache
```
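One way such a context manager could look (a sketch only; `scarlet_cache` and the stand-in `Cache` class are hypothetical, not scarlet's current API):

```python
from contextlib import contextmanager

# Stand-in for scarlet.cache.Cache so the sketch is runnable here; scarlet's
# real cache is the class-level dict scarlet.cache.Cache._cache.
class Cache:
    _cache = {}

@contextmanager
def scarlet_cache():
    """Expose the cache for the duration of the block, then clear it."""
    try:
        yield Cache._cache
    finally:
        # Clearing on exit bounds memory growth across many blends,
        # even if the deblender raises inside the block.
        Cache._cache = {}

with scarlet_cache() as cache:
    # run the deblender in a way that benefits from a cache
    cache[("monotonicity", (25, 25))] = object()

assert Cache._cache == {}  # cache was dropped on exit
```

The `try`/`finally` ensures the cache is cleared even when the block exits via an exception.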
Hmm. There aren't too many things we put in the cache. From Erin's script, I'm guessing that it's the monotonicity operator at the original source x/y position. We use that for initialization, and I thought I had removed it from the cache. If not, it'll accumulate in the cache with every new source until the process terminates.
@esheldon: can you check the keys of the `Cache` (it's a simple dict) in the case where you don't clear it, to confirm, please?
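For a quick check, something like the following would list what has accumulated (the stand-in `Cache` class and its key names here are purely illustrative; the real keys depend on what scarlet has cached):

```python
# Illustrative stand-in for scarlet.cache.Cache so the inspection pattern is
# runnable here; in scarlet the real object is scarlet.cache.Cache._cache.
class Cache:
    _cache = {
        "build_initialization_image": object(),  # hypothetical key names
        ("monotonicity", (25, 25)): object(),
    }

# After running a few blends without clearing, see what accumulated:
for key, value in Cache._cache.items():
    print(key, type(value).__name__)
```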
I think the more likely culprit is the `build_initialization_image` that is stored in the cache. This could (and probably should) be removed after the init, since it's the detection image that is the size of the entire blend.
Yeah, I just found the same. This thing should go out of scope when the current `observation` is dropped, but it doesn't: https://github.com/pmelchior/scarlet/blob/3e83d8e294813de35cb9cb57d33a35a3429b936a/scarlet/initialization.py#L248-L280
We used to have this data structure as a member of the observation class, in which case this problem wouldn't occur. But we changed it for multi-observation runs because this is the init image for all of the overlapping observations. In practice, this ability isn't even used, because we don't have a multi-instrument init image.
I wonder if we shouldn't clean this up by creating something like a `MultiObservation` class to replace the flat list of `Observation`s we carry around (and test for) in lots of places. Then we can attach extra functionality to this class instead of the global cache.
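A rough sketch of what such a container might look like (the `MultiObservation` name and all of its methods are hypothetical, not existing scarlet API):

```python
# Hypothetical MultiObservation container: it owns the list of observations
# and any shared derived data, so per-run products die with the instance
# instead of lingering in a global cache.

class MultiObservation:
    def __init__(self, observations):
        self.observations = list(observations)
        self._init_image = None  # built lazily, scoped to this instance

    def __iter__(self):
        return iter(self.observations)

    def __len__(self):
        return len(self.observations)

    def initialization_image(self, build):
        # build: a callable producing the detection image from all
        # overlapping observations; memoized on the instance, not globally.
        if self._init_image is None:
            self._init_image = build(self.observations)
        return self._init_image

# Toy usage with strings standing in for Observation instances:
multi = MultiObservation(["obs_r", "obs_i"])
image = multi.initialization_image(lambda obs: f"init from {len(obs)} obs")
assert image == "init from 2 obs" and len(multi) == 2
```

Once the `MultiObservation` instance goes out of scope, its init image is freed with it, which is exactly the lifetime the global cache currently fails to provide.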
I have just spent some time addressing this same issue, and found this issue as I was about to post my own.
I'm running scarlet version 1.0.1+gd1bda26, which was current when I installed it about 3 weeks ago.
What I have done is to start with a set of images, weights, PSFs, etc., and then run through the steps of the deblending process, i.e. create the frame and the observation, then initialization, blending, and finally rendering the models. I do this N times for the same dataset, and track the resident memory after every iteration. The tests I have done stop after successive steps in the process; i.e. first only doing the frame step, then doing both the frame and observation steps, etc.
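The measurement loop described above can be sketched like this (`run_pipeline` is a placeholder for the frame/observation/initialization/blend/render steps; `ru_maxrss` reports the high-water resident size, which is enough to reveal monotone growth):

```python
import resource
import sys

# Sketch of the per-iteration memory tracking described above.

def rss_mb():
    # ru_maxrss is in KB on Linux and bytes on macOS.
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss / 1024**2 if sys.platform == "darwin" else rss / 1024

def run_pipeline(data):
    pass  # placeholder: create frame + observation, initialize, fit, render

data = object()  # stand-in for the fixed set of images/weights/PSFs
usage = []
for trial in range(100):
    run_pipeline(data)
    usage.append(rss_mb())  # record resident memory after every iteration
```

Plotting `usage` against the trial number is what produces the curves below; a flat line means no leak, a linear ramp means per-trial growth.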
Below is a plot that shows the results. When reading the legend, e.g. 'initialization' means the process has been run up to and including the initialization step, but no further. Clearly the initialization step is the worst culprit. That said, there is a very small but real linear memory growth associated with the creation of the frames and observations.
The final curve shows the result of implementing the cache-clearing step suggested by Fred3m here. I don't at all understand the shape of this curve. Suffice it to say that clearing the cache resolves most, but not all, of the problem. It seems that (for my case at least) scarlet is carrying around ~0.5 GB of unnecessary memory. I guess this is on the edge of being worth caring about?
Thanks for reporting. The offending cache call is in fact still present in the current code base. The reason is that we want to re-use the same initialization image for all sources in the same observation. We could tie this image to the `Observation` instance, but that doesn't work if we want to initialize from multiple such instances, which arises for dithered observations or observations from different instruments. As I wrote above, the current code base doesn't have this functionality, so we should just make the change that actually fixes the current state.
As for the other 0.5 GB: scarlet needs a few somewhat large operators for images at various sizes. It builds them once it needs them for a source; that's why that line keeps growing. I wouldn't call that "unnecessary" memory.
The memory usage is growing linearly with calls to the deblender.
For example, processing 100x100 images, the memory usage starts quite small, at about 100-200 MB, and within a few thousand trials reaches 2.6 GB.
On the batch system at BNL the jobs are eventually getting killed by the automatic job killer. I don't have reliable stats on what the usage was, but I believe it must have exceeded 8 GB for this to occur. This was again using 100x100 images.