losonczylab / sima

Python package for analysis of dynamic fluorescence microscopy data
GNU General Public License v2.0

Possible memory leak during motion correction #216

Open vjlbym opened 8 years ago

vjlbym commented 8 years ago

Hi guys,

I have been using code similar to the following for motion correction.

import sima
import sima.motion

# Wrap the HDF5 movie (time, y, x) as a SIMA sequence and create a dataset.
sequences = [sima.Sequence.create('HDF5', 'filename.hdf5', 'tyx')]
sima.ImagingDataset(sequences, 'filename.sima')

mc_approach = sima.motion.HiddenMarkov2D(granularity='row',
                                         verbose=True,
                                         max_displacement=[50, 100])
init_data = sima.ImagingDataset.load('filename.sima')
dataset = mc_approach.correct(init_data, 'filename_mc.sima')

I've also tried the same thing by passing the sequences variable directly to motion correction, rather than the imaging dataset. From the source code, it looked like sequences is an iterator over frames, so I assumed the entire dataset needn't be loaded into memory all at once.
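To be concrete, this is the access pattern I assumed was happening under the hood (just a sketch; I'm assuming sequences yield one frame at a time as numpy arrays):

import numpy as np
import sima

# Assumed lazy access: iterate one frame at a time rather than loading
# the whole movie. (Sketch only; frame shape/axes may differ.)
seq = sima.Sequence.create('HDF5', 'filename.hdf5', 'tyx')
running_sum = None
for frame in seq:  # ideally only one frame in memory per iteration
    frame = np.asarray(frame, dtype=np.float64)
    running_sum = frame if running_sum is None else running_sum + frame
mean_frame = running_sum / len(seq)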

This is a problem, as I've been running pretty large files (~15-20 GB) on EC2, and RAM is limiting the number of files I can run in parallel on a single instance. I haven't tried running these on my local machine, where I wouldn't even be able to fit them into memory.

Is there something simple that I am overlooking in the code?

Thanks! Vijay

jzaremba commented 8 years ago

Hmm, motion correction should not hold the entire file in memory. I'll look into a couple of things here. What versions of sima and h5py are you using?

vjlbym commented 8 years ago

Thanks Jeff. I'm running 1.3.0 for SIMA and 2.6.0 for h5py. Whenever I run files totaling more than 55 GB or so on an EC2 instance with 60 GB of RAM, I get Out of Memory errors. Running top shows that the individual Python processes (one per file) each end up using roughly the size of their file in memory. The only other command I run after the code above is export_averages, once the motion correction is done; but it's in the midst of mc_approach.correct that I seem to be getting the Out of Memory errors from Python.

vjlbym commented 8 years ago

I am trying to debug this in a session I am running right now. Based on top output, memory use ramps up quickly around the time I see "Estimating displacements for cycle 0" in the verbose output of motion correction. It's still not at the size of the file, so I am now pretty sure the file is not fully loaded prior to motion correction, and the title of this issue is incorrect. But memory usage seems to be ramping up by the minute in my current session (e.g. for a 7.8 GB file, 4 GB of resident memory is in use according to top, and for another 4 GB file, 2 GB of resident memory is already in use).

From looking at prior sessions close to the end of motion correction, I have seen approximately the size of the file in memory, which is why I assumed the files were being loaded into memory. The instance I am running has 32 cores and 60 GB of RAM, so I could in principle run 32 parallel processes with no loss of performance, but currently the maximum number I can reliably run is limited by the size of the files. With ~8-10 GB HDF5 files, I can run 5 without any problems, but with 6 I sometimes get Out of Memory errors before motion correction completes.

So overall, I am now actually worried there is a memory leak, considering the slow ramp-up in memory usage.

Thanks again for your help!

vjlbym commented 8 years ago

One final update. The run is close to completion. I ran two parallel processes as mentioned above. One of them has already finished executing and was using the size of its file in RAM right before completion. The second Python process, motion correcting the ~8 GB file, is still running and is now occupying 8 GB of RAM. This has all the signs of a memory leak: a constant build-up over time and a sudden clearing once the process exits. I tried reading the motion correction source code, but it is fairly large and I haven't been able to devote sufficient time to it. So any help in debugging this would be tremendously helpful.

Are the local computers you guys use for motion correcting large files fitted with huge amounts of RAM? If not, this problem would have been obvious before, so I am wondering whether I am doing something incorrectly to cause the leak. I doubt it, but I am surprised this isn't a more common problem.

pkaifosh commented 8 years ago

HDF5 files can be chunked in different ways, which can affect how much data you must load at a single time (https://www.hdfgroup.org/HDF5/doc/Advanced/Chunking/). How are your HDF5 files chunked?

vjlbym commented 8 years ago

Thanks @pkaifosh. I am chunking them by frame, so one chunk is one frame. I only have one plane and one channel, so for 10,000 frames I have 10,000 chunks. Is this the problem? If so, how do you think I can improve it? I thought the only cost of small chunks like this was reduced I/O performance, not memory.
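For concreteness, this is roughly how I write the files (a sketch; the dataset name and frame size are just examples):

import h5py
import numpy as np

# Sketch: write a (t, y, x) movie with one chunk per 512x512 frame.
with h5py.File('filename.hdf5', 'w') as f:
    f.create_dataset('imaging',
                     shape=(10000, 512, 512),
                     dtype=np.uint16,
                     chunks=(1, 512, 512))  # chunk = exactly one frame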

pkaifosh commented 8 years ago

That sounds like a reasonable way of chunking. As long as you aren't doing something like chunking by row or column, things should be fine. Any other clues as to what might be causing the problem?

vjlbym commented 8 years ago

Yeah, that's what I thought as well.

I don't quite know what the issue is, but my guess would be that prior frames aren't fully cleared from memory, i.e. that some reference to a previously loaded (or corrected) frame is held even after its use. That would explain the slow ramp-up of memory usage, reaching the full size of the file just before the end of motion correction. But I haven't really taken a look at the motion correction code to figure out where this might happen. Do you think something like this is a possibility? I could start looking over the code as well; I just wanted to see if something obvious came to you guys.

nbdanielson commented 8 years ago

@jzaremba and I made some progress on this today.

Memory is accumulating in the _beam_search function in hmm.py (here). In this function, we loop over every row of every frame (e.g. for a 500-frame dataset of 512x512 frames, this results in 500 frames * 512 rows = 256,000 iterations of the loop on Line 810). In each iteration, we append arrays to two lists (states and backpointer). The number of entries in each appended array is equal to the number of retained states (default = 50). We therefore end up with two lists, each containing approximately 256,000 arrays of 50 values ~ 12.8 million values per list. Each of these values is stored as an int64. We believe this accounts for the memory that accumulates over the course of the motion correction. For a 2.51 GB .h5 file (500 frames, 512x512 pixels, single plane), running HMM2D with 50 retained states and no granularity, these two variables accounted for about 2.45 GB of memory consumption -- this is close to the size of the original dataset (as @vjlbym noted), but that is somewhat coincidental and dependent on our choice of parameters.
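As a back-of-the-envelope check (illustrative numbers only; this counts just the raw values, not the per-array object overhead of holding many small numpy arrays in Python lists):

import numpy as np

# Rough estimate of the beam-search bookkeeping.
frames, rows, retained = 500, 512, 50
iterations = frames * rows                  # 256,000 row iterations
per_list = iterations * retained            # ~12.8M int64 values per list
raw_bytes = 2 * per_list * np.dtype(np.int64).itemsize
print(raw_bytes / 1e6, 'MB of raw values')  # ~204.8 MB before overhead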

Based on this, increasing your granularity or decreasing num_retained_states will help with memory consumption, but there are certainly cases where that is not possible, so it seems we need to do something here.

Currently the data in these variables is stored as int64. In the example dataset we ran, the maximum value in either list was on the order of 10^4, so storing them as int16 might be feasible. That would cut the memory consumption to a quarter, but we don't want to overflow either.
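Something like the following guard would keep the downcast safe (sketch; the array name and values are hypothetical):

import numpy as np

backpointer_row = np.array([0, 1023, 9999])             # hypothetical values
assert backpointer_row.max() <= np.iinfo(np.int16).max  # overflow guard
backpointer_row = backpointer_row.astype(np.int16)      # 8 bytes -> 2 bytes each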

@pkaifosh do you have any insights here? Are these variables bounded by anything? Or are there obvious alternative ways of implementing the beam search that would avoid the memory accumulation?

vjlbym commented 8 years ago

Thanks, Nathan and Jeff! I'm sure that's the reason for the memory accumulation.

I have a simple suggestion that I believe should fix it. Could you write the two lists to disk as HDF5 files? You should be able to chunk them, since you know the size of the final array right from the start, if I understand the code correctly. There doesn't seem to be much real-time manipulation of the two lists in that function, other than reading the last value a couple of times. Disk I/O performance would take a hit, but the reduced RAM consumption might compensate for it, even in terms of processing speed. Would this work, or did I completely misunderstand it?
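Something like this is what I have in mind (just a sketch; the names, sizes, and dtype are hypothetical, not SIMA's actual internals):

import h5py
import numpy as np

n_iterations, n_retained = 256000, 50  # frames * rows, retained states
with h5py.File('beam_search_scratch.h5', 'w') as f:
    states = f.create_dataset('states', (n_iterations, n_retained),
                              dtype=np.int64, chunks=(1024, n_retained))
    backpointer = f.create_dataset('backpointer', (n_iterations, n_retained),
                                   dtype=np.int64, chunks=(1024, n_retained))
    for i in range(n_iterations):
        # ...compute this row's arrays, then write them instead of
        # appending to in-memory lists (placeholders shown here)...
        states[i] = np.zeros(n_retained, dtype=np.int64)
        backpointer[i] = np.zeros(n_retained, dtype=np.int64)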

Also, could you guys help me understand those variables? My initial guess was that the function calculates different possible x and y translations and their posterior probabilities, with states representing the maximum a posteriori translations or something like that; but if the values can reach 10^4, that doesn't make sense.

vjlbym commented 8 years ago

Btw, congrats on the Neuron paper!

pkaifosh commented 8 years ago

These variables are for the backward pass of the beam search. Once you find the most probable termination state, you go backward and find the most probable way of having gotten there, and then you keep going one step backward at a time.
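In pseudocode, the backward pass looks something like this (an illustrative sketch, not the actual implementation in hmm.py; it assumes backpointer[t][s] gives the predecessor of state s at time t):

def backtrace(backpointer, final_state):
    # Walk the stored backpointers from the most probable termination
    # state to recover the most probable state sequence.
    path = [final_state]
    for bp in reversed(backpointer):
        path.append(bp[path[-1]])
    return path[::-1]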

There are a number of approaches for reducing the memory:

1) The suggestion of reducing to int16 would be fine, so long as there are fewer than 2^15 possible states (or 2^16 with uint16).

2) The suggestion of saving to disk is also reasonable.

3) The probabilities will collapse fairly quickly, so you don't have to wait until the end to do the backward step. You could start the backward step from all the states currently stored as possible current displacements, see at what time-point in the past they converge to a single possibility, and do the backward step from there. You can then discard all the stored states and backpointers up to that point and keep just the inferred trajectory (see the sketch below).
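A sketch of the convergence check in 3) (illustrative only; again assuming backpointer[t][s] gives the predecessor of state s at time t):

def converged_time(backpointer, live_states):
    # Walk backward from all currently-retained states and return the
    # most recent time-point at which they collapse to a single ancestor;
    # everything before that point can be finalized and freed.
    states = set(live_states)
    for t in range(len(backpointer) - 1, -1, -1):
        states = {backpointer[t][s] for s in states}
        if len(states) == 1:
            return t
    return 0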