yves-weissenberger / twoptb

Python toolbox for analysing data from two photon imaging experiments

convert_to_hdf5 hangs for long recordings #15

Open QNeuron opened 6 years ago

QNeuron commented 6 years ago

For continuous recordings longer than 15 minutes, convert_to_hdf5.py gets progressively slower after some point and eventually hangs. I am not sure whether this is a problem with the function itself or with my system (not enough RAM?).

yves-weissenberger commented 6 years ago

Most of the single recordings I work with are substantially longer than 15 minutes, so I am fairly sure this is a problem with your hardware. Have you checked your system monitor to see whether your RAM plus swap is full when it starts hanging? How much RAM do you have?

That said, I think this issue could be resolved by updating the initial data import pipeline to use resizable HDF5 datasets, and I can imagine that would be worth doing.

I think resolving it simply amounts to changing the functions in load_images.py to add the data to the hdf5 file one tiff at a time.
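
A minimal sketch of what that could look like, using h5py's resizable datasets together with tifffile (the function name, dataset name, and the assumption that each TIFF loads as a (frames, y, x) stack are illustrative, not the existing load_images.py API):

```python
import h5py
import tifffile

def tiffs_to_hdf5(tiff_paths, out_path, dset_name="raw_data"):
    """Append TIFF stacks to an HDF5 file one file at a time, so peak
    memory is bounded by a single TIFF rather than the whole recording."""
    with h5py.File(out_path, "w") as f:
        dset = None
        for path in tiff_paths:
            stack = tifffile.imread(path)  # assumed shape: (frames, y, x)
            if dset is None:
                # Resizable along the frame axis; chunked per frame
                dset = f.create_dataset(
                    dset_name,
                    shape=stack.shape,
                    maxshape=(None,) + stack.shape[1:],
                    dtype=stack.dtype,
                    chunks=(1,) + stack.shape[1:],
                )
                dset[:] = stack
            else:
                n = dset.shape[0]
                dset.resize(n + stack.shape[0], axis=0)
                dset[n:] = stack
```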

yves-weissenberger commented 6 years ago

Very happy for you to implement this, if you would like to.

QNeuron commented 6 years ago

I have 32 GB of RAM, which is definitely not huge. Maybe I should just add more.

In any case, I think it can't hurt to write the tiffs to the hdf5 file as you load them.

I was also surprised by the size of these files. Do you dump both the unregistered and the registered images in there? That seems like a huge use of space, especially if you also save the shift values used for the motion correction. Why not save just the shifts, and correct the frame positions when you load the data (which should not happen very often: only when you extract the traces and when you look for a good example video ^^)?
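
For illustration, applying stored shifts on load might look roughly like this, assuming the shifts are stored as one (dy, dx) pair per frame (the function and argument names are hypothetical, not twoptb's actual layout):

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def register_on_load(raw_frames, shifts):
    """Apply precomputed motion-correction shifts frame by frame,
    instead of storing a second, registered copy of the movie.

    raw_frames : array of shape (n_frames, y, x)
    shifts     : array of shape (n_frames, 2), one (dy, dx) per frame
    """
    registered = np.empty_like(raw_frames)
    for i, (frame, s) in enumerate(zip(raw_frames, shifts)):
        registered[i] = nd_shift(frame, s, order=0)  # nearest-neighbour
    return registered
```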

yves-weissenberger commented 6 years ago

I'm surprised that it hangs with these datasets; I think @samupicard has a similar amount of RAM and no problems?

Currently both the raw and the registered data are written to the hdf5 file. It should be simple to remove the raw data, though, by deleting the appropriate datasets.

This shouldn't be a problem if you want to do it. I personally like having both datasets in one place, so that the entire pipeline can be visualised from a single file.
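
Deleting a dataset with h5py is a one-liner, though note that HDF5 does not automatically reclaim the freed space; repacking the file (e.g. with the h5repack command-line tool) is needed to actually shrink it. A sketch, with a placeholder file and dataset name:

```python
import h5py

# "session.h5" and "raw_data" are placeholders; use the actual paths
with h5py.File("session.h5", "a") as f:
    if "raw_data" in f:
        del f["raw_data"]

# Then, on the shell, repack to reclaim the space:
#   h5repack session.h5 session_packed.h5
```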

On a separate note, the shift values are saved in the processed folder.