Memory usage - Githubissues

saimn commented 6 years ago

When working with (relatively) big data cubes, taking care of memory usage is important. Currently it seems clear that I cannot use cubeviz with a MUSE cube on my laptop (with 8 GB of RAM), and even on a more powerful laptop (16 GB is pretty common now, at least for MacBooks) I'm not sure it will work. Here are a few suggestions:

One easy way to save memory would be an option to skip loading the variance extension (and the mask, if there is one). You can often do without them, for example when you just want a quick look at the cube or simply don't have enough memory.
I guess the initial image and spectrum are created by collapsing the cube along the spatial/spectral dimensions, which means that the whole cube is loaded into memory right at the beginning. There are several ways to avoid this. I don't know about other instruments, but for MUSE it's pretty common to have image extensions (white-light, wide-band images) in the FITS file. These could be detected on load instead of being computed (if this is too instrument-specific, is there a way to do it in the MUSE loader?). I'm not sure how the initial spectrum is computed (it looks like the entire cube is collapsed); maybe there is also a less memory-expensive way to proceed here, such as taking an aperture at the center of the cube? (This could also be instrument-specific.)
The data is converted to float64 by default, e.g. data.add_component(component=hdu.data.astype(np.float), label=component_name). We usually store MUSE cubes as float32 to save disk space, so it would be nice to have an option to avoid the float64 conversion (which would also save memory and make better use of memmap). It also seems that a reference is kept to the opened HDUList, which probably means that the data is stored twice in memory (one float64 array, and one float32 array in the FITS HDU)? I have not checked this yet.
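
To illustrate the dtype point, here is a minimal sketch in plain NumPy (not cubeviz or glue code). The file name, the cube shape, and the use of np.memmap on a raw file as a stand-in for a memory-mapped FITS HDU are all assumptions made for the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical small cube (wavelength, y, x) stored as float32 on disk.
shape = (50, 40, 30)
path = os.path.join(tempfile.mkdtemp(), "cube_float32.dat")
np.arange(np.prod(shape), dtype=np.float32).reshape(shape).tofile(path)

# Memory-mapped view: data is read from disk only when it is accessed.
cube = np.memmap(path, dtype=np.float32, mode="r", shape=shape)

# astype() allocates a new in-memory array, so the whole file is read and
# the footprint doubles when going from float32 to float64.
as_float64 = cube.astype(np.float64)

# Slicing the mapped array keeps both the dtype and the memory mapping.
central_slice = cube[shape[0] // 2]

print("mapped bytes:   ", cube.nbytes)
print("converted bytes:", as_float64.nbytes)
print("slice still backed by the file:", np.shares_memory(cube, central_slice))
```

If the original float32 array also stays referenced after the conversion, both copies count toward the process memory, which is the double-storage situation suspected above.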

astrofrog commented 6 years ago

Glue should normally make use of memory mapping to avoid using too much memory - I think the issue here is indeed the conversion to float64 in cubeviz. We should avoid this and keep the memory-mapped arrays.

Even if we fix the memory mapping there may be a temporary memory increase when collapsing the cube to the spectrum, but I recently added some functionality in glue to make this more efficient and can check whether we can use it here.
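
For reference, one generic way to limit that temporary increase is to collapse the cube a few wavelength planes at a time. This is only a sketch of the general technique in plain NumPy; it is not the glue functionality mentioned above, and mean_spectrum_chunked and its planes_per_chunk parameter are hypothetical names.

```python
import numpy as np


def mean_spectrum_chunked(cube, planes_per_chunk=16):
    """Mean over the two spatial axes, reading a few wavelength planes at a time.

    Hypothetical helper: `cube` is any array-like with shape (n_wave, ny, nx),
    for example a memory-mapped float32 array.
    """
    n_wave = cube.shape[0]
    spectrum = np.empty(n_wave, dtype=np.float64)
    for start in range(0, n_wave, planes_per_chunk):
        stop = min(start + planes_per_chunk, n_wave)
        # Only this block is pulled into memory; the mean is accumulated in
        # float64 without converting the whole cube.
        block = np.asarray(cube[start:stop])
        spectrum[start:stop] = np.nanmean(block, axis=(1, 2), dtype=np.float64)
    return spectrum


# Small random cube standing in for real data.
rng = np.random.default_rng(0)
demo = rng.random((100, 20, 20)).astype(np.float32)
spec = mean_spectrum_chunked(demo, planes_per_chunk=7)
print(spec.shape)
```

The peak extra memory is then set by planes_per_chunk rather than by the size of the cube.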

saimn commented 6 years ago

Yes, the float conversion is the main issue here. I realized after writing this that the displayed image is a slice at the central wavelength, so the only reason the cube is fully loaded is to compute the mean spectrum. For wide cubes this spectrum does not make sense, so it would be great to have a way to choose another initial spectrum (an aperture at the center of the cube could make sense for many cubes, as the main object is often at the center). Other than that, I found the memory usage to be quite stable while playing with several features, at around 9 GB for a 3 GB cube (stored as float32), so this is actually pretty good!
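
A central-aperture spectrum of the kind suggested here could look roughly like the sketch below. It uses plain NumPy and is not part of cubeviz; central_aperture_spectrum, the circular aperture shape, the default radius, and the choice to sum rather than average are all assumptions for illustration.

```python
import numpy as np


def central_aperture_spectrum(cube, radius=5):
    """Sum the spectrum inside a circular aperture centred on the cube.

    Hypothetical helper: `cube` has shape (n_wave, ny, nx); `radius` is in pixels.
    """
    n_wave, ny, nx = cube.shape
    cy, cx = ny // 2, nx // 2
    # Bounding box of the aperture, clipped to the image edges.
    y0, y1 = max(cy - radius, 0), min(cy + radius + 1, ny)
    x0, x1 = max(cx - radius, 0), min(cx + radius + 1, nx)
    # Slice first so that only the pixels in this box are copied into memory.
    sub = np.asarray(cube[:, y0:y1, x0:x1], dtype=np.float64)
    # Boolean mask of the pixels inside the circle, on the same box.
    yy, xx = np.ogrid[y0:y1, x0:x1]
    inside = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    # Ignore NaN pixels when summing over the aperture.
    return np.nansum(sub[:, inside], axis=1)


# Cube of ones: each spectral value equals the number of pixels in the aperture.
demo = np.ones((10, 21, 21), dtype=np.float32)
spectrum = central_aperture_spectrum(demo, radius=2)
print(spectrum)
```

Because the aperture covers only a small spatial box, the float64 copy made here is small even when the cube itself is several gigabytes.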

spacetelescope / cubeviz

Memory usage #440