radio-astro-tools / spectral-cube

Library for reading and analyzing astrophysical spectral data cubes
http://spectral-cube.rtfd.org
BSD 3-Clause "New" or "Revised" License

Workflow question: Arithmetic, etc, w/o loading all into memory #774

Open keflavich opened 2 years ago

keflavich commented 2 years ago

When doing arithmetic on huge cubes, which is needed in many cases, what workflow do we recommend to avoid out-of-memory errors like this one?

Traceback (most recent call last):
  File "<ipython-input-9-bc7d74799a72>", line 3, in <module>
    beam_correct_cube(fn.replace(".image",""))
  File "/orange/adamginsburg/ALMA_IMF/reduction/reduction/cube_finalizing.py", line 41, in beam_correct_cube
    merged = rescale(convmod, epsdict['epsilon'],
  File "/orange/adamginsburg/ALMA_IMF/reduction/reduction/beam_volume_tools.py", line 153, in rescale
    restor = conv_model.unitless + residual*epsilon[:,None,None]
  File "/blue/adamginsburg/adamginsburg/repos/spectral-cube/spectral_cube/spectral_cube.py", line 2245, in __mul__
    return self._apply_everywhere(operator.mul, value)
  File "/blue/adamginsburg/adamginsburg/repos/spectral-cube/spectral_cube/utils.py", line 49, in wrapper
    return function(self, *args, **kwargs)
  File "/blue/adamginsburg/adamginsburg/repos/spectral-cube/spectral_cube/spectral_cube.py", line 925, in _apply_everywhere
    data = function(u.Quantity(self._get_filled_data(fill=self._fill_value),
  File "/orange/adamginsburg/miniconda3/envs/python39/lib/python3.9/site-packages/astropy/units/quantity.py", line 960, in __mul__
    return super().__mul__(other)
  File "/orange/adamginsburg/miniconda3/envs/python39/lib/python3.9/site-packages/astropy/units/quantity.py", line 486, in __array_ufunc__
    result = super().__array_ufunc__(function, method, *arrays, **kwargs)
MemoryError: Unable to allocate 90.5 GiB for an array with shape (1920, 2378, 2661) and data type float64

The example code is https://github.com/ALMA-IMF/reduction/blob/1fecfdceb94ab5a673cc4ce7bce9ee82224033f9/reduction/cube_finalizing.py

In brief, it's doing something like:

cube1 = SpectralCube.read(...)
cube2 = SpectralCube.read(...)
cube3 = cube1.convolve_to(beam) + cube2
cube4 = SpectralCube.read(...)
cube5 = cube3 * cube4
cube4.write(...)
cube5.write(...)

I think dask should be able to handle this, and should be handling it automatically. I haven't yet figured out where we go awry...
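
For concreteness, this is roughly the workflow I would expect to stay lazy end-to-end when using the dask-backed classes (use_dask=True at read time). The file names and beam below are placeholders, and the traceback above shows that in practice the arithmetic step still materializes everything:

from spectral_cube import SpectralCube
from radio_beam import Beam
import astropy.units as u

# read as DaskSpectralCube so operations are (in principle) chunked and lazy
conv_model = SpectralCube.read('model.image.fits', use_dask=True)
residual = SpectralCube.read('residual.image.fits', use_dask=True)

# placeholder common beam; in the real script this comes from the beam table
common_beam = Beam(major=1.0*u.arcsec, minor=1.0*u.arcsec, pa=0*u.deg)

restored = conv_model.convolve_to(common_beam) + residual  # should remain lazy...
restored.write('restored.fits', overwrite=True)            # ...until written out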

keflavich commented 2 years ago

This actually points to a bug: somewhere internally we're converting a dask array into a fully allocated in-memory numpy array.

keflavich commented 2 years ago

The problem appears to be the use of u.Quantity: it reads the full data into memory at init time.
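
A minimal illustration of the failure mode (the shape and chunking are just placeholders matching the traceback): wrapping a dask array in u.Quantity forces np.asarray on it, which calls dask's __array__ and computes the whole array in memory.

import dask.array as da
import astropy.units as u

lazy = da.zeros((1920, 2378, 2661), chunks=(128, 512, 512))  # lazy; nothing allocated yet
q = u.Quantity(lazy, unit=u.Jy)  # np.asarray(lazy) runs here -> the full ~90 GiB
                                 # float64 array is computed and held in memory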

dhomeier commented 2 years ago

If I read the u.Quantity docstring correctly, that's inevitable if the dask __array__() method returns a copy.

keflavich commented 2 years ago

See also https://github.com/astropy/astropy/issues/8227#issuecomment-951431146 - yes, it is inevitable with the current construction. We need a workaround that avoids creating Quantity objects until the last possible moment.
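
The shape of the workaround I have in mind, sketched at the dask level (not the actual patch; names, shapes, and units below are placeholders): do the arithmetic on the bare dask array, carry the unit as plain bookkeeping, and only construct a Quantity (or a new cube) from the result at the very end.

import dask.array as da
import numpy as np
import astropy.units as u

lazy_data = da.zeros((1920, 2378, 2661), chunks=(128, 512, 512))  # stand-in for the cube's dask array
epsilon = np.linspace(0.9, 1.1, 1920) * u.dimensionless_unscaled  # stand-in per-channel scale factors

# strip units before the operation so no Quantity ever wraps the dask array
scaled = lazy_data * epsilon.to_value(u.dimensionless_unscaled)[:, None, None]  # stays lazy
result_unit = u.Jy * epsilon.unit  # track the unit separately as bookkeeping
# only turn `scaled` into a Quantity / new cube when writing out, at the final step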

keflavich commented 2 years ago

https://github.com/radio-astro-tools/spectral-cube/pull/775 was my start at trying to fix this, but it doesn't solve the problem yet.