Data manipulation - multiplication, addition, unit change, etc

keflavich commented 9 years ago

@ChrisBeaumont @astrofrog

Revisiting the discussion raised in https://github.com/radio-astro-tools/spectral-cube/pull/101: We don't have a nice way to directly manipulate the cube data without accessing it or creating a new cube, loading the whole thing into memory. Any thoughts on how best to do this?

There is one specific use case I'd like to target: Changing the flux units of a cube. Especially, changing from Kelvin -> Jy and vice-versa.

A way to do this now is:

cube_Jy = SpectralCube.read('file.fits')
cube_K = SpectralCube(cube_Jy.filled_data[:].to(u.K, u.brightness_temperature(beam, freq)), wcs=cube_Jy.wcs, mask=cube_Jy.mask)

which is tedious but not terrible; I suppose a wrapper .with_flux_unit() implementing that more cleanly would not be too terrible, but it would require reading the whole cube_Jy. I suppose you could also have a function that replaces the original cube's _get_filled_data() with one that translates to a new unit; that is a bit cleverer and doesn't go too far down the lazy-eval rabbit hole.

What about the more general case, though? Simple manipulation of data? apply_numpy_function can in principle do this, e.g. cube.apply_numpy_function(np.add, 5), but apply_numpy_function is currently meant to take only single-argument numpy ufuncs. We could add *args to allow any numpy of positional arguments, then if *args is non-empty, use how='cube'. This doesn't sound too bad to me, though it requires not being lazy.

I realize this project is probably way off both your radars right now, so please let me know if you won't be getting to this.

ChrisBeaumont commented 9 years ago

For the general case, I can't think of any approach better than lazy evaluation. Maybe every cube has a transform property, and pixel values are always passed through this function whenever they are accessed. I agree it would be a lot of work to make sure this behavior is consistently handled everywhere. You'd probably want a custom subclass of ndarray to make this as transparent as possible (then again any external code which uses the Numpy C API probably wouldn't work correctly)

Unlike Numpy, Blaze is designed with lazy evaluation and out of core computation in mind. It might be better positioned to implement stuff like this. But I don't know if it's a good idea to use it, since it's less familiar.

For the particular case of flux conversion, I think implementing a helper function which wraps existing functionality is a good idea. Even if it's inefficient, it establishes an API that lets people do what they want, and gives us freedom to substitute a better implementation down the road. On Sun, Feb 8, 2015 at 6:21 AM Adam Ginsburg notifications@github.com wrote:

@ChrisBeaumont https://github.com/ChrisBeaumont @astrofrog https://github.com/astrofrog

Revisiting the discussion raised in #101 https://github.com/radio-astro-tools/spectral-cube/pull/101: We don't have a nice way to directly manipulate the cube data without accessing it or creating a new cube, loading the whole thing into memory. Any thoughts on how best to do this?

There is one specific use case I'd like to target: Changing the flux units of a cube. Especially, changing from Kelvin -> Jy and vice-versa.

A way to do this now is:

cube_Jy = SpectralCube.read('file.fits') cube_K = SpectralCube(cube_Jy.filled_data[:].to(u.K, u.brightness_temperature(beam, freq)), wcs=cube_Jy.wcs, mask=cube_Jy.mask)

which is tedious but not terrible; I suppose a wrapper .with_flux_unit() implementing that more cleanly would not be too terrible, but it would require reading the whole cube_Jy. I suppose you could also have a function that replaces the original cube's _get_filled_data() with one that translates to a new unit; that is a bit cleverer and doesn't go too far down the lazy-eval rabbit hole.

What about the more general case, though? Simple manipulation of data? apply_numpy_function can in principle do this, e.g. cube.apply_numpy_function(np.add, 5), but apply_numpy_function is currently meant to take only single-argument numpy ufuncs. We could add args to allow any numpy of positional arguments, then if args is non-empty, use how='cube'. This doesn't sound too bad to me, though it requires not being lazy.

I realize this project is probably way off both your radars right now, so please let me know if you won't be getting to this.

— Reply to this email directly or view it on GitHub https://github.com/radio-astro-tools/spectral-cube/issues/182.

keflavich commented 9 years ago

Thanks @ChrisBeaumont. My template helper function (which I'll turn into a PR later) is:

        cube_Jy = cube
        cube_K = copy.copy(cube_Jy)
        cube_K._get_filled_data_Jy = cube_Jy._get_filled_data
        cube_K._unit = u.K
        def gfd(self, *args, **kwargs):
            ret = u.Quantity(self._get_filled_data_Jy(*args,**kwargs),
                             u.Jy)
            return ret.to(u.K, u.brightness_temperature(beam, restfreq))
        cube_K._get_filled_data = types.MethodType(gfd, cube_K)

The with_flux_unit wrapper of this would take equivalencies as an input.

keflavich commented 9 years ago

Just added a new branch (https://github.com/keflavich/spectral-cube/tree/flux_unit) that I'm not sure if I should add as a PR here or separately... still needs tests.

radio-astro-tools / spectral-cube

Data manipulation - multiplication, addition, unit change, etc #182