astrofrog opened this issue 3 years ago (status: Open)
How general is this? Would there need to be different accommodations for tall, skinny cubes and short, fat ones?
What I'm thinking is that for FITS cubes where the Numpy array is C-contiguous (i.e. no reordering of axes to handle unusual WCS), we should definitely be using whole image plane chunks for both global operations and image-plane operations. We could still set a maximum size for the image plane chunks in case cubes with e.g. 100,000 x 100,000 pixel image planes exist.
So no special casing is needed, aside from deciding what a sensible maximum image-plane chunk size would be.
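To make that concrete, here is a minimal sketch of what such a rechunking helper could look like, assuming a plain 3D Numpy-backed cube with axes ordered (spectral, y, x). The `plane_chunked` function and the byte cap are hypothetical illustrations, not part of the existing code base:

```python
import numpy as np
import dask.array as da

# Assumed cap on the size of a single image-plane chunk (e.g. 256 MB);
# the actual value would be a design decision.
MAX_PLANE_CHUNK_BYTES = 256 * 1024 ** 2

def plane_chunked(data: np.ndarray) -> da.Array:
    """Return a dask array chunked by whole image planes where practical."""
    plane_bytes = data.shape[1] * data.shape[2] * data.dtype.itemsize
    if plane_bytes <= MAX_PLANE_CHUNK_BYTES:
        # one spectral channel per chunk, each chunk holding a full image plane
        chunks = (1, data.shape[1], data.shape[2])
    else:
        # fall back to automatic chunking for unusually large image planes
        chunks = "auto"
    return da.from_array(data, chunks=chunks)
```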
Currently, for FITS cubes, we convert the Numpy array to a dask array with default chunking, which splits up all axes into chunks. This is the right behavior for arbitrary operations, but for operations that use the whole cube, such as `statistics()`, and for image-plane calculations, it is not optimal. One idea would be to internally have e.g. three dask arrays, all backed by the same data but optionally with different chunking. For CASA cubes, these would all be the same, since the chunking we already have can't really be improved much. For FITS cubes, `self._data_global` and `self._data_image` would use whole image plane chunks. Using a dask array with whole image plane chunks improves the performance of `statistics()` by a factor of 5x or more, and would undoubtedly also improve the performance of image-plane convolution, since it is a more 'native' chunking.
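As a rough illustration of the idea, the sketch below builds two dask views of the same Numpy buffer with different chunking, assuming a plain in-memory FITS-style cube. The `_data*` names echo the attributes mentioned above, but the snippet is illustrative rather than the actual implementation:

```python
import numpy as np
import dask.array as da

# stand-in for a FITS cube already loaded as a C-contiguous Numpy array
data = np.random.random((500, 512, 512)).astype(np.float32)

# default chunking, suitable for arbitrary per-chunk operations
_data = da.from_array(data, chunks="auto")

# whole image-plane chunks, better suited to global statistics and
# image-plane operations; backed by the same underlying buffer
_data_global = da.from_array(data, chunks=(1, data.shape[1], data.shape[2]))
_data_image = _data_global

# a global operation such as the mean runs over the same data, but with
# far fewer, larger chunks for the scheduler to manage
print(_data_global.mean().compute())
```

Since `da.from_array` only wraps the existing array, keeping several differently-chunked views around would add scheduling metadata but not duplicate the data in memory.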