In the case of non-numeric data (primarily strings), python-casacore returns lists of objects from getcol type commands and expects lists of objects in putcol type commands.
This conflicts a little with dask, which expects ndarrays to represent the data in each chunk. Historically, simply using lists to represent chunk data would cause dask's internal array stitching operations to break. In the getcol case, this can be worked around by casting the list of objects to an ndarray of objects.
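A minimal sketch of the getcol-side cast described above. A plain Python list stands in for what python-casacore would return for a string column, so no table is opened here; the names `raw` and `chunk` and the sample values are illustrative assumptions.

```python
import numpy as np

# Stand-in for the list of strings that a getcol type call
# returns for a non-numeric column (values are made up).
raw = ["3C286", "3C48", "PKS1934-638"]

# Cast the list to an ndarray of objects so that the chunk is an
# ndarray, which is what dask expects each chunk to be.
chunk = np.asarray(raw, dtype=object)

print(type(chunk).__name__, chunk.dtype, chunk.shape)
```

Passing `dtype=object` explicitly keeps the elements as Python `str` objects rather than letting NumPy choose a fixed-width unicode dtype.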
In the putcol case, one needs to convert dask's ndarrays of objects back to a list with data.tolist(). Without this conversion, python-casacore can segfault. This works (for 1D lists at least), so we have a workaround for both cases.
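The putcol-side conversion can be sketched the same way. Here an object ndarray stands in for a chunk as dask would hand it to a write function; the actual putcol call is omitted, and the variable names and sample values are illustrative assumptions.

```python
import numpy as np

# Stand-in for a 1D chunk of string data held as an ndarray of objects.
chunk = np.asarray(["XX", "XY", "YX", "YY"], dtype=object)

# Convert back to a plain Python list before handing the data
# to a putcol type call.
data = chunk.tolist()

print(type(data).__name__, data)
```

For arrays with more than one dimension, `tolist()` produces nested lists, which is why the workaround is only stated to hold for the 1D case.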
dask 2.0.0 added a meta attribute to dask Arrays, which contains metadata describing the type and dimensionality of the data representing each chunk (see https://github.com/dask/dask/issues/4070). In future, it may be possible to use this to properly handle lists as chunks.