While debugging some other code I ran into a situation in which dask-ms quietly accepts bad user input and then produces mysterious results. Specifically, if the chunking on ROWID does not match the row chunking on the xds.data_vars, dask-ms does not complain but the write goes wrong.
Reproducer
from daskms import xds_from_ms, xds_to_table
import dask
from numpy.testing import assert_array_equal

if __name__ == "__main__":
    ms_name = "C147_unflagged.MS"

    xdsl = xds_from_ms(
        ms_name,
        group_cols=["SCAN_NUMBER"],
        chunks={"row": -1},
        columns=["DATA"]
    )
    xds = xdsl[0]

    # NOTE: Remove this line to make this test pass.
    xds = xds.assign_coords({"ROWID": (("row",), xds.ROWID.data.rechunk(10000))})

    ref_data = xds.DATA.values  # Prior to write.

    xds = xds.rename({"DATA": "BROKEN_DATA"})
    writes = xds_to_table(xds, ms_name, columns=["BROKEN_DATA"])
    dask.compute(writes)

    xdsl = xds_from_ms(
        ms_name,
        group_cols=["SCAN_NUMBER"],
        chunks={"row": -1},
        columns=["BROKEN_DATA"]
    )
    xds = xdsl[0]

    upd_data = xds.BROKEN_DATA.values  # Post write.

    assert_array_equal(ref_data, upd_data)
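
For illustration, here is a minimal sketch of the kind of check I would have expected to fail loudly before the write. It is not dask-ms code: the function names `mismatched_row_chunks` and `check_row_chunks` are hypothetical, and it works on plain chunk tuples of the form dask arrays expose through `.chunks`, so it runs without a Measurement Set.

```python
from typing import Dict, Mapping, Tuple

RowChunks = Tuple[int, ...]


def mismatched_row_chunks(
    rowid_chunks: RowChunks,
    var_row_chunks: Mapping[str, RowChunks],
) -> Dict[str, RowChunks]:
    """Return the variables whose row chunking differs from ROWID's.

    Hypothetical helper, not part of dask-ms.
    """
    expected = tuple(rowid_chunks)
    return {
        name: tuple(chunks)
        for name, chunks in var_row_chunks.items()
        if tuple(chunks) != expected
    }


def check_row_chunks(
    rowid_chunks: RowChunks,
    var_row_chunks: Mapping[str, RowChunks],
) -> None:
    """Raise ValueError if any variable's row chunks disagree with ROWID's."""
    bad = mismatched_row_chunks(rowid_chunks, var_row_chunks)
    if bad:
        details = ", ".join(f"{name}: {chunks}" for name, chunks in bad.items())
        raise ValueError(
            f"ROWID row chunks {tuple(rowid_chunks)} do not match "
            f"data variable row chunks ({details})"
        )


if __name__ == "__main__":
    # Made-up chunk sizes mimicking the reproducer: ROWID rechunked to
    # 10000 rows while the data variable keeps a single row chunk.
    rowid = (10000, 10000, 5000)
    data_vars = {"BROKEN_DATA": (25000,)}
    try:
        check_row_chunks(rowid, data_vars)
    except ValueError as exc:
        print(f"Refusing to write: {exc}")
```

With a real dataset, and assuming every variable is backed by a dask array with a `row` dimension, the inputs could be taken from `xds.ROWID.data.chunks[0]` and `{name: var.data.chunks[var.dims.index("row")] for name, var in xds.data_vars.items()}` just before calling `xds_to_table`.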