ratt-ru / dask-ms

Implementation of a dask/xarray dataset backed by a CASA MS
https://dask-ms.readthedocs.io

Mismatched chunks between data_vars and ROWID gives weird results. #238

Closed JSKenyon closed 2 years ago

JSKenyon commented 2 years ago

Description

While debugging some other code, I ran into a situation in which dask-ms quietly accepts bad user input and then produces mysterious results. Specifically, if the chunking of ROWID does not match the row chunking of the variables in xds.data_vars, dask-ms does not complain, but the write goes wrong: the data read back afterwards no longer matches what was in the dataset before the write.
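
To make the kind of check I have in mind concrete, here is a minimal sketch of a guard that could run before a write. It is not dask-ms code: `check_row_chunks` is a hypothetical helper, and the chunk tuples in the usage lines are made-up numbers. It operates on plain tuples of chunk sizes, such as the ones a dask array exposes through `.chunks` (for example `xds.ROWID.data.chunks[0]`, assuming row is the first dimension).

```python
def check_row_chunks(rowid_chunks, var_row_chunks):
    """Raise ValueError if any variable's row chunks differ from ROWID's.

    Hypothetical helper, not part of dask-ms.

    rowid_chunks:   chunk sizes of ROWID along row, e.g. (10000, 10000, 5000)
    var_row_chunks: dict mapping variable name -> chunk sizes along row
    """
    rowid_chunks = tuple(rowid_chunks)

    # Collect every variable whose row chunking differs from ROWID's.
    mismatched = {
        name: tuple(chunks)
        for name, chunks in var_row_chunks.items()
        if tuple(chunks) != rowid_chunks
    }

    if mismatched:
        details = ", ".join(
            f"{name}: {chunks}" for name, chunks in sorted(mismatched.items())
        )
        raise ValueError(
            f"Row chunks do not match ROWID chunks {rowid_chunks}: {details}"
        )


# Illustrative values only: ROWID rechunked to 10000 rows per chunk,
# while DATA still holds all rows in a single chunk.
try:
    check_row_chunks((10000, 10000, 5000), {"DATA": (25000,)})
except ValueError as err:
    print(err)
```

Failing loudly like this (or rechunking ROWID to match) would be preferable to the current silent behaviour, though where such a check belongs inside dask-ms is a separate question.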

Reproducer

from daskms import xds_from_ms, xds_to_table
import dask
from numpy.testing import assert_array_equal

if __name__ == "__main__":

    ms_name = "C147_unflagged.MS"

    xdsl = xds_from_ms(
        ms_name,
        group_cols=["SCAN_NUMBER"],
        chunks={"row": -1},
        columns=["DATA"]
    )

    xds = xdsl[0]

    # NOTE: Remove this line to make this test pass.
    xds = xds.assign_coords({"ROWID": (("row",), xds.ROWID.data.rechunk(10000))})

    ref_data = xds.DATA.values  # Prior to write.

    xds = xds.rename({"DATA": "BROKEN_DATA"})

    writes = xds_to_table(xds, ms_name, columns=["BROKEN_DATA"])

    dask.compute(writes)

    xdsl = xds_from_ms(
        ms_name,
        group_cols=["SCAN_NUMBER"],
        chunks={"row": -1},
        columns=["BROKEN_DATA"]
    )

    xds = xdsl[0]

    upd_data = xds.BROKEN_DATA.values  # Post write.

    assert_array_equal(ref_data, upd_data)