ratt-ru / dask-ms

Implementation of a dask/xarray dataset backed by a CASA MS
https://dask-ms.readthedocs.io

Mistakenly attempting to access a subtable as a main table causes weird behaviour #261

Closed JSKenyon closed 1 year ago

JSKenyon commented 1 year ago

Description

While attempting to write some flags back to a dataset, @landmanbester ran into a bug, which I will try to reproduce here.

Reproducer

from daskms.experimental.zarr import xds_to_zarr, xds_from_zarr
import xarray
import dask.array as da

if __name__ == "__main__":

    xds = xarray.Dataset({"DUMMY": (("row",), da.ones(1000))})

    writes = xds_to_zarr(xds, "dummy_dir::dummy_subtable")

    da.compute(writes)

    # This will work as expected.
    correct_xds = xds_from_zarr("dummy_dir::dummy_subtable")

    # This will silently return an empty list and somehow add a MAIN entry
    # to dummy_dir/dummy_subtable.
    incorrect_xds = xds_from_zarr("dummy_dir/dummy_subtable")

    # If you run this a second time, it will crash due to the presence of the
    # additional MAIN directory in dummy_dir/dummy_subtable, added by the
    # second xds_from_zarr call.
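
One way to confirm the side effect is to snapshot the directory tree before and after the read and compare the two. The sketch below uses only the standard library; `snapshot_tree` is a hypothetical helper name and not part of dask-ms.

```python
import os


def snapshot_tree(root):
    """Return the set of all file and directory paths under root, relative to root."""
    entries = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            entries.add(os.path.relpath(os.path.join(dirpath, name), root))
    return entries


# Intended use with the reproducer above (not executed here):
#   before = snapshot_tree("dummy_dir")
#   xds_from_zarr("dummy_dir/dummy_subtable")
#   created = snapshot_tree("dummy_dir") - before
# A pure read should leave `created` empty.
```

Any path that appears in the difference was written during the read, which is what the stray MAIN directory described above would look like.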

What should happen

Ideally, dask-ms should be able to identify when a user has mistakenly pointed it at an incorrect path. More importantly, a call to xds_from_zarr should never write to disk, yet here it appears to create a new directory containing a .zgroup file.
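
As a rough illustration of the first point, a read path could check that the main table exists before opening anything, and fail loudly otherwise. This is a minimal sketch, not dask-ms code: `require_main_table` is a hypothetical name, and the assumption that the main table lives in a MAIN directory inside the store is inferred only from the behaviour described in this issue.

```python
from pathlib import Path


def require_main_table(store_path):
    """Hypothetical guard: fail fast instead of creating a missing MAIN group.

    Assumes (from the behaviour described in this issue) that a store's main
    table lives in a MAIN directory directly under the store path.
    """
    main = Path(store_path) / "MAIN"
    # Only inspect the filesystem; never create directories or files here.
    if not main.is_dir():
        raise FileNotFoundError(
            f"{str(store_path)!r} has no MAIN table. If this is a subtable, "
            "address it as 'store::SUBTABLE' instead."
        )
    return main
```

With a check like this, pointing at dummy_dir/dummy_subtable would raise an error on the first run rather than returning an empty list, and nothing would be written to disk.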