ratt-ru / dask-ms

Implementation of a dask/xarray dataset backed by a CASA MS
https://dask-ms.readthedocs.io

Retain ROWID coordinates during MS conversion #286

Closed: sjperkins closed this 1 year ago

sjperkins commented 1 year ago

Removing the ROWID coordinate prevents newer formats from mapping back to CASA Measurement Sets.
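The sketch below illustrates why the row index matters. It assumes nothing about dask-ms itself. Grouping rows (here by a FIELD_ID value) splits them into partitions whose rows are no longer contiguous. Each partition therefore has to carry its original row indices if its values are to be written back in place. `partition` and `write_back` are hypothetical helpers, not dask-ms functions. In dask-ms datasets, the ROWID coordinate plays the role of the recorded indices.

```python
from collections import defaultdict


def partition(field_ids, values):
    """Group rows by FIELD_ID, recording each row's original index (ROWID).

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(lambda: {"ROWID": [], "DATA": []})
    for rowid, (fid, val) in enumerate(zip(field_ids, values)):
        groups[fid]["ROWID"].append(rowid)
        groups[fid]["DATA"].append(val)
    return dict(groups)


def write_back(groups, nrow):
    """Scatter each partition's DATA back to its original rows via ROWID.

    Without the ROWID lists there is no way to know where each value belongs.
    """
    out = [None] * nrow
    for group in groups.values():
        for rowid, val in zip(group["ROWID"], group["DATA"]):
            out[rowid] = val
    return out


if __name__ == "__main__":
    groups = partition([0, 1, 0, 1, 2], ["a", "b", "c", "d", "e"])
    print(groups[0])                # {'ROWID': [0, 2], 'DATA': ['a', 'c']}
    print(write_back(groups, 5))    # ['a', 'b', 'c', 'd', 'e']
```

If the ROWID lists are dropped, the partitions still hold the right values but no longer say which original rows they came from. That is the mapping the change is meant to preserve.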

JSKenyon commented 1 year ago

Would you like to try this out with the backup and restore functionality @landmanbester?

landmanbester commented 1 year ago

Yup, will do. I just need to convert some data again.

sjperkins commented 1 year ago

Let me know if a release containing this functionality would be desirable.

landmanbester commented 1 year ago

Hmmm, I got the following error with the latest master:

(dms) ╭─bester@oates ~/projects/ESO137/msdir
╰─➤  dask-ms convert ms1_primary.ms -g "FIELD_ID,DATA_DESC_ID,SCAN_NUMBER" -o ms1_primary.zarr --chunks="{row:50000,chan:256}" --format zarr --force
2023-09-14 15:03:58,242 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 0 of column FLAG_CATEGORY in /home/bester/projects/ESO137/msdir/ms1_primary.ms/table.f18'
2023-09-14 15:03:58,407 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 98332 of column FLAG_CATEGORY in /home/bester/projects/ESO137/msdir/ms1_primary.ms/table.f18'
2023-09-14 15:03:58,510 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 196664 of column FLAG_CATEGORY in /home/bester/projects/ESO137/msdir/ms1_primary.ms/table.f18'
2023-09-14 15:03:58,622 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 294996 of column FLAG_CATEGORY in /home/bester/projects/ESO137/msdir/ms1_primary.ms/table.f18'
2023-09-14 15:03:58,716 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 393328 of column FLAG_CATEGORY in /home/bester/projects/ESO137/msdir/ms1_primary.ms/table.f18'
2023-09-14 15:03:58,873 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 491660 of column FLAG_CATEGORY in /home/bester/projects/ESO137/msdir/ms1_primary.ms/table.f18'
2023-09-14 15:03:59,015 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 589992 of column FLAG_CATEGORY in /home/bester/projects/ESO137/msdir/ms1_primary.ms/table.f18'
2023-09-14 15:03:59,154 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 688324 of column FLAG_CATEGORY in /home/bester/projects/ESO137/msdir/ms1_primary.ms/table.f18'
/home/bester/software/dask-ms/daskms/reads.py:269: PerformanceWarning: Increasing number of chunks by factor of 16
  dask_array = da.blockwise(
/home/bester/software/dask-ms/daskms/reads.py:269: PerformanceWarning: Increasing number of chunks by factor of 16
  dask_array = da.blockwise(
/home/bester/software/dask-ms/daskms/reads.py:269: PerformanceWarning: Increasing number of chunks by factor of 16
  dask_array = da.blockwise(
/home/bester/software/dask-ms/daskms/reads.py:269: PerformanceWarning: Increasing number of chunks by factor of 16
  dask_array = da.blockwise(
2023-09-14 15:03:59,495 - dask-ms - INFO - Input: 'MeasurementSet' file:///home/bester/projects/ESO137/msdir/ms1_primary.ms
2023-09-14 15:03:59,495 - dask-ms - INFO - Output: 'zarr' file:///home/bester/projects/ESO137/msdir/ms1_primary.zarr
2023-09-14 15:04:07,907 - dask-ms - WARNING - Ignoring SOURCE
2023-09-14 15:04:07,911 - dask-ms - WARNING - Ignoring 'TARGET': Unable to infer shape of column 'TARGET' due to:
'TableProxy::getCell: no such row'
2023-09-14 15:04:07,912 - dask-ms - WARNING - Ignoring 'DIRECTION': Unable to infer shape of column 'DIRECTION' due to:
'TableProxy::getCell: no such row'
Traceback (most recent call last):
  File "/home/bester/.venv/dms/bin/dask-ms", line 8, in <module>
    sys.exit(main())
  File "/home/bester/software/dask-ms/daskms/apps/entrypoint.py", line 9, in main
    return EntryPoint(sys.argv[1:]).execute()
  File "/home/bester/software/dask-ms/daskms/apps/entrypoint.py", line 33, in execute
    cmd.execute()
  File "/home/bester/software/dask-ms/daskms/apps/convert.py", line 193, in execute
    dask.compute(writes)
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/base.py", line 599, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/threaded.py", line 89, in get
    results = get_async(
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/local.py", line 511, in get_async
    raise_exception(exc, tb)
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/local.py", line 319, in reraise
    raise exc
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/local.py", line 224, in execute_task
    result = _execute_task(task, data)
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/core.py", line 119, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/optimization.py", line 990, in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/core.py", line 149, in get
    result = _execute_task(task, cache)
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/bester/software/dask-ms/daskms/reads.py", line 186, in getter_wrapper
    return future.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/bester/software/dask-ms/daskms/reads.py", line 65, in ndarray_getcolslice
    getcolslicenp(
  File "/home/bester/.venv/dms/lib/python3.8/site-packages/casacore/tables/table.py", line 1099, in getcolslicenp
    return self._getcolslicevh(columnname, blc, trc, inc,
RuntimeError: Table DataManager error: TiledStMan: calcCacheSize: invalid arguments

Any idea what's going wrong?
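The command above passes `--chunks="{row:50000,chan:256}"`. The sketch below shows how a fixed integer chunk size would split a row dimension. It assumes dask's usual rule for an integer chunk size: as many full chunks as fit, then one smaller remainder chunk. `row_chunks` is a hypothetical helper, not part of dask-ms. The 98332-row size is only the spacing between the row numbers in the FLAG_CATEGORY warnings above, used as an illustrative partition size. It is not a diagnosis of the error.

```python
def row_chunks(nrow, chunk):
    """Split `nrow` rows into full `chunk`-sized pieces plus a remainder.

    Hypothetical helper mirroring how an integer chunk size is usually
    expanded along one dimension.
    """
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    full, rem = divmod(nrow, chunk)
    return (chunk,) * full + ((rem,) if rem else ())


if __name__ == "__main__":
    print(row_chunks(98332, 50000))   # (50000, 48332)
    print(row_chunks(100000, 50000))  # (50000, 50000)
```

For comparison, `dask.array.core.normalize_chunks(50000, shape=(98332,))` should give the same split, wrapped in an outer tuple with one entry per dimension.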

sjperkins commented 1 year ago

> Any idea what's going wrong?

Not immediately. Is this in a fresh venv? If not, and you can reproduce it in a fresh VM, can you create a new issue?

landmanbester commented 1 year ago

Rerunning in a fresh python3.10 venv now. I'll open an issue if the problem persists.