ratt-ru / dask-ms

Implementation of a dask/xarray dataset backed by a CASA MS
https://dask-ms.readthedocs.io

Allow additive table_schemas to support non standard columns #22

Closed sjperkins closed 5 years ago

sjperkins commented 5 years ago

Currently xarray-ms supports predefined table schemas for the MS table and some of the more commonly used sub-tables. See here.

It's also possible to supply non-standard table schemas.

Currently there's no simple user mechanism to specify the schema for non-standard columns. The interface should be extended to allow this.

A general workaround strategy might be:

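from xarrayms.known_table_schemas import MS_SCHEMA, ColumnSchema
from xarrayms import xds_from_table

# Copy the standard MS schema so the library default is left untouched
my_schema = MS_SCHEMA.copy()
# Name the non-row dimensions of the non-standard column,
# ordered (chan, corr) as for MODEL_DATA
my_schema['MY_MODEL_DATA'] = ColumnSchema(("chan", "corr"))

xds = xds_from_table("WSRT.MS",..., table_schema=my_schema)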
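
The additive behaviour requested in this issue can be sketched without the library at all. In the snippet below, ColumnSchema and MS_SCHEMA are small stand-ins (an assumption, not the real xarrayms objects), and extend_schema is a hypothetical helper name, not part of xarrayms:

```python
from collections import namedtuple

# Stand-in for xarrayms.known_table_schemas.ColumnSchema.
# Assumption: a schema entry simply carries a tuple of dimension names.
ColumnSchema = namedtuple("ColumnSchema", ["dims"])

# Tiny stand-in for MS_SCHEMA; the real dictionary holds many more columns.
MS_SCHEMA = {
    "DATA": ColumnSchema(("chan", "corr")),
    "UVW": ColumnSchema(("(u,v,w)",)),
}


def extend_schema(base, extra):
    """Return a new schema holding base plus extra; base is not modified."""
    merged = dict(base)
    merged.update(extra)
    return merged


# Only the non-standard column has to be spelled out by the user.
my_schema = extend_schema(
    MS_SCHEMA,
    {"MY_MODEL_DATA": ColumnSchema(("chan", "corr"))},
)
```

With something like this, a user would only supply the extra columns and the standard entries would be carried over unchanged.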

o-smirnov commented 5 years ago

Define "might" please. :) If that workaround works, then that's all we need at this stage.

sjperkins commented 5 years ago

OK, this doesn't work through xds_from_ms, but it does through xds_from_table (which xds_from_ms wraps). I've updated the workaround example.

Mulan-94 commented 5 years ago

@sjperkins, sorry, does this mean that the column must be passed on to the table schema each time? Also, when I use xds_from_table with an MS I get different dimension names, e.g.

<xarray.Dataset>
Dimensions:         (CORRECTED_DATA-1: 119, CORRECTED_DATA-2: 2, DATA-1: 119, DATA-2: 2, FLAG-1: 119, FLAG-2: 2, SIGMA-1: 2, UVW-1: 3, WEIGHT-1: 2, row: 190920)
Coordinates:
    table_row       (row) int32 0 1 2 3 4 ... 190915 190916 190917 190918 190919

Rather than

<xarray.Dataset>
Dimensions:         ((u,v,w): 3, chan: 119, corr: 2, row: 9000)
Coordinates:
    table_row       (row) int32 0 8 9 61 87 ... 102956 102957 102958 102959

Am I missing something maybe?

sjperkins commented 5 years ago

Are you passing the updated schema through, i.e.: xds_from_table("WSRT.MS", ..., table_schema=my_schema)?
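
The generic names in the first output (DATA-1, DATA-2, ...) are what you would expect when no schema entry is found for a column. A rough sketch of that fallback, inferred only from the outputs in this thread and not taken from the xarrayms source (column_dims is a hypothetical name, and the schema here is a plain dict of tuples):

```python
def column_dims(column, ndim, schema):
    """Dimension names for a column with `ndim` non-row axes.

    Uses the schema entry when there is one, otherwise falls back to
    generic "<COLUMN>-<n>" names, numbered from 1.
    """
    if column in schema:
        return ("row",) + tuple(schema[column])
    return ("row",) + tuple(f"{column}-{i}" for i in range(1, ndim + 1))


# Simplified schema: column name -> names of the non-row dimensions.
schema = {"DATA": ("chan", "corr"), "UVW": ("(u,v,w)",)}

with_schema = column_dims("DATA", 2, schema)   # schema entry found
without_schema = column_dims("DATA", 2, {})    # no schema supplied
```

Here with_schema comes out as ("row", "chan", "corr"), while without_schema comes out as ("row", "DATA-1", "DATA-2"), matching the two kinds of output above.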

sjperkins commented 5 years ago

e.g. the following snippet works for me:

In [13]: xds = list(xds_from_table("/home/sperkins/data/WSRT.MS/", table_schema=MY_SCHEMA))
Successful readonly open of default-locked table /home/sperkins/data/WSRT.MS/: 25 columns, 6552 rows
xarrayms.xarray_ms - WARNING - Unable to infer shape of 'FLAG_CATEGORY' column. Ignoring.

In [14]: xds
Out[14]: 
[<xarray.Dataset>
 Dimensions:         ((u,v,w): 3, chan: 64, corr: 4, row: 6552)
 Coordinates:
     table_row       (row) int32 0 1 2 3 4 5 6 ... 6546 6547 6548 6549 6550 6551
 Dimensions without coordinates: (u,v,w), chan, corr, row
 Data variables:
     ANTENNA1        (row) int32 dask.array<shape=(6552,), chunksize=(6552,)>
     ANTENNA2        (row) int32 dask.array<shape=(6552,), chunksize=(6552,)>
     ARRAY_ID        (row) int32 dask.array<shape=(6552,), chunksize=(6552,)>
     CORRECTED_DATA  (row, chan, corr) complex64 dask.array<shape=(6552, 64, 4), chunksize=(6552, 64, 4)>
     DATA            (row, chan, corr) complex64 dask.array<shape=(6552, 64, 4), chunksize=(6552, 64, 4)>
     DATA_DESC_ID    (row) int32 dask.array<shape=(6552,), chunksize=(6552,)>
     EXPOSURE        (row) float64 dask.array<shape=(6552,), chunksize=(6552,)>
     FEED1           (row) int32 dask.array<shape=(6552,), chunksize=(6552,)>
     FEED2           (row) int32 dask.array<shape=(6552,), chunksize=(6552,)>
     FIELD_ID        (row) int32 dask.array<shape=(6552,), chunksize=(6552,)>
     FLAG            (row, chan, corr) bool dask.array<shape=(6552, 64, 4), chunksize=(6552, 64, 4)>
     FLAG_ROW        (row) bool dask.array<shape=(6552,), chunksize=(6552,)>
     IMAGING_WEIGHT  (row, chan) float32 dask.array<shape=(6552, 64), chunksize=(6552, 64)>
     INTERVAL        (row) float64 dask.array<shape=(6552,), chunksize=(6552,)>
     MODEL_DATA      (row, chan, corr) complex64 dask.array<shape=(6552, 64, 4), chunksize=(6552, 64, 4)>
     OBSERVATION_ID  (row) int32 dask.array<shape=(6552,), chunksize=(6552,)>
     PROCESSOR_ID    (row) int32 dask.array<shape=(6552,), chunksize=(6552,)>
     SCAN_NUMBER     (row) int32 dask.array<shape=(6552,), chunksize=(6552,)>
     SIGMA           (row, corr) float32 dask.array<shape=(6552, 4), chunksize=(6552, 4)>
     STATE_ID        (row) int32 dask.array<shape=(6552,), chunksize=(6552,)>
     TIME            (row) float64 dask.array<shape=(6552,), chunksize=(6552,)>
     TIME_CENTROID   (row) float64 dask.array<shape=(6552,), chunksize=(6552,)>
     UVW             (row, (u,v,w)) float64 dask.array<shape=(6552, 3), chunksize=(6552, 3)>
     WEIGHT          (row, corr) float32 dask.array<shape=(6552, 4), chunksize=(655

Mulan-94 commented 5 years ago

ooh ok, my bad, I forgot that!

o-smirnov commented 5 years ago

Does this mean I get an updated script now? :)

Mulan-94 commented 5 years ago

@o-smirnov yes, working on it.

sjperkins commented 5 years ago

Closed by #23