radio-astro-tools / casa-formats-io

Code to handle I/O from/to data in CASA format
Other
10 stars 7 forks source link

Add a glue data factory for CASA tables #33

Closed astrofrog closed 2 years ago

astrofrog commented 3 years ago

I think it would make sense to have this live here since it integrates with the low-level API for the I/O. To test out, install and try e.g. glue blah.ms or glue blah.ms/SUBTABLE

This needs https://github.com/glue-viz/glue/pull/2249 to work well.

keflavich commented 3 years ago

tried, got:

Traceback (most recent call last):
  File "/blue/adamginsburg/adamginsburg/repos/glue/glue/main.py", line 190, in start_glue
    datasets = load_data_files(datafiles)
  File "/blue/adamginsburg/adamginsburg/repos/glue/glue/main.py", line 105, in load_data_files
    datasets.append(load_data(df))
  File "/blue/adamginsburg/adamginsburg/repos/glue/glue/core/data_factories/helpers.py", line 266, in load_data
    d = as_list(factory(path, **kwargs))
  File "/blue/adamginsburg/adamginsburg/repos/casa-formats-io/casa_formats_io/glue_factory.py", line 78, in read_casa_table
    datasets.append(table_to_glue_data(table, label=label_prefix + f' [DATA_DESC_ID={data_desc_id}]'))
  File "/blue/adamginsburg/adamginsburg/repos/casa-formats-io/casa_formats_io/glue_factory.py", line 35, in table_to_glue_data
    raise ValueError(f"Table {colname} is too wide to expand into single columns: {table[colname].shape}")
ValueError: Table FLAG is too wide to expand into single columns: (9150, 2048, 2)

I'll try w/other subtables

keflavich commented 3 years ago

So this looks like the column is (time, frequency, polarization). Perhaps it makes more sense to load something like this as an ndcube data set?

keflavich commented 3 years ago

(I was able to load subtables fine, but they're uninteresting)

astrofrog commented 3 years ago

Ok interesting yes indeed I think we need a different approach. Could you share the ms file with me?

keflavich commented 3 years ago

MS shared.

Frequencies are defined in the SPECTRAL_WINDOW table, accessible through casa's tb.open:

CASA <16>: tb.getcoldesc('CHAN_FREQ')
Out[16]:
{'comment': 'Center frequencies for each channel in the data matrix',
 'dataManagerGroup': 'StandardStMan',
 'dataManagerType': 'StandardStMan',
 'keywords': {'MEASINFO': {'TabRefCodes': array([ 0,  1,  2,  3,  4,  5,  6,  7,  8, 64], dtype=uint64),
   'TabRefTypes': array(['REST', 'LSRK', 'LSRD', 'BARY', 'GEO', 'TOPO', 'GALACTO', 'LGROUP',
          'CMB', 'Undefined'], dtype='<U16'),
   'VarRefCol': 'MEAS_FREQ_REF',
   'type': 'frequency'},
  'QuantumUnits': array(['Hz'], dtype='<U16')},
 'maxlen': 0,
 'ndim': 1,
 'option': 0,
 'valueType': 'double'}

that tells the metadata

the different frequencies are in columns:

CASA <20>: tb.getcol("CHAN_FREQ", nrow=1).shape
Out[20]: (2048, 1)

CASA <21>: tb.getcol("CHAN_FREQ", nrow=1, startrow=1).shape
Out[21]: (1920, 1)

CASA <22>: tb.getcol("CHAN_FREQ", nrow=1, startrow=2).shape
Out[22]: (1920, 1)

and the frame is given in MEAS_FREQ_REF

e-koch commented 3 years ago

For reading the polarization codes in file.ms/POLARIZATION in the CORR_TYPE column, this should be the right conversion:

  1. I
  2. Q
  3. U
  4. V
  5. RR
  6. RL
  7. LR
  8. LL
  9. XX
  10. XY
  11. YX
  12. YY

This is translating from AIPS memo 117 (http://www.aips.nrao.edu/aipsmemo.html) explained in pyuvdata (https://github.com/RadioAstronomySoftwareGroup/pyuvdata/blob/406b88eb91f511f6a5d50ded29e7615c3df4f2a1/docs/references/beamfits_memo.tex#L89) and translated to the MS format (https://github.com/RadioAstronomySoftwareGroup/pyuvdata/blob/b382be644848abe315df035d1cec23f694e65427/pyuvdata/uvdata/ms.py#L37).

e-koch commented 3 years ago

From a VLA L-band observation with full Stokes in all spectral windows:

tb.open('table.ms/POLARIZATION')
CASA <45>: tb.getcol("CORR_TYPE", nrow=1)
Out[45]:
array([[5],
       [6],
       [7],
       [8]])

CASA <46>: tb.getcol("CORR_TYPE", nrow=1).shape
Out[46]: (4, 1)
astrofrog commented 2 years ago

And just a note from my side - here's an example of column shapes:


UVW (724500, 3)
FLAG (724500, 12, 2)
WEIGHT (724500, 2)
SIGMA (724500, 2)
ANTENNA1 (724500,)
ANTENNA2 (724500,)
ARRAY_ID (724500,)
DATA_DESC_ID (724500,)
EXPOSURE (724500,)
FEED1 (724500,)
FEED2 (724500,)
FIELD_ID (724500,)
FLAG_ROW (724500,)
INTERVAL (724500,)
OBSERVATION_ID (724500,)
PROCESSOR_ID (724500,)
SCAN_NUMBER (724500,)
STATE_ID (724500,)
TIME (724500,)
TIME_CENTROID (724500,)
DATA (724500, 12, 2)
MODEL_DATA (724500, 12, 2)
CORRECTED_DATA (724500, 12, 2)
``

Until glue supports columns with different shapes, we could broadcast all arrays to the 3D shape of DATA - with the exception of UVW which we could split into x/y/z components.

I need to check if the ,2 in the dimension of SIGMA is the same as in DATA. 
astrofrog commented 2 years ago

It might also make sense to split out polarization into separate columns and keep (time, frequency) as 2D images, because otherwise the cubes will be very skinny along the polarization dimensions and not sure how useful that is.

astrofrog commented 2 years ago

@keflavich - can you try this latest version?

codecov-commenter commented 2 years ago

Codecov Report

Merging #33 (eb32a47) into main (95403b8) will increase coverage by 0.83%. The diff coverage is 69.15%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #33      +/-   ##
==========================================
+ Coverage   55.66%   56.50%   +0.83%     
==========================================
  Files          17       18       +1     
  Lines        2118     2230     +112     
==========================================
+ Hits         1179     1260      +81     
- Misses        939      970      +31     
Impacted Files Coverage Δ
casa_formats_io/glue_factory.py 65.21% <65.21%> (ø)
casa_formats_io/casa_low_level_io/table.py 94.66% <93.33%> (+0.63%) :arrow_up:
casa_formats_io/casa_dask.py 95.38% <0.00%> (+0.26%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 95403b8...eb32a47. Read the comment docs.

keflavich commented 2 years ago

Looks good. Some questions:

I got this when trying to load the Table Viewer image

Traceback (most recent call last):
  File "/blue/adamginsburg/adamginsburg/repos/glue/glue/utils/qt/decorators.py", line 61, in decorated
    return f(*args, **kwargs)
  File "/blue/adamginsburg/adamginsburg/repos/glue/glue/viewers/common/qt/data_viewer.py", line 217, in add_data
    return super(DataViewer, self).add_data(data)
  File "/blue/adamginsburg/adamginsburg/repos/glue/glue/viewers/common/viewer.py", line 207, in add_data
    layer = get_layer_artist_from_registry(data, self) or self.get_data_layer_artist(data)
  File "/blue/adamginsburg/adamginsburg/repos/glue/glue/viewers/common/viewer.py", line 238, in get_data_layer_artist
    return self.get_layer_artist(self._data_artist_cls, layer=layer, layer_state=layer_state)
  File "/blue/adamginsburg/adamginsburg/repos/glue/glue/viewers/table/qt/data_viewer.py", line 338, in get_layer_artist
    return cls(self, self.state, layer=layer, layer_state=layer_state)
  File "/blue/adamginsburg/adamginsburg/repos/glue/glue/viewers/table/qt/data_viewer.py", line 179, in __init__
    super(TableLayerArtist, self).__init__(viewer_state,
  File "/blue/adamginsburg/adamginsburg/repos/glue/glue/viewers/common/layer_artist.py", line 29, in __init__
    self._viewer_state.layers.append(self.state)
  File "/home/adamginsburg/.local/lib/python3.9/site-packages/echo/containers.py", line 52, in append
    self.notify_all()
  File "/home/adamginsburg/.local/lib/python3.9/site-packages/echo/containers.py", line 45, in notify_all
    callback(*args, **kwargs)
  File "/home/adamginsburg/.local/lib/python3.9/site-packages/echo/containers.py", line 166, in __call__
    self.function(*args, **kwargs)
  File "/home/adamginsburg/.local/lib/python3.9/site-packages/echo/containers.py", line 189, in callback
    self.notify(instance, wrapped_list, wrapped_list)
  File "/home/adamginsburg/.local/lib/python3.9/site-packages/echo/core.py", line 125, in notify
    cback(new)
  File "/blue/adamginsburg/adamginsburg/repos/glue/glue/viewers/table/qt/data_viewer.py", line 298, in _on_layers_changed
    self.model = DataTableModel(self)
  File "/blue/adamginsburg/adamginsburg/repos/glue/glue/viewers/table/qt/data_viewer.py", line 40, in __init__
    raise ValueError("Can only use Table widget for 1D data")
ValueError: Can only use Table widget for 1D data
keflavich commented 2 years ago

Example: can show that the high values all come from autocorrelations: image

(note that the high stuff around 17,050 is selected but I can't see it highlighted in red - looks like a visual error in the 1D profile selection. You can see the selection in both the 2D planes though)

keflavich commented 2 years ago

I think we should merge this and deal with additional issues in future PRs

astrofrog commented 2 years ago

Sounds good, let's merge, and I'll open separate issues for the outstanding things.