radio-astro-tools / casa-formats-io

Code to handle I/O from/to data in CASA format

Group tables by DATA_DESC_ID when present #32

Closed astrofrog closed 3 years ago

astrofrog commented 3 years ago

This also cleans up the API to use the standard astropy Table reader mechanism. With this, one can do e.g.:

    >>> import casa_formats_io
    >>> from astropy.table import Table
    >>> table = Table.read('my_dataset.ms', format='casa-table')
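
As background for the grouping in the title: each row of a measurement set's main table carries a DATA_DESC_ID, and rows with different IDs can refer to different spectral-window and polarization setups, so their per-row array shapes need not agree. The sketch below only illustrates the idea of partitioning row indices by ID on a toy column. The helper name `group_rows_by_id` is hypothetical and this is not how casa-formats-io implements the grouping.

```python
from collections import defaultdict


def group_rows_by_id(data_desc_ids):
    """Map each distinct ID to the list of row indices that carry it.

    Hypothetical helper for illustration only; not part of casa-formats-io.
    """
    groups = defaultdict(list)
    for row, data_desc_id in enumerate(data_desc_ids):
        groups[data_desc_id].append(row)
    return dict(groups)


# Toy DATA_DESC_ID column: six rows spread over two IDs.
toy_ids = [0, 1, 0, 0, 1, 1]
groups = group_rows_by_id(toy_ids)
for data_desc_id, rows in sorted(groups.items()):
    print(f"DATA_DESC_ID={data_desc_id}: rows {rows}")
```

Each group can then be treated as its own table with consistent column shapes, which is roughly what passing `data_desc_id=` to `Table.read` selects.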

I used the following script to check a 10 GB MS file and verified that all the data match between casatools and casa-formats-io:

    from casatools import table
    import dask.array as da
    import casa_formats_io
    from astropy.table import Table
    from numpy.testing import assert_equal

    table_filename = 'W51-IRS2_B3_uid___A001_X1296_X18f_continuum_merged_12M_selfcal.ms'

    # The DATA_DESCRIPTION sub-table has one row per DATA_DESC_ID
    data_desc = Table.read(table_filename + '/DATA_DESCRIPTION', format='casa-table')

    # Open the main table with casatools to provide the reference values
    tb = table()
    tb.open(table_filename)

    for data_desc_id in range(len(data_desc)):

        print(f'DATA_DESC_ID={data_desc_id}')

        # Read only the rows for this DATA_DESC_ID with casa-formats-io
        # (named so that it does not shadow the casatools `table` import)
        astropy_table = Table.read(table_filename, format='casa-table', data_desc_id=data_desc_id)

        # Select the same rows with casatools
        sub_tb = tb.query(f'DATA_DESC_ID=={data_desc_id}')

        for colname in sub_tb.colnames():

            if colname == 'FLAG_CATEGORY':
                continue

            # Transpose the casatools column so the axis order matches casa-formats-io
            reference = sub_tb.getcol(colname).T
            actual = astropy_table[colname]
            if isinstance(actual, da.Array):
                actual = actual.compute()

            # Skip columns that are empty on both sides
            if reference.size == 0 and actual.size == 0:
                continue

            assert_equal(actual, reference)

            print(f' - {colname} matches')

    tb.close()
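
The core of the comparison can be tried without an MS file. The sketch below assumes, as the `.T` in the script suggests, that the reference side returns each column with the row axis last while the other side has rows first; the helper name `columns_match` and the synthetic shapes are made up for illustration.

```python
import numpy as np


def columns_match(reference, actual):
    """Return True if a rows-last column equals a rows-first column.

    Hypothetical helper: `reference` is assumed to have the row axis last,
    `actual` is assumed to have it first.
    """
    # Lazily evaluated arrays (e.g. dask) expose .compute(); plain arrays do not.
    if hasattr(actual, "compute"):
        actual = actual.compute()
    reference = np.asarray(reference).T  # reverse all axes so rows come first
    actual = np.asarray(actual)
    return reference.shape == actual.shape and np.array_equal(reference, actual)


# Synthetic column: 5 rows, 4 channels, 2 correlations.
row_major = np.arange(5 * 4 * 2).reshape(5, 4, 2)
rows_last = row_major.T  # shape (2, 4, 5), standing in for the reference side
matches = columns_match(rows_last, row_major)
mismatch = columns_match(rows_last, row_major + 1)
print(matches, mismatch)
```

Unlike `assert_equal`, this returns a boolean instead of raising, which makes it easier to collect a list of mismatching columns before reporting.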

There is a test failure I need to track down before we can merge, but this seems to work fine with real-world data!

astrofrog commented 3 years ago

The test failure is due to https://github.com/dask/dask/issues/8387; I'm waiting for input on that issue before proceeding.

codecov-commenter commented 3 years ago

Codecov Report

Merging #32 (bef2c3d) into main (04b7412) will increase coverage by 0.24%. The diff coverage is 93.05%.

@@            Coverage Diff             @@
##             main      #32      +/-   ##
==========================================
+ Coverage   55.39%   55.64%   +0.24%     
==========================================
  Files          16       17       +1     
  Lines        2094     2117      +23     
==========================================
+ Hits         1160     1178      +18     
- Misses        934      939       +5     
Impacted Files                                          Coverage Δ
casa_formats_io/casa_low_level_io/table.py              94.02% <83.33%> (-1.86%)
casa_formats_io/__init__.py                             100.00% <100.00%> (ø)
casa_formats_io/casa_dask.py                            95.12% <100.00%> (ø)
casa_formats_io/casa_low_level_io/__init__.py           100.00% <100.00%> (ø)
...asa_formats_io/casa_low_level_io/casa_functions.py   100.00% <100.00%> (ø)
...ormats_io/casa_low_level_io/data_managers/tiled.py   94.87% <100.00%> (+0.20%)
casa_formats_io/table_reader.py                         100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 04b7412...bef2c3d.

astrofrog commented 3 years ago

I think this is ready for review and merging. Review-wise, the most important part is probably the docs.

astrofrog commented 3 years ago

I'll go ahead and merge since this has been approved!