ratt-ru / QuartiCal

CubiCal, but with greater power.
MIT License

DE calibration fails when multiple fields are present in database and only a single field model has been filled #226

Closed: bennahugo closed this issue 1 year ago

bennahugo commented 1 year ago

Running commit a2bb83c (stimelation branch).

input_ms.select_fields is set to [2] in this case (the MS contains 4 fields). I have filled DE_MODEL{1,2,3} with data for this field only, using crystalball, e.g.

crystalball -sm quickview.1.0.1.1.SC0.ghz-sources.txt -w detag1.reg -f 2 -o DE_MODEL1 1560526258.1.1.1ghz.ms 

I get

2023-01-15 08:49:57 | INFO | preprocess:transcribe_recipe | The following model sources were obtained from --input-model-recipe: 
   Columns: {'DE_MODEL3', 'MODEL_DATA', 'DE_MODEL1', 'DE_MODEL2'} 
   Sky Models: None
2023-01-15 08:49:57 | INFO | ms_handler:read_xds_list | Antenna table indicates 62 antennas were present for this observation.
2023-01-15 08:49:57 | INFO | ms_handler:read_xds_list | Polarization table indicates 4 correlations are present in the measurement set - ['XX', 'XY', 'YX', 'YY'].
2023-01-15 08:49:57 | INFO | ms_handler:read_xds_list | Field table indicates phase centre is at ([ 1.08358621 -1.14760053] [ 2.3861727  -1.19545309]).
2023-01-15 08:49:58 | WARNING | reads:_group_datasets | Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 0 of column FLAG_CATEGORY in /data2/bhugo/projects/projects/CXOUH110926/subset_quicklook/1560526258.1.1.1ghz.ms/table.f18'
2023-01-15 08:49:58 | WARNING | reads:_group_datasets | Ignoring 'DE_MODEL1': Unable to infer shape of column 'DE_MODEL1' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 0 of column DE_MODEL1 in /data2/bhugo/projects/projects/CXOUH110926/subset_quicklook/1560526258.1.1.1ghz.ms/table.f26'
2023-01-15 08:49:58 | WARNING | reads:_group_datasets | Ignoring 'DE_MODEL3': Unable to infer shape of column 'DE_MODEL3' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 0 of column DE_MODEL3 in /data2/bhugo/projects/projects/CXOUH110926/subset_quicklook/1560526258.1.1.1ghz.ms/table.f28'
2023-01-15 08:49:58 | WARNING | reads:_group_datasets | Ignoring 'DE_MODEL2': Unable to infer shape of column 'DE_MODEL2' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 0 of column DE_MODEL2 in /data2/bhugo/projects/projects/CXOUH110926/subset_quicklook/1560526258.1.1.1ghz.ms/table.f27'
2023-01-15 08:49:58 | WARNING | reads:_group_datasets | Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 28365 of column FLAG_CATEGORY in /data2/bhugo/projects/projects/CXOUH110926/subset_quicklook/1560526258.1.1.1ghz.ms/table.f18'
2023-01-15 08:49:59 | WARNING | reads:_group_datasets | Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 1166747 of column FLAG_CATEGORY in /data2/bhugo/projects/projects/CXOUH110926/subset_quicklook/1560526258.1.1.1ghz.ms/table.f18'
2023-01-15 08:49:59 | WARNING | reads:_group_datasets | Ignoring 'DE_MODEL1': Unable to infer shape of column 'DE_MODEL1' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 1166747 of column DE_MODEL1 in /data2/bhugo/projects/projects/CXOUH110926/subset_quicklook/1560526258.1.1.1ghz.ms/table.f26'
2023-01-15 08:49:59 | WARNING | reads:_group_datasets | Ignoring 'DE_MODEL3': Unable to infer shape of column 'DE_MODEL3' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 1166747 of column DE_MODEL3 in /data2/bhugo/projects/projects/CXOUH110926/subset_quicklook/1560526258.1.1.1ghz.ms/table.f28'
2023-01-15 08:49:59 | WARNING | reads:_group_datasets | Ignoring 'DE_MODEL2': Unable to infer shape of column 'DE_MODEL2' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 1166747 of column DE_MODEL2 in /data2/bhugo/projects/projects/CXOUH110926/subset_quicklook/1560526258.1.1.1ghz.ms/table.f27'
2023-01-15 08:49:59 | ERROR | goquartical:<module> | An error has been caught in function '<module>', process 'MainProcess' (64516), thread 'MainThread' (140301569435456):
Traceback (most recent call last):

> File "/data2/bhugo/projects/projects/CXOUH110926/subset_quicklook/venv3quartical/bin/goquartical", line 8, in <module>
    sys.exit(execute())
    │   │    └ <function execute at 0x7f99d14f2310>
    │   └ <built-in function exit>
    └ <module 'sys' (built-in)>
  File "/data2/bhugo/projects/projects/CXOUH110926/subset_quicklook/venv3quartical/lib/python3.8/site-packages/quartical/executor.py", line 28, in execute
    _execute(stack)
    │        └ <contextlib.ExitStack object at 0x7f9a812639d0>
    └ <function _execute at 0x7f99d14f23a0>
  File "/data2/bhugo/projects/projects/CXOUH110926/subset_quicklook/venv3quartical/lib/python3.8/site-packages/quartical/executor.py", line 100, in _execute
    data_xds_list, ref_xds_list = read_xds_list(model_columns, ms_opts)
                                  │             │              └ BaseConfig_input_ms(path='1560526258.1.1.1ghz.ms', data_column='CORRECTED_DATA', sigma_column=None, weight_column=None, time_...
                                  │             └ {'DE_MODEL3', 'MODEL_DATA', 'DE_MODEL1', 'DE_MODEL2'}
                                  └ <function read_xds_list at 0x7f992e4b79d0>
  File "/data2/bhugo/projects/projects/CXOUH110926/subset_quicklook/venv3quartical/lib/python3.8/site-packages/quartical/data_handling/ms_handler.py", line 87, in read_xds_list
    assert all(c in available_columns for c in columns), \
                    │                          └ ('TIME', 'INTERVAL', 'ANTENNA1', 'ANTENNA2', 'FEED1', 'FEED2', 'FLAG', 'FLAG_ROW', 'UVW', 'CORRECTED_DATA', 'DE_MODEL3', 'MOD...
                    └ ['INTERVAL', 'ANTENNA2', 'ANTENNA1', 'FLAG', 'STATE_ID', 'UVW', 'FEED2', 'WEIGHT', 'EXPOSURE', 'FLAG_ROW', 'OBSERVATION_ID', ...

AssertionError: One or more columns in: ('TIME', 'INTERVAL', 'ANTENNA1', 'ANTENNA2', 'FEED1', 'FEED2', 'FLAG', 'FLAG_ROW', 'UVW', 'CORRECTED_DATA', 'DE_MODEL3', 'MODEL_DATA', 'DE_MODEL1', 'DE_MODEL2') is not present in the data.

The following CASA session shows that the columns are correctly filled for field 2:

CASA <1>: tb.open("1560526258.1.1.1ghz.ms")
Out[1]: True

CASA <2>: tt = tb.taql("select * from 1560526258.1.1.1ghz.ms where FIELD_ID==2")

CASA <3>: m = tt.getcol("MODEL_DATA")

CASA <4>: m1 = tt.getcol("DE_MODEL1")

CASA <5>: m2 = tt.getcol("DE_MODEL2")

CASA <6>: m3 = tt.getcol("DE_MODEL3")

CASA <7>: mres = m - m1 - m2 - m3

CASA <8>: mres.shape
Out[8]: (4, 478, 1270752)

Full command (the quotes are needed to run it in zsh without escaping):

goquartical input_ms.path=1560526258.1.1.1ghz.ms \
    input_ms.data_column=CORRECTED_DATA \
    input_ms.time_chunk=300s \
    input_ms.select_fields="[2]" \
    input_model.recipe='MODEL_DATA~DE_MODEL1~DE_MODEL2~DE_MODEL3:DE_MODEL1:DE_MODEL2:DE_MODEL3' \
    output.subtract_directions="[1,2,3]" \
    solver.terms="['G','DE']" \
    solver.iter_recipe="[50,50,50,50,50,50]" \
    output.products="[corrected_residual]" \
    output.columns="['DECORR_RES']" \
    G.type=phase G.time_interval=64s G.freq_interval=0 \
    DE.type=complex DE.direction_dependent=True DE.time_interval=64s DE.freq_interval=8

bennahugo commented 1 year ago

I suspect I have no choice but to split the data out into a single-field database. I think the selection mechanism is not working as expected: it does not use TaQL under the hood, and it then tries to infer column shapes from the first row? Why not infer them from the metadata keyword tables and the column descriptors and/or nrow instead?
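To illustrate the concern, here is a toy, stdlib-only model of a partially filled column. Cells for rows outside field 2 are None, standing in for the cells that casacore reports as undefined (the "no array in row 0" warnings in the log). ColumnDesc, infer_shape_from_row0 and infer_shape are hypothetical names, not QuartiCal, dask-ms or python-casacore API. Probing only row 0 fails, whereas using a fixed shape from the column descriptor, or scanning to the first defined cell, recovers the shape.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class ColumnDesc:
    """Hypothetical stand-in for a casacore column descriptor."""
    name: str
    shape: Optional[Tuple[int, ...]] = None  # set only for fixed-shape columns


def shape_of(cell) -> Tuple[int, ...]:
    """Shape of a nested-list cell, e.g. [[0j] * 4] * 3 -> (3, 4)."""
    shape = []
    while isinstance(cell, list):
        shape.append(len(cell))
        cell = cell[0] if cell else None
    return tuple(shape)


def infer_shape_from_row0(cells: Sequence) -> Tuple[int, ...]:
    """Exemplar-row approach: fails if row 0 was never written."""
    if cells[0] is None:
        raise ValueError("no array in row 0")
    return shape_of(cells[0])


def infer_shape(desc: ColumnDesc, cells: Sequence) -> Optional[Tuple[int, ...]]:
    """Prefer the descriptor's fixed shape, else scan for the first defined cell."""
    if desc.shape is not None:
        return desc.shape
    for cell in cells:
        if cell is not None:
            return shape_of(cell)
    return None  # column exists but is entirely unfilled


# Rows 0-1 belong to another field and were never filled; row 2 was.
cells = [None, None, [[0j] * 4] * 3]  # per-row cell of shape (chan, corr) = (3, 4)
print(infer_shape(ColumnDesc("DE_MODEL1"), cells))  # (3, 4)
```

Scanning is O(nrow) in the worst case for variable-shape columns, so a real reader would more plausibly take the shape from the column descriptor or storage-manager metadata, which is closer to what is proposed here.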

bennahugo commented 1 year ago

Yup, as I expected, splitting to a single-field database works (or at least it gets past that point...). @JSKenyon it is probably best to implement the selection mechanisms via TaQL for this reason.

bennahugo commented 1 year ago

Confirmed: the single-field database runs through without a hitch.

JSKenyon commented 1 year ago

I made a decision to excise all TaQL from QuartiCal to avoid compatibility issues with the various backends. I appreciate that this does need to be fixed. Could you please point me at the data so that I can reproduce this, @bennahugo?

JSKenyon commented 1 year ago

I have managed to reproduce this. The issue is not with the selection but with the assert statement. The assert relies on the output of xds_from_storage_ms, which omits the names of columns that are not initialised (i.e. columns which exist but haven't been filled, if the exemplar row it reads falls in an unfilled region). I am working on a fix.
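The diagnosis can be illustrated with a small, hypothetical check. This is not the actual QuartiCal fix (the thread does not show it); check_columns and its arguments are illustrative names only. It shows why testing required columns against the columns that survived dataset construction conflates "absent from the MS" with "present but unfilled in the exemplar row", and how testing against the table's declared column names separates the two cases.

```python
def check_columns(required, declared_columns, loaded_columns):
    """Hypothetical column check distinguishing missing from unreadable columns.

    declared_columns: every column listed in the table description.
    loaded_columns: columns that actually made it into the datasets; this can
        omit declared columns whose exemplar row was never filled.
    Returns the required columns that are declared but were not loaded.
    """
    missing = [c for c in required if c not in declared_columns]
    if missing:
        raise ValueError(f"Columns not present in the MS: {missing}")
    # Present in the MS but skipped by the reader (e.g. DE_MODEL1 unfilled in row 0).
    return [c for c in required if c not in loaded_columns]


required = ("TIME", "CORRECTED_DATA", "MODEL_DATA", "DE_MODEL1", "DE_MODEL2", "DE_MODEL3")
declared = {"TIME", "CORRECTED_DATA", "MODEL_DATA", "DE_MODEL1", "DE_MODEL2",
            "DE_MODEL3", "FLAG_CATEGORY"}
loaded = {"TIME", "CORRECTED_DATA", "MODEL_DATA"}

# A check like the original assert fails even though the columns exist.
print(all(c in loaded for c in required))         # False
print(check_columns(required, declared, loaded))  # ['DE_MODEL1', 'DE_MODEL2', 'DE_MODEL3']
```

A caller could then read the returned columns with an explicitly supplied shape (for example one taken from the column descriptor) rather than treating them as absent.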

JSKenyon commented 1 year ago

@bennahugo This should be on the stimelation branch now if you want to check that it has resolved your problem.