ratt-ru / CubiCal

A fast radio interferometric calibration suite.
GNU General Public License v2.0

Unflagged bandpass solutions #203

Closed bennahugo closed 6 years ago

bennahugo commented 6 years ago

Trying to solve for non-parametric 2x2 bandpasses on 1934 and 3C286. The bandpass solutions look nothing like I would expect. Furthermore, it looks like either the flags are not stored with the solutions or the legacy flags are ignored completely:

gocubical --data-ms COMBINED.cubical.ms --model-list MODEL_DATA --model-ddes never --weight-column WEIGHT --flags-auto-init legacy --sol-jones G --data-freq-chunk 1 --data-time-chunk 1000 --g-time-int 1000 --g-freq-int 1 --g-type complex-2x2 --sol-min-bl 125.0 --sel-field 1 --dist-ncpu=8 --out-name 1934

image

Compare this with CASA: image

o-smirnov commented 6 years ago

I see four lines per plot -- what's the meaning of the four?

Got the MS somewhere where I can look at it?

bennahugo commented 6 years ago

It looks like it is computing 2 solutions per correlation for some reason, although the log says it will process only one time chunk:

 - 12:32:35 - data_handler       [0.1/0.1 5.8/5.8 0.1Gb]   read indexing columns (56730 total rows)
 - 12:32:35 - data_handler       [0.1/0.1 5.8/5.8 0.1Gb]   built timeslot index (30 unique timestamps)
 - 12:32:35 - data_handler       [0.1/0.1 5.8/5.8 0.1Gb]   max chunk size is 1000 timeslots and/or -- seconds
 - 12:32:35 - data_handler       [0.1/0.1 5.8/5.8 0.1Gb]   found 1 time chunks: 0 30
 - 12:32:35 - data_handler       [0.1/0.1 5.8/5.8 0.1Gb]   generated 1 row chunks based on time and DDID
 - 12:32:35 - data_handler       [0.1/0.1 5.8/5.8 0.1Gb]   row chunks yield 1 potential tiles
 - 12:32:35 - data_handler       [0.1/0.1 5.8/5.8 0.1Gb]   coarsening this to 1 tiles (max 4096 chunks per tile, based on 7/0 requested)
o-smirnov commented 6 years ago

it looks like it is computing 2 solutions per correlation for some reason

Why do you say that? It says "found 1 time chunks: 0 30". That's 1 solution.

bennahugo commented 6 years ago

No, it looks like it is just consecutive channels; there is 1 unique timestep in the solutions table.

bennahugo commented 6 years ago

hugo@stevie ~/verifypol/msdir/COMBINED.cubical.ms

o-smirnov commented 6 years ago

BTW --data-freq-chunk 1 is massive overkill. You already have freq-int 1, and a chunk size of 1 makes it parallelize over 1 channel at a time (yet a complete tile of 4096 channels is loaded into memory anyway), which is way too fine-grained.

This could even be the reason there's a flag propagation problem. We never tested the 1-channel-chunk edge case that much.
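As a rough illustration of the chunking arithmetic discussed here, the toy sketch below splits an axis into chunks of a given size. `chunk_bounds` is a hypothetical helper, not CubiCal's own code, and the chunk size of 64 is an arbitrary value chosen only for comparison; the 30 timeslots and 4096 channels are the figures quoted in this thread.

```python
def chunk_bounds(n_items, chunk_size):
    """Return the chunk edges for an axis of n_items split into chunks of chunk_size.

    Hypothetical helper for illustration only; CubiCal's real chunking
    logic is not shown in this thread.
    """
    edges = list(range(0, n_items, chunk_size))
    edges.append(n_items)
    return edges


# 30 unique timestamps with a time chunk of 1000 timeslots: a single chunk,
# which matches the log line "found 1 time chunks: 0 30".
time_edges = chunk_bounds(30, 1000)

# 4096 channels with a frequency chunk of 1: one chunk per channel.
freq_edges_fine = chunk_bounds(4096, 1)

# The same axis with an arbitrary, coarser chunk size of 64 channels.
freq_edges_coarse = chunk_bounds(4096, 64)

print(time_edges)
print(len(freq_edges_fine) - 1, "chunks vs", len(freq_edges_coarse) - 1, "chunks")
```

The solution interval (freq-int 1) still gives one solution per channel in either case; the chunk size only decides how many channels are handed to a worker at once.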

bennahugo commented 6 years ago

Sure, so check out baseline 5-39 (m007&m042 - one of the shorter ones: 278.14m). I've applied the static mask and flagged it further with 2 rounds of flagging, which dilates the mask a bit: image

After CubiCal is run, all the flags are gone! image
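A check like the one described above can also be done numerically instead of by eye. The sketch below is a minimal example under stated assumptions: `count_lost_flags` is a hypothetical helper, and the two boolean arrays stand in for copies of the MS FLAG column taken before and after the run (reading them from a real MS is left out here).

```python
import numpy as np


def count_lost_flags(flags_before, flags_after):
    """Count samples that were flagged before the run but are unflagged after it."""
    flags_before = np.asarray(flags_before, dtype=bool)
    flags_after = np.asarray(flags_after, dtype=bool)
    if flags_before.shape != flags_after.shape:
        raise ValueError("flag arrays must have the same shape")
    return int(np.count_nonzero(flags_before & ~flags_after))


# Toy flag cube with axes (row, channel, correlation); the sizes are made up.
before = np.zeros((4, 8, 2), dtype=bool)
before[:, 2:5, :] = True          # pretend a static mask covers channels 2..4

after = np.zeros_like(before)     # pretend the run cleared every flag

lost = count_lost_flags(before, after)
print("flags lost:", lost)
```

A non-zero count means flags that existed before calibration were dropped, which is the symptom reported here.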

o-smirnov commented 6 years ago

Well that's clearly a bug then. I assume 1934.parset is the parset from your most recent run?

bennahugo commented 6 years ago

Yes, I recently tried a diagonal run on 3C286 to check. This will be the most recent:

gocubical --data-ms COMBINED.cubical.ms --model-list MODEL_DATA --model-ddes never --weight-column WEIGHT --flags-auto-init legacy --sol-jones G --data-freq-chunk 1 --data-time-chunk 1000 --g-time-int 1000 --g-freq-int 1 --g-type complex-diag --sol-min-bl 125.0 --sel-field 0 --dist-ncpu=8 --out-name 3C286diag
o-smirnov commented 6 years ago

Yeah, I see the bug. @IanHeywood this may affect you too. Stay tuned.

o-smirnov commented 6 years ago

While I'm at it -- this line is probably harmless, but should be caught:

/home/oms/.local/lib/python2.7/site-packages/numpy/ma/core.py:2784: UserWarning: Warning: converting a masked element to nan.
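For reference, this kind of warning can be reproduced in isolation: NumPy emits it when a masked element is converted to a plain Python float. The sketch below is an assumption about the general mechanism only; where the conversion happens inside CubiCal is not shown in this thread.

```python
import math
import warnings

import numpy as np

# A single masked value (0-d masked array); the underlying 1.0 is hidden by the mask.
masked_value = np.ma.array(1.0, mask=True)

# Implicit conversion: NumPy warns and returns nan.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    as_float = float(masked_value)

# Explicit conversion: fill masked entries with nan on purpose.
explicit = float(np.ma.filled(masked_value, np.nan))

print(as_float, explicit, len(caught))
```

Filling masked entries explicitly gives the same nan result but makes the intent visible in the code.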

@bennahugo seems to work now, flags being read as intended. Try the issue-203 branch if you're brave. But there's a new problem for you.

bennahugo commented 6 years ago

gocubical --data-ms COMBINED.cubical.ms --model-list MODEL_DATA --model-ddes never --weight-column WEIGHT --flags-auto-init legacy --sol-jones G --data-freq-chunk 1 --data-time-chunk 1000 --g-time-int 1000 --g-freq-int 1 --g-type complex-2x2 --sol-min-bl 125.0 --sel-field 1 --dist-ncpu=8 --out-name 1934

 exception: 'DESEL'
 - 09:22:02 - main               [io] [12.2/14.0 18.5/19.4 26.2Gb] Traceback (most recent call last):
  File "/scratch/bhugo/CubiCal/cubical/workers.py", line 421, in _io_handler
    tile.load(load_model=load_model)
  File "/scratch/bhugo/CubiCal/cubical/data_handler/ms_tile.py", line 678, in load
    self.dh.flagcounts["DESEL"] += num_inactive
KeyError: 'DESEL'
 - 09:22:02 - main               [0.1/0.1 5.5/6.0 26.2Gb] Exiting with exception: KeyError('DESEL')
 Traceback (most recent call last):
  File "/scratch/bhugo/CubiCal/cubical/main.py", line 360, in main
    stats_dict = workers.run_process_loop(ms, tile_list, load_model, single_chunk, solver_type, solver_opts, debug_opts)
  File "/scratch/bhugo/CubiCal/cubical/workers.py", line 205, in run_process_loop
    return _run_multi_process_loop(ms, load_model, solver_type, solver_opts, debug_opts)
  File "/scratch/bhugo/CubiCal/cubical/workers.py", line 251, in _run_multi_process_loop
    if not done or not io_futures[itile].result():
  File "/scratch/bhugo/venv/local/lib/python2.7/site-packages/concurrent/futures/_base.py", line 455, in result
    return self.__get_result()
  File "/scratch/bhugo/venv/local/lib/python2.7/site-packages/concurrent/futures/_base.py", line 414, in __get_result
    raise exception_type, self._exception, self._traceback
KeyError: 'DESEL'
bennahugo commented 6 years ago

Fixed - just initialization of the key. I'll fix the check in #210 as part of this issue while we're at it
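The failure mode in the traceback above is the generic one for `+=` on a missing dictionary key. A minimal sketch, using a plain dict as a stand-in for `flagcounts` (the real initialization in CubiCal is not shown in this thread):

```python
from collections import defaultdict

# Stand-in for the flag counter; the key "DESEL" has not been initialised.
flagcounts = {}
error = None
try:
    flagcounts["DESEL"] += 5      # reads the missing key first, so this raises
except KeyError as exc:
    error = exc

# Option 1: initialise the key before accumulating into it.
flagcounts = {"DESEL": 0}
flagcounts["DESEL"] += 5

# Option 2: a defaultdict creates missing counters on first use.
auto_counts = defaultdict(int)
auto_counts["DESEL"] += 5

print(repr(error), flagcounts["DESEL"], auto_counts["DESEL"])
```

Explicit initialization keeps the set of valid flag categories visible in one place, while a `defaultdict` would silently accept a misspelled category name.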

o-smirnov commented 6 years ago

Yeah, I'd already fixed that key too, but forgot to push. Anyway all sorted and merged together for now, let me sort out that G.err problem...

bennahugo commented 6 years ago

Flags are now propagating correctly, but it still looks like there is a problem in the solver for every other solution. Here I'm trying sol-diag-diag 1 and complex-diag: image

The corrected data has more or less the same spread as in CASA, though it is a bit more sensitive to RFI: image image

bennahugo commented 6 years ago

nvm... it looks like there is a logic error somewhere in how the gain cube is reshaped when I write it to CASA tables
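The actual reshaping bug is not shown in this thread, but a common way for this class of error to arise is using `reshape` where an axis permutation is needed. The sketch below is purely illustrative: the axis order (channel, antenna, correlation) and the target layout are assumptions, not CubiCal's or CASA's real layouts.

```python
import numpy as np

n_chan, n_ant, n_corr = 4, 3, 2

# Toy "gain cube" with axes (channel, antenna, correlation); each value is unique
# so that any scrambling is easy to spot.
gains = np.arange(n_chan * n_ant * n_corr).reshape(n_chan, n_ant, n_corr)

# Suppose the output wants axes (antenna, channel, correlation).
wrong = gains.reshape(n_ant, n_chan, n_corr)   # same shape, but values are reinterpreted
right = gains.transpose(1, 0, 2)               # axes are actually swapped

# Spectrum of antenna 0, correlation 0, taken from the original cube.
expected = gains[:, 0, 0]

print("expected:", expected)
print("right:   ", right[0, :, 0])
print("wrong:   ", wrong[0, :, 0])
```

Both results have the same shape, so the mistake passes silently; only the values show that `reshape` has interleaved different antennas and channels in one spectrum.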

bennahugo commented 6 years ago

Tracked it down. Will commit to the 203 branch soon. @o-smirnov is there a default set of tests we can run to make sure things work as expected?

o-smirnov commented 6 years ago

Yes, it's called nosetests, and you taught it to me once. :)

bennahugo commented 6 years ago

But where is the test data?

o-smirnov commented 6 years ago

git lfs.

bennahugo commented 6 years ago

Could we add this to jenkins so we have at least some testing on PRs?

o-smirnov commented 6 years ago

I've been meaning to ask you to do just that -- kept forgetting!

bennahugo commented 6 years ago

Ok I'll open a ticket... that big green button is far too scary without a test

o-smirnov commented 6 years ago

I have merged this into madmax-moreplots, which contains a bunch of other enhancements (to plotting and flagging primarily), so maybe you want to work against that now....

bennahugo commented 6 years ago

Ok, I'm happy with this: Leakages on 1934 with cubical (no crosshand delay solutions) - sort of believable when I look at the DC leakages from CASA image