TEC screen fitting with kMS solutions

rvweeren commented 6 years ago

New issue page to post results from Josh's phase screen work.

Varying clocks were removed beforehand.

rvweeren commented 6 years ago

This was the used command for the solve (done for each 2 MHz block)

kMS.py --MSName my.ms --SolverType KAFCA --PolMode Scalar --BaseImageName image_full_ampphase_di_m.NS --dt 0.133333 --NIterKF 6 --CovQ 0.1 --LambdaKF=0.5 --NCPU 60 --OutSolsName DDjosh --NChanSols 2 --PowerSmooth=1.0 --InCol DATA --Weighting Natural --UVMinMax=0.100000,1000.000000 --SolsDir=SOLSDIRjosh --BeamMode LOFAR --LOFARBeamMode=A --DDFCacheDir=. --NodesFile image_dirin_SSD_m.npy.ClusterCat.npy --DicoModel image_full_ampphase_di_m.NS.DicoModel

Joshuaalbert commented 6 years ago

We have done a full phase+amp solve on a dataset with kMS then went back and used this skymodel as input for a phase only solve in kMS. This produces a set of phases (without imposing any tec model). The clock was taken out before. We do the phase only solve on split MSs of 2 1MHz blocks each.

Context

I'm doing a bayesian tec screen fit on raw phase solutions. NDPPP only spits out phases after imposing a tec+csp model which biases the model. Ideally, the input would be completely independent phases in time and freq. Note: the use of Kalman filter to generate these phases means there is time dependence imposed, however the fact that it's an optimal linear gaussian estimator means the errors should be unbiased if a proper state transition was used.

Note: Two schools of thought exist. Impose the Bayesian model during the actual solve, or do it after on the gains. They are not equivalent, but under some assumptions can be similar. The fact that we see the below banding in the low-brightness directions shows that imposing the bayesian model from the start would have higher power than post-processing the gains.

Brightest direction phase comparison to NDPPP

For the brightest direction things line up resaonably well, which is encouraging

screenshot from 2018-07-12 15-16-19

time-freq dependence of brightest direction

Things look nice for the brightest direction. screenshot from 2018-07-12 15-17-23

time-freq dependence of low brightness direction

An immediately noticeable problem is that there appears to be some banding. The MSs were solved independently with bandwidths per MS of 2MHz blocked in two 1MHz intervals. There appears to be an odd banding issue here. Why does every other channel have zero gains? Suppose we would have split into twice as many MSs with 1MHz each. Would this also occur then? Note that there does appear to be some useful infomration here (slight gradients noticeable). The Kalman filter has imposed some temporal smoothness. Does the usage of option --NChanSols 2 mean the frequency channels in solve are not independent? Perhaps transition model in kalman filter imposes something here? screenshot from 2018-07-12 15-18-00

Joshuaalbert commented 6 years ago

Amplitude in time-freq for the brightest direction

This fan like pattern must be related to transit timescales. Perhaps the beam? Perhaps a wrong sky model? screenshot from 2018-07-12 15-40-34

twshimwell commented 6 years ago

Possible to show a few more directions? Also might be nice to see the field and the facet layout (perhaps those are online somewhere?) as perhaps e.g. the faintest facet is diverging or something.

twshimwell commented 6 years ago

Also for the phase plots whats the colour scale? Does it wrap around?

Joshuaalbert commented 6 years ago

For the phase plots they are hsv, wrapped, -pi to +pi. @rvweeren can post the facet layout, but yes this only happens in the facets with almost no flux. Most important is to understand why this frequency banding occurs though. Even in the low brightness directions every other freq has enough signal to produce something.

screenshot from 2018-07-12 16-14-19

rvweeren commented 6 years ago

This is the layout. The banding problem occurs for the facets mostly at the edge.

cyriltasse commented 6 years ago

So the philosophy of the KF is in principle to give a prior evolution file that is TEC fitted. These off axis facets would be smoothed to zero-phase basically, and the output would be more or less the same. You just don't have much signal there, so it can be healthy. So my first simple question - can you start from TEC-fitted values to get the tec-screen?

Then on the amp structured, it could be an unmodeled source or could be real as well - hard to know. What's really strange to me is the 1/2 patern on all directions - I've never seen that before, is that a plotting problem?...

rvweeren commented 6 years ago

I doubt it (1/2 patern) is a plotting issue. The plotting commands are the same (apart from the direction index) and 1/2 banding is much less strong for facets near the center which have plenty of flux (we tried more than one and it really correlates with the location in the field and amount of flux)

Josh make sure you phase reference to some fixed station

rvweeren commented 6 years ago

My suspicion is that if I use --NChanSols 1 and split the observations in 50 blocks of 1 MHz it will disappear.... (instead of 25 blocks of 2 MHz and --NChanSols 2). Cyril, I could try that if you think this would be a useful test?

Joshuaalbert commented 6 years ago

@cyriltasse Yes, I can go from tec solved independently per direction and derive a tec screen (more accuractely it's a tec + cs phase as NDPPP produces). That is something I've already done and which we want to avoid now, because such a scheme imposes biases into the final phase screen. It essentially is only valid at the central frequency, and is just plain wrong in many cases (we can show that the chi2 basin of TEC in low signal regime doesn't even have a global minimum in the right place). Another reason to go straight from phases is because my method is invariant to phase wrapping, so it can properly handle wild ionospheres.

@rvweeren Yes they are all referenced to ant0. Yes, for facets with lots of flux the odd-even banding is less prevalent, however the problem may still be there but less noticeable.

@rvweeren @cyriltasse before simply using --NChanSols 1 and split the observations in 50 blocks of 1 MHz it would be good to understand where this is coming from so that we can rule out that there is not something implicitly wrong. Even in the low signal regime there should still be noisy answers and not entirely flagged channels.

cyriltasse commented 6 years ago

just for completeness, NChanSols is the number of solutions on the frequency axis, not the step...

twshimwell commented 6 years ago

Perhaps worth plotting this for a few more fields to see if you always see these strange structures.

Joshuaalbert commented 6 years ago

@rvweeren can you split the MS into 1MHz bins and rerun with Nchansols=1. That way we can continue with my algorithm testing, and at the same time see what happens in the edge case of one block one solve.

rvweeren commented 6 years ago

Yes, I have it running with 1 MHz blocks and --NChanSols 1

twshimwell commented 6 years ago

I think important to check that we dont see this behavior on other fields too though because we use NChanSols 2 as a standard and Cyril said he hadnt seen this type of thing before.

cyriltasse commented 6 years ago

Is it possible that the number of data-channels varies from sols freq-bin to freq-bin?

twshimwell commented 6 years ago

I think that there are an even number (10sbs) in one of these msfiles so id have thought that NChanSols would put 5 in each sols freq-bin.

twshimwell commented 6 years ago

You could try plotting these ones /disks/paradata/shimwell/LoTSS-DR1/other-pointings/testing/ongoing-May18runs/L232875_NKF6/L232875_newrestor/SOLSDIR/*/killMS.DDS3_full.sols.npz

rvweeren commented 6 years ago

Double checked, there are always freq 20 channels per block (or 10 in my new run).

rvweeren commented 6 years ago

We can also try on others datasets but that will take more time (we need to get the clocks out for the TEC screen fitting)

twshimwell commented 6 years ago

Oh I was thinking purely for the plotting to look at the weird solution issue (even the clock isnt removed it would be nice to see that solutions are not weird).

rvweeren commented 6 years ago

@twshimwell, yes that will be much quicker/easier to test. Will do that on lofar2 as soon as the--NChanSols 1 solve is done.

Joshuaalbert commented 6 years ago

I'll plot your sols @twshimwell just to check. Can you give me write permission in that folder so that I can export the sols to hdf5, merge then plot? I'll delete them after to get back the storage.

twshimwell commented 6 years ago

thanks. ive tried to give permissions to /disks/paradata/shimwell/LoTSS-DR1/other-pointings/testing/ongoing-May18runs/L232875_NKF6/L232875_newrestor/SOLSDIR

If it doesnt work just copy SOLSDIR, its not so big.

Joshuaalbert commented 6 years ago

That data set is a little weird. When merging the ms's I get that there are a mismatch in axes:

INFO: Sorting output axes...
len pol: 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 - Will be: 4
len dir: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 45 1 1 1 45 1 1 1 45 1 - Will be: 45
len ant: 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 - Will be: 62
len freq: 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 4 4 4 1 2 2 2 1 2 - Will be: 79
len time: 596 716 716 596 596 511 511 511 447 447 447 447 447 447 480 358 358 326 480 596 596 511 480 511 - Will be: 3434

My expectation was that each of those ms's contained a solution for the same axes except for the frequency axis.

Joshuaalbert commented 6 years ago

Indeed, the mismatch in axes means that the time axis is getting stacked instead of concatenating down the freq axis. Also 99.1% of the phase values are nan's. Do you have another dataset? For example, if I plot the time-freq phase for direction 0, and the first 596 timeslices: screenshot from 2018-07-14 14-01-50

twshimwell commented 6 years ago

You are looking at just the DDS3_full sols right? There should be one file in each of the SOLSDIR/msname/ directories.

Joshuaalbert commented 6 years ago

Yes I exported the .ms/full.sols.npz to hdf5, then merged those. (Oh I just realised there are several solsets inside. sol000 - sol002.)

twshimwell commented 6 years ago

Ah you just want .ms/DDS3full.sols.npz

The wildcard you pasted would have picked up other solutions (the DIfull ones) that have just one direction and varying time averaging.

Sent from my iPhone

On 14 Jul 2018, at 14:34, Joshua George Albert notifications@github.com wrote:

Yes I exported the .ms/full.sols.npz to hdf5, then merged those.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

Joshuaalbert commented 6 years ago

There are still time axis mismatches, so this results in an incorrect merge. Unless all axes are the same except one axis, and concatenation happens down that axis, the merge will fail. Also the losoto h5parm_collector script needs to sort the freq axe as right now it relies on the order of input files. There are things I can fix and get back to you. But I would also request a consistent dataset to plot.

Joshuaalbert commented 6 years ago

For the moment though, here is a simple inspection. Each subband exported solution, as I mentioned above, has solsets sol000, sol001, and sol002, each with phase000 soltabs. The len of the frequency axis respectively is 2, 1, and 1. Therefore, let's look at the sol000/phase000 for subband '166MHz': Both freq blocks are there and similar colors. That's nice. However there is only 1 direction in sol000/phase000 so I can't look at other directions.

screenshot from 2018-07-14 14-54-14

Joshuaalbert commented 6 years ago

@twshimwell Ahh, okay I'll use better wildcard.

twshimwell commented 6 years ago

Oh weird as the DDS3full ones should have been made with the same dt in the solve.

Ok well maybe these ones instead /disks/paradata/shimwell/LoTSS-DR1/other-pointings/testing/ongoing-May18runs/L639769/SOLSDIR//DDS3_full.sols*npz

twshimwell commented 6 years ago

Oh just saw you used the new wildcard :) slow typing on my phone

Joshuaalbert commented 6 years ago

screenshot from 2018-07-14 15-15-48

twshimwell commented 6 years ago

Thanks for plotting these Josh. They look quite sensible then which is very good to see.

rvweeren commented 6 years ago

@twshimwell, but I only see about 25 solutions along the freq. axis, so this is not a --NChanSols 2 solve?

Joshuaalbert commented 6 years ago

@rvweeren Here is the plot of josh50block for the blocks that have finished so far. This looks much more normal. What is it about the edge case NChanSols=2? What does NChanSols=3 do? screenshot from 2018-07-14 17-53-00

rvweeren commented 6 years ago

Note that the y-axis is not simply freq here (since the solve is only partly done, there are still gaps). However, I think we can already conclude the banding is gone. (I can try --NChanSols 3, so we get 3 solutions along the freq. axis per ms, but it will take 2 more days or so before the current solve is done)

Joshuaalbert commented 6 years ago

Yes ^^ however, the blocks are mostly completing in order so they are close to being ordered. @rvweeren For reference this is the first one (with banding). screenshot from 2018-07-14 18-55-53

twshimwell commented 6 years ago

Ah sorry it seems i dont have examples of DD with nchansols >= 2. The ones I sent were indeed nchansols=1 (but was very nice to see they looked ok). In the pipeline are only use nchansols > 1 as a standard for the DI steps.

Joshuaalbert commented 6 years ago

This is the result with `NChanSols=1`.

Some originally good directions like Dir_31 are now very different. screenshot from 2018-07-18 11-49-30

Joshuaalbert commented 6 years ago

Amplitude plot

screenshot from 2018-07-18 11-57-34

Joshuaalbert commented 6 years ago

@cyriltasse can you do a little rooting around in wirtinger to see where the banding comes from? @rvweeren and I read through the code, and conclude that all the chans are being iterated over, and a possible source of weirdness is in how you propagate sigP or sigQ?? It's not clear at all though, and that's just a guess.

rvweeren commented 6 years ago

Once I have a node available I will do a quick --NChanSols 4 test to check if there is banding here as well.

Joshuaalbert commented 6 years ago

We now look at the difference between the two solves. By difference we mean the Itoh difference which is, D[phi_1, phi_2] = W[W[phi_1] - W[phi_2]] which gives a proper measure of wrapped phase difference.

Difference between`NChanSols=1` and `NChanSols=2`

Despite the obvious banding, there is a flip of pi for direction 31.

screenshot from 2018-07-18 13-40-24

If you furthermore restrict to only the non-banded channels

The different should be near zero.

screenshot from 2018-07-18 13-44-59

cyriltasse commented 6 years ago

Could we have a quick chat? I'm losing track of what you're doing...

rvweeren commented 6 years ago

Same problems show up with MergeSols.py and PlotSolsIm.py. Example for direction 5, note the banding in the --NChanSols 2 (bottom) plot.

figure_dir7 figure_dir7_nchansol2

rvweeren commented 6 years ago

@cyriltasse, here's the data in Leiden

/net/rijn/data2/rvweeren/LOFARHBA_A665/DR2Josh/

cyrilP126+65BEAM_1_chan40-60.ms cyrilP126+65BEAM_1_chan40-50.ms cyrilP126+65BEAM_1_chan50-60.ms

Example (running from the above mentioned directory):

kMS.py --MSName cyrilP126+65BEAM_1_chan50-60.ms --SolverType KAFCA --PolMode Scalar --BaseImageName image_full_ampphase_di_m.NS --dt 0.133333 --NIterKF 6 --CovQ 0.1 --LambdaKF=0.5 --NCPU 32 --OutSolsName DDchan1 --NChanSols 1 --PowerSmooth=1.0 --InCol DATA --Weighting Natural --UVMinMax=0.100000,1000.000000 --SolsDir=SOLSDIRtest --BeamMode LOFAR --LOFARBeamMode=A --DDFCacheDir=. --NodesFile image_dirin_SSD_m.npy.ClusterCat.npy --DicoModel image_full_ampphase_di_m.NS.DicoModel

mhardcastle / ddf-pipeline