Closed rvweeren closed 4 years ago
This was the command used for the solve (run for each 2 MHz block):
kMS.py --MSName my.ms --SolverType KAFCA --PolMode Scalar --BaseImageName image_full_ampphase_di_m.NS --dt 0.133333 --NIterKF 6 --CovQ 0.1 --LambdaKF=0.5 --NCPU 60 --OutSolsName DDjosh --NChanSols 2 --PowerSmooth=1.0 --InCol DATA --Weighting Natural --UVMinMax=0.100000,1000.000000 --SolsDir=SOLSDIRjosh --BeamMode LOFAR --LOFARBeamMode=A --DDFCacheDir=. --NodesFile image_dirin_SSD_m.npy.ClusterCat.npy --DicoModel image_full_ampphase_di_m.NS.DicoModel
We did a full phase+amp solve on a dataset with kMS, then went back and used the resulting sky model as input for a phase-only solve in kMS. This produces a set of phases (without imposing any TEC model). The clock was taken out beforehand. The phase-only solve is run on split MSs, each containing two 1 MHz blocks.
I'm doing a Bayesian TEC screen fit on raw phase solutions. NDPPP only outputs phases after imposing a tec+csp model, which biases the result. Ideally, the input would be phases that are completely independent in time and frequency. Note: using a Kalman filter to generate these phases means some time dependence is imposed; however, because it is an optimal linear Gaussian estimator, the errors should be unbiased if a proper state transition was used.
Note: Two schools of thought exist: impose the Bayesian model during the actual solve, or apply it afterwards to the gains. They are not equivalent, but under some assumptions they can be similar. The fact that we see the banding below in the low-brightness directions suggests that imposing the Bayesian model from the start would have higher power than post-processing the gains.
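To make the "tec+csp" model concrete, here is a minimal sketch of a differential-TEC term plus a constant scalar phase, evaluated as a wrapped phase across frequency. The function names, the example values, and the sign convention of the dispersive term are assumptions for illustration; this is not NDPPP's or kMS's code.

```python
import numpy as np

# Physical constants (SI).
E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
M_E = 9.1093837015e-31       # electron mass, kg
C_LIGHT = 299792458.0        # speed of light, m/s
TECU = 1e16                  # electrons per m^2 in one TEC unit

# Phase in radians per TECU at 1 Hz (about 8.448e9).
K_TEC = TECU * E_CHARGE**2 / (4 * np.pi * EPS0 * M_E * C_LIGHT)


def wrap(phi):
    """Wrap phases into (-pi, pi]."""
    return np.angle(np.exp(1j * np.asarray(phi)))


def tec_csp_phase(freq_hz, dtec, csp):
    """Wrapped phase of a dTEC term (in TECU) plus a constant scalar phase (rad).

    The negative sign on the dispersive term is an assumed convention.
    """
    return wrap(-K_TEC * dtec / np.asarray(freq_hz) + csp)


# Hypothetical example: 25 frequencies across the HBA band.
freqs = np.linspace(120e6, 168e6, 25)
model = tec_csp_phase(freqs, dtec=0.05, csp=0.3)
```

Any phase solution forced through this two-parameter form can only vary as 1/frequency plus an offset, which is the restriction the raw-phase approach is trying to avoid.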
For the brightest direction things line up reasonably well, which is encouraging.
Things look nice for the brightest direction.
An immediately noticeable problem is banding. The MSs were solved independently, each with 2 MHz of bandwidth split into two 1 MHz intervals. Why does every other channel have zero gains? If we had split into twice as many MSs of 1 MHz each, would this still occur? Note that there does appear to be some useful information here (slight gradients are noticeable). The Kalman filter has imposed some temporal smoothness.
Does using the option --NChanSols 2 mean the frequency channels in the solve are not independent? Perhaps the transition model in the Kalman filter imposes something here?
This fan like pattern must be related to transit timescales. Perhaps the beam? Perhaps a wrong sky model?
Possible to show a few more directions? Also might be nice to see the field and the facet layout (perhaps those are online somewhere?) as perhaps e.g. the faintest facet is diverging or something.
Also, for the phase plots, what's the colour scale? Does it wrap around?
For the phase plots they are hsv, wrapped, -pi to +pi. @rvweeren can post the facet layout, but yes this only happens in the facets with almost no flux. Most important is to understand why this frequency banding occurs though. Even in the low brightness directions every other freq has enough signal to produce something.
This is the layout. The banding problem occurs for the facets mostly at the edge.
So the philosophy of the KF is in principle to give a prior evolution file that is TEC fitted. These off axis facets would be smoothed to zero-phase basically, and the output would be more or less the same. You just don't have much signal there, so it can be healthy. So my first simple question - can you start from TEC-fitted values to get the tec-screen?
Then on the amp structure, it could be an unmodeled source or it could be real as well - hard to know. What's really strange to me is the 1/2 pattern on all directions - I've never seen that before, is that a plotting problem?...
I doubt it (the 1/2 pattern) is a plotting issue. The plotting commands are the same (apart from the direction index) and the 1/2 banding is much less strong for facets near the center, which have plenty of flux (we tried more than one and it really correlates with the location in the field and the amount of flux).
Josh make sure you phase reference to some fixed station
My suspicion is that if I use --NChanSols 1 and split the observations in 50 blocks of 1 MHz it will disappear.... (instead of 25 blocks of 2 MHz and --NChanSols 2). Cyril, I could try that if you think this would be a useful test?
@cyriltasse Yes, I can go from tec solved independently per direction and derive a tec screen (more accurately it's a tec + cs phase, as NDPPP produces). That is something I've already done and which we want to avoid now, because such a scheme imposes biases on the final phase screen. It is essentially only valid at the central frequency, and is just plain wrong in many cases (we can show that the chi2 basin of TEC in the low signal regime doesn't even have a global minimum in the right place). Another reason to go straight from phases is that my method is invariant to phase wrapping, so it can properly handle wild ionospheres.
@rvweeren Yes they are all referenced to ant0. Yes, for facets with lots of flux the odd-even banding is less prevalent, however the problem may still be there but less noticeable.
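For readers following along, referencing to a fixed station amounts to subtracting that station's phase from every antenna and re-wrapping. The sketch below assumes a hypothetical (ant, freq, time) array layout and a helper named `reference_phases`; neither comes from kMS or losoto.

```python
import numpy as np


def wrap(phi):
    """Wrap phases into (-pi, pi]."""
    return np.angle(np.exp(1j * np.asarray(phi)))


def reference_phases(phase, ref_ant=0, ant_axis=0):
    """Subtract the reference antenna's phase along ant_axis and re-wrap."""
    # Indexing with a list keeps the antenna axis so the subtraction broadcasts.
    ref = np.take(phase, [ref_ant], axis=ant_axis)
    return wrap(phase - ref)


# Hypothetical data: 4 antennas, 3 frequencies, 5 time slots.
rng = np.random.default_rng(0)
phase = rng.uniform(-np.pi, np.pi, size=(4, 3, 5))
referenced = reference_phases(phase, ref_ant=0)
```

After referencing, the reference antenna's phases are identically zero, and any phase offset common to all antennas drops out.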
@rvweeren @cyriltasse before simply using --NChanSols 1 and splitting the observations into 50 blocks of 1 MHz, it would be good to understand where this is coming from so that we can rule out that something is implicitly wrong. Even in the low signal regime there should still be noisy answers and not entirely flagged channels.
Just for completeness, NChanSols is the number of solutions on the frequency axis, not the step...
Perhaps worth plotting this for a few more fields to see if you always see these strange structures.
@rvweeren can you split the MS into 1MHz bins and rerun with Nchansols=1. That way we can continue with my algorithm testing, and at the same time see what happens in the edge case of one block one solve.
Yes, I have it running with 1 MHz blocks and --NChanSols 1
I think it's important to check that we don't see this behavior on other fields too though, because we use NChanSols 2 as a standard and Cyril said he hadn't seen this type of thing before.
Is it possible that the number of data-channels varies from sols freq-bin to freq-bin?
I think that there is an even number (10 sbs) in one of these MS files, so I'd have thought that NChanSols would put 5 in each sols freq-bin.
You could try plotting these ones /disks/paradata/shimwell/LoTSS-DR1/other-pointings/testing/ongoing-May18runs/L232875_NKF6/L232875_newrestor/SOLSDIR/*/killMS.DDS3_full.sols.npz
Double checked: there are always 20 freq channels per block (or 10 in my new run).
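As a sanity check on the channel counts, here is a small sketch of how 20 channels would divide into solution bins if the channels are split into contiguous, near-equal groups. That partitioning scheme is an assumption; the thread does not confirm how kMS actually assigns channels to bins, and `channel_bins` is a hypothetical helper.

```python
import numpy as np


def channel_bins(n_chan, n_chan_sols):
    """Split channel indices into n_chan_sols contiguous, near-equal bins."""
    return np.array_split(np.arange(n_chan), n_chan_sols)


# 20 channels per MS with two solutions on the frequency axis.
bins = channel_bins(20, 2)
sizes = [len(b) for b in bins]

# 10 channels per MS with a single solution on the frequency axis.
single = channel_bins(10, 1)
```

Under this assumption both bins receive the same number of channels, so an uneven channel count per bin would not by itself explain one bin of each pair being empty.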
We can also try on others datasets but that will take more time (we need to get the clocks out for the TEC screen fitting)
Oh, I was thinking purely of the plotting, to look at the weird solution issue (even if the clock isn't removed it would be nice to see that the solutions are not weird).
@twshimwell, yes that will be much quicker/easier to test. Will do that on lofar2 as soon as the --NChanSols 1 solve is done.
I'll plot your sols @twshimwell just to check. Can you give me write permission in that folder so that I can export the sols to hdf5, merge then plot? I'll delete them after to get back the storage.
Thanks. I've tried to give permissions to /disks/paradata/shimwell/LoTSS-DR1/other-pointings/testing/ongoing-May18runs/L232875_NKF6/L232875_newrestor/SOLSDIR
If it doesn't work just copy SOLSDIR, it's not so big.
That data set is a little weird. When merging the MSs I find that there is a mismatch in axes:
INFO: Sorting output axes...
len pol: 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 - Will be: 4
len dir: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 45 1 1 1 45 1 1 1 45 1 - Will be: 45
len ant: 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 - Will be: 62
len freq: 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 4 4 4 1 2 2 2 1 2 - Will be: 79
len time: 596 716 716 596 596 511 511 511 447 447 447 447 447 447 480 358 358 326 480 596 596 511 480 511 - Will be: 3434
My expectation was that each of those ms's contained a solution for the same axes except for the frequency axis.
Indeed, the mismatch in axes means that the time axis is getting stacked instead of concatenating down the freq axis. Also 99.1% of the phase values are nan's. Do you have another dataset? For example, if I plot the time-freq phase for direction 0, and the first 596 timeslices:
You are looking at just the DDS3_full sols right? There should be one file in each of the SOLSDIR/msname/ directories.
Yes I exported the .ms/full.sols.npz to hdf5, then merged those. (Oh I just realised there are several solsets inside. sol000 - sol002.)
Ah you just want .ms/DDS3full.sols.npz
The wildcard you pasted would have picked up other solutions (the DIfull ones) that have just one direction and varying time averaging.
There are still time axis mismatches, so this results in an incorrect merge. Unless all axes are the same except one, and concatenation happens along that axis, the merge will fail. Also, the losoto h5parm_collector script needs to sort the freq axis, as right now it relies on the order of the input files. These are things I can fix and get back to you on. But I would also request a consistent dataset to plot.
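The merge condition described above can be sketched as follows: refuse to merge unless every axis except freq matches, then sort by frequency so the input file order no longer matters. The dict-based table layout and the helpers `merge_along_freq` and `make_table` are hypothetical and are not the losoto API.

```python
import numpy as np


def merge_along_freq(tables):
    """Concatenate solution tables along 'freq' after checking the other axes agree.

    Each table is a dict with 1-D arrays 'time', 'ant', 'dir', 'freq' and a
    'phase' array of shape (time, freq, ant, dir). This layout is an assumption.
    """
    first = tables[0]
    for table in tables[1:]:
        for axis in ("time", "ant", "dir"):
            if not np.array_equal(table[axis], first[axis]):
                raise ValueError(f"axis '{axis}' differs between tables; refusing to merge")
    # Order tables by their lowest frequency instead of trusting input order.
    ordered = sorted(tables, key=lambda t: t["freq"].min())
    merged = dict(first)
    merged["freq"] = np.concatenate([t["freq"] for t in ordered])
    merged["phase"] = np.concatenate([t["phase"] for t in ordered], axis=1)
    return merged


def make_table(freqs, n_time=3):
    """Build a small zero-phase table for illustration only."""
    freqs = np.asarray(freqs, dtype=float)
    return {
        "time": np.arange(n_time, dtype=float),
        "ant": np.array(["CS001", "CS002"]),
        "dir": np.arange(2),
        "freq": freqs,
        "phase": np.zeros((n_time, freqs.size, 2, 2)),
    }


# Tables passed in the "wrong" order still come out sorted in frequency.
merged = merge_along_freq([make_table([122e6, 123e6]), make_table([120e6, 121e6])])
```

With a check like this, the mixed time-axis lengths in the log above would raise an error up front instead of silently stacking along time.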
For the moment though, here is a simple inspection. Each subband exported solution, as I mentioned above, has solsets sol000, sol001, and sol002, each with phase000 soltabs. The len of the frequency axis respectively is 2, 1, and 1. Therefore, let's look at the sol000/phase000 for subband '166MHz': Both freq blocks are there and similar colors. That's nice. However there is only 1 direction in sol000/phase000 so I can't look at other directions.
@twshimwell Ahh, okay I'll use better wildcard.
Oh weird as the DDS3full ones should have been made with the same dt in the solve.
Ok well maybe these ones instead /disks/paradata/shimwell/LoTSS-DR1/other-pointings/testing/ongoing-May18runs/L639769/SOLSDIR//DDS3_full.sols*npz
Oh just saw you used the new wildcard :) slow typing on my phone
Thanks for plotting these Josh. They look quite sensible then which is very good to see.
@twshimwell, but I only see about 25 solutions along the freq. axis, so this is not a --NChanSols 2 solve?
@rvweeren Here is the plot of josh50block for the blocks that have finished so far.
This looks much more normal. What is it about the edge case NChanSols=2? What does NChanSols=3 do?
Note that the y-axis is not simply freq here (since the solve is only partly done, there are still gaps). However, I think we can already conclude the banding is gone. (I can try --NChanSols 3, so we get 3 solutions along the freq. axis per ms, but it will take 2 more days or so before the current solve is done)
Yes ^^ however, the blocks are mostly completing in order so they are close to being ordered. @rvweeren For reference this is the first one (with banding).
Ah sorry, it seems I don't have examples of DD with nchansols >= 2. The ones I sent were indeed nchansols=1 (but it was very nice to see they looked ok). In the pipeline we only use nchansols > 1 as a standard for the DI steps.
NChanSols=1. Some originally good directions like Dir_31 are now very different.
@cyriltasse can you do a little rooting around in wirtinger to see where the banding comes from? @rvweeren and I read through the code, and conclude that all the chans are being iterated over, and a possible source of weirdness is in how you propagate sigP or sigQ?? It's not clear at all though, and that's just a guess.
Once I have a node available I will do a quick --NChanSols 4 test to check if there is banding here as well.
We now look at the difference between the two solves. By difference we mean the Itoh difference, D[phi_1, phi_2] = W[W[phi_1] - W[phi_2]], where W wraps a phase into the interval from -pi to +pi. This gives a proper measure of wrapped phase difference.
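A minimal NumPy sketch of this wrapped difference, assuming W is the usual wrap into (-pi, pi]; the function names are illustrative.

```python
import numpy as np


def wrap(phi):
    """W[phi]: wrap phases into (-pi, pi]."""
    return np.angle(np.exp(1j * np.asarray(phi)))


def itoh_difference(phi_1, phi_2):
    """D[phi_1, phi_2] = W[W[phi_1] - W[phi_2]]."""
    return wrap(wrap(phi_1) - wrap(phi_2))


# Two phases on either side of the wrap boundary: the naive difference is
# close to 2*pi, while the wrapped difference is small.
d_small = itoh_difference(np.pi - 0.1, -np.pi + 0.1)

# Two phases that differ by a flip of pi.
d_flip = itoh_difference(0.3, 0.3 + np.pi)
```

Here `d_small` comes out at about -0.2 rad, while a flip of pi gives a difference of magnitude pi, which is why such a flip stands out against a map that should be near zero.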
NChanSols=1 and NChanSols=2
Despite the obvious banding, there is a flip of pi for direction 31.
The difference should be near zero.
Could we have a quick chat? I'm losing track of what you're doing...
Same problems show up with MergeSols.py and PlotSolsIm.py. Example for direction 5, note the banding in the --NChanSols 2 (bottom) plot.
@cyriltasse, here's the data in Leiden
/net/rijn/data2/rvweeren/LOFARHBA_A665/DR2Josh/
cyrilP126+65BEAM_1_chan40-60.ms cyrilP126+65BEAM_1_chan40-50.ms cyrilP126+65BEAM_1_chan50-60.ms
Example (running from the above mentioned directory):
kMS.py --MSName cyrilP126+65BEAM_1_chan50-60.ms --SolverType KAFCA --PolMode Scalar --BaseImageName image_full_ampphase_di_m.NS --dt 0.133333 --NIterKF 6 --CovQ 0.1 --LambdaKF=0.5 --NCPU 32 --OutSolsName DDchan1 --NChanSols 1 --PowerSmooth=1.0 --InCol DATA --Weighting Natural --UVMinMax=0.100000,1000.000000 --SolsDir=SOLSDIRtest --BeamMode LOFAR --LOFARBeamMode=A --DDFCacheDir=. --NodesFile image_dirin_SSD_m.npy.ClusterCat.npy --DicoModel image_full_ampphase_di_m.NS.DicoModel
New issue page to post results from Josh's phase screen work.
Varying clocks were removed beforehand.