kwazzi-jack / kal-cal

Kalman Filter and Smoother Implementation for Radio Interferometric Gains Calibration. This library is part of the master's work by Brian Welman and serves as a proof-of-concept tool for that work.
MIT License

Flux Suppression Simulations #8

Open kwazzi-jack opened 2 years ago

kwazzi-jack commented 2 years ago

Hi all, so I set up a script to convert model arrays to fits images. I had not even gotten to recovered flux arrays when I spotted the issue.

KAT7 "True" model data - 2 sources. Looks fine, which is probably why it did not look so bad in our results. kat7-src-2

MeerKAT "True" model data - 2 sources. One of the sources isn't even in the image, and the other is not at the centre. meerkat-src-2

VLA-B "True" model data - 2 sources. Both sources are definitely not in the correct place. vlab-src-2

This is only the true model image made from the skymodels. Obviously the gridding from the skymodels is not working, so instead of showing the recovered flux fits images and trying to fix this conversion problem, I think I will just follow what you said @landmanbester and create the array myself without using skymodels. That way I can ensure all sources are accounted for, fix the image size, and ensure no overlapping sources. Am I mistaken with any of this?

Edit: I've yet to overlay the skymodels on the image, but I don't know if that is possible with radio-padre, and I'm still trying to get X11 forwarding to my desktop working for tigger.

landmanbester commented 2 years ago

Clearly something is going very wrong. Try divorcing from sky models and just placing pixels on a grid as suggested. Then make the dirty and model images and check that they look reasonable. How are you selecting your cell sizes? You need to make sure that your pixels are small enough to satisfy Nyquist

o-smirnov commented 2 years ago

Yeah something is very wrong with the simulation then. Also, I suggest making images oversampled in the usual way (~5 pixels across the PSF) so that the sources are more obvious.
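
For reference, a minimal sketch (toy values, my own, not the pipeline code) of a cell-size choice that satisfies Nyquist and gives roughly 5 pixels across the PSF as suggested:

```python
import numpy as np

# assumed toy inputs: uvw in metres (from the MS), max channel frequency in Hz
rng = np.random.default_rng(0)
uvw = rng.uniform(-4000.0, 4000.0, (1000, 3))
freq_max = 1.4e9
c = 299792458.0

# longest projected baseline in wavelengths sets the highest spatial frequency
uv_max = np.abs(uvw[:, :2]).max() * freq_max / c

# Nyquist: at least 2 pixels across the finest fringe, i.e. 1/(2*uv_max) radians
cell_nyquist = 1.0 / (2.0 * uv_max)

# oversample so there are ~5 pixels across the PSF main lobe
cell_rad = cell_nyquist / 2.5
print(f"cell size: {np.rad2deg(cell_rad) * 3600:.3f} arcsec")
```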

kwazzi-jack commented 2 years ago

So I've been somewhat successful at switching to generating a 2D model array (being careful not to put sources near the edge) and then using the same cell-size and pixel-size selection process we were using for the fluxtractor script (i.e. satisfying Nyquist and using the good_size function). Everything seems to be working well using dirty2ms instead. The new data columns themselves seem fine, as I can image them with wsclean and get a decent image back, but I seem to be having an issue with ms2dirty. It takes the same parameters as I used for dirty2ms. Here are the resulting image plots and flux per source after using ms2dirty.

The "Before" model is the one I construct before feeding into dirty2ms, and the rest are the corresponding data columns in my ms fed into ms2dirty. Also, this is all with 100% of the sources modelled. For the 1 and 2 source cases, the automatic selection of cell- and pixel-size seems to be fine for all arrays, but for some reason it does not work for the 100 source case. The model image before fits all sources, but the images after do not fit all of them. They use the same cell-size and pixel-size, so I'm not sure what could be wrong here.
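
For reference, a stripped-down, self-contained sketch (toy values of my own, not the pipeline code) of the dirty2ms/ms2dirty round trip, including the division by wsum that converts the dirty image to Jy/beam:

```python
import numpy as np
from ducc0.wgridder import dirty2ms, ms2dirty

# toy coplanar layout: random uv samples, a single channel
rng = np.random.default_rng(42)
nrow, nx, ny = 10000, 256, 256
uvw = rng.uniform(-500.0, 500.0, (nrow, 3))
uvw[:, 2] = 0.0
freq = np.array([1.4e9])
cell_rad = 1.0 / (2.0 * np.abs(uvw[:, :2]).max() * freq[0] / 299792458.0) / 2.5

# model in Jy/pixel: two unit point sources, kept away from the edges
model = np.zeros((nx, ny))
model[nx // 3, ny // 3] = 1.0
model[2 * nx // 3, 2 * ny // 3] = 1.0

# image -> visibilities (degridding)
vis = dirty2ms(uvw=uvw, freq=freq, dirty=model,
               pixsize_x=cell_rad, pixsize_y=cell_rad,
               epsilon=1e-7, do_wstacking=True, nthreads=4)

# visibilities -> dirty image (gridding), same cell and image size
dirty = ms2dirty(uvw=uvw, freq=freq, ms=vis,
                 npix_x=nx, npix_y=ny,
                 pixsize_x=cell_rad, pixsize_y=cell_rad,
                 epsilon=1e-7, do_wstacking=True, nthreads=4)

# with unit weights, wsum is just the number of visibilities; after the
# division the peak at each source should come out close to 1 Jy/beam
wsum = nrow * freq.size
print(dirty[nx // 3, ny // 3] / wsum)
```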

KAT7 - 1 source kat7-src-1-fp-100-img

KAT7 - 2 sources kat7-src-2-fp-100-img

VLA-B - 1 source vlab-src-1-fp-100-img

VLA-B - 2 sources vlab-src-2-fp-100-img

VLA-B - 100 sources vlab-src-100-fp-100-img

MeerKAT - 1 source meerkat-src-1-fp-100-img

MeerKAT - 2 sources meerkat-src-2-fp-100-img

MeerKAT - 100 sources meerkat-src-100-fp-100-img

landmanbester commented 2 years ago

Just be careful with your terminology. Model images are in units of Jy/pixel; dirty images are in units of Jy/beam only after you divide them by the sum of the weights (usually called wsum). So none of the images you show are in the correct units.

As for the 100 source dirty images not making sense, are you just looking up pixel values at the corresponding sources or are you actually running the flux extraction script on them?

kwazzi-jack commented 2 years ago

I think that might be the issue here then. I have not run it through the fluxtractor script yet, but I will do so and see what happens. Also, do I need to pass a mask when I create my data columns with ms2dirty?

landmanbester commented 2 years ago

Yeah, that will make a difference, but I am not sure that is all that is wrong, because you still expect to get roughly the correct flux at the locations of the brightest sources (once you divide by wsum). The mask in ms2dirty is actually the negation of the flags, not the image-space mask that you use for source extraction. You don't have to worry about it unless you have flagged data.
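
In other words (a tiny sketch, assuming a boolean FLAG column read from the ms):

```python
import numpy as np

# assumed FLAG column of shape (nrow, nchan): True = flagged/bad
flag = np.zeros((10000, 1), dtype=bool)

# the wgridder mask is its negation: nonzero = use this visibility
mask = (~flag).astype(np.uint8)
# then e.g. ms2dirty(..., mask=mask); with no flagged data it can be omitted
```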

kwazzi-jack commented 2 years ago

So I ran the pipeline on the new data simulation. I decided to write a rough outline of my pipeline so you can see if there are any issues and to also help fill in what I've been doing.

Parameters

Here are the parameters governing the pipeline:

Simulation

Simulation is done as follows:

Calibration

Calibration algorithms used are:

Imaging

Perform single pixel per source imaging using:

Plotting results

We plot the following results:

Results

The MSE on the gains seems appropriate as before:

1.1 Gains MSE: kalcal over process noise (against quartical), single source

gains-mse-prim-kalcal-sec-quartical-all-src-1

1.2 Gains MSE: quartical over time intervals (against kalcal), single source

gains-mse-prim-quartical-sec-kalcal-all-src-1

2.1 Gains MSE: kalcal over process noise (against quartical), two sources

gains-mse-prim-kalcal-sec-quartical-all-src-2

2.2 Gains MSE: quartical over time intervals (against kalcal), two sources

gains-mse-prim-quartical-sec-kalcal-all-src-2

3.1 Gains MSE: kalcal over process noise (against quartical), 100 sources with MeerKAT

gains-mse-prim-kalcal-sec-quartical-meerkat-src-100

3.2 Gains MSE: quartical over time intervals (against kalcal), 100 sources with MeerKAT

gains-mse-prim-quartical-sec-kalcal-meerkat-src-100

4.1 Gains MSE: kalcal over process noise (against quartical), 100 sources with VLA-B

gains-mse-prim-kalcal-sec-quartical-vlab-src-100

4.2 Gains MSE: quartical over time intervals (against kalcal), 100 sources with VLA-B

gains-mse-prim-quartical-sec-kalcal-vlab-src-100

5.1 Flux Ratio: kalcal vs quartical, one source

flux-ratio-prim-kalcal-sec-quartical-all-src-1

5.2 Flux Ratio: kalcal vs quartical, two sources

flux-ratio-prim-kalcal-sec-quartical-all-src-2

6.1 Average % Error: kalcal vs quartical, 100 sources with MeerKAT

flux-avg-percent-err-prim-kalcal-sec-quartical-meerkat-src-100

6.2 Average % Error: kalcal vs quartical, 100 sources with VLA-B

flux-avg-percent-err-prim-kalcal-sec-quartical-vlab-src-100

7.1 Weighted average: kalcal vs quartical, 100 sources with MeerKAT

flux-weight-avg-prim-kalcal-sec-quartical-meerkat-src-100

7.2 Weighted average: kalcal vs quartical, 100 sources with VLA-B

flux-weight-avg-prim-kalcal-sec-quartical-vlab-src-100

Notes

kwazzi-jack commented 2 years ago

This is the same issue as before, sadly, and I am not quite sure what to do. I decided to check the result after switching quartical for cubical. Cubical is known to be worse than quartical, and it was set up to mimic quartical as closely as possible. The only thing I forgot was setting the weights for cubical to the noise value of the data; I'm not sure how much that would impact the results. Anyway, here are the resulting plots, similar to those above:

1.1 Gains MSE: kalcal over process noise (against cubical), single source

gains-mse-prim-kalcal-sec-cubical-all-src-1

1.2 Gains MSE: cubical over time intervals (against kalcal), single source

gains-mse-prim-cubical-sec-kalcal-all-src-1

2.1 Gains MSE: kalcal over process noise (against cubical), two sources

gains-mse-prim-kalcal-sec-cubical-all-src-2

2.2 Gains MSE: cubical over time intervals (against kalcal), two sources

gains-mse-prim-cubical-sec-kalcal-all-src-2

3.1 Gains MSE: kalcal over process noise (against cubical), 100 sources with MeerKAT

gains-mse-prim-kalcal-sec-cubical-meerkat-src-100

3.2 Gains MSE: cubical over time intervals (against kalcal), 100 sources with MeerKAT

gains-mse-prim-cubical-sec-kalcal-meerkat-src-100

4.1 Gains MSE: kalcal over process noise (against cubical), 100 sources with VLA-B

gains-mse-prim-kalcal-sec-cubical-vlab-src-100

4.2 Gains MSE: cubical over time intervals (against kalcal), 100 sources with VLA-B

gains-mse-prim-cubical-sec-kalcal-vlab-src-100

5.1 Flux Ratio: kalcal vs cubical, one source

flux-ratio-prim-cubical-sec-kalcal-all-src-1

5.2 Flux Ratio: kalcal vs cubical, two sources

flux-ratio-prim-cubical-sec-kalcal-all-src-2

6.1 Average % Error: kalcal vs cubical, 100 sources with MeerKAT

flux-avg-percent-err-prim-cubical-sec-kalcal-meerkat-src-100

6.2 Average % Error: kalcal vs cubical, 100 sources with VLA-B

flux-avg-percent-err-prim-cubical-sec-kalcal-vlab-src-100

7.1 Weighted average: kalcal vs cubical, 100 sources with MeerKAT

flux-weight-avg-prim-cubical-sec-kalcal-meerkat-src-100

7.2 Weighted average: kalcal vs cubical, 100 sources with VLA-B

flux-weight-avg-prim-cubical-sec-kalcal-vlab-src-100

Not too sure what to make of this either, but it does seem to show the avg. % error worsening as the time interval decreases, unlike with quartical. On a side note, has using a single correlation not made the problem less complex, resulting in solutions that are far easier to reach and much closer to the true ones?

Let me know if you want to see the fits images as well of the results.

kwazzi-jack commented 2 years ago

Found a single bug through imaging the data columns in the 2 source case, where I had overwritten some visibility values, reducing it to something similar to the single source case. The 1 source and 100 source cases should be fine.

landmanbester commented 2 years ago

Wow, that's a lot of information to take in. At least things are making more sense now. The simulation you describe seems sensible. I am surprised by how big the difference between CubiCal and QuartiCal is. They should be pretty much equivalent. I guess Jon put some magic sauce in the QuartiCal recipe. Some things to check:

i) Are you interpolating the qcal gain solutions to full resolution before doing the flux extraction, or are you applying a sum of boxcars? If the former, it may explain why qcal is doing so well on the flux ratio plots (5.1 and 5.2). The CubiCal plots look more like I expect.

ii) Why are the kalcal results so different for the average % error between the two runs (6.1 and 6.2)? Are you not using the same parameters for kalcal between the two runs? The results for the second run look more like I expect.

iii) I don't think the weighted sum is a good metric to show in your thesis because it does not provide information on individual sources but rather some kind of global average. I think this can be misleading. You can put this in your thesis as a motivation for looking at the average % error instead.

kwazzi-jack commented 2 years ago

Well, with regards to the differences between quartical and cubical, I am not sure they are "exactly" the same, based on the configuration files.

i) Are you interpolating the qcal gain solutions to full resolution before doing the flux extraction or are you applying a sum of boxcars? If the former it may explain why qcal is doing so well on the flux ratio plots (5.1 and 5.2). The CubiCal plots look more like I expect.

I am applying them as a sum of boxcars, by extending each solution as constant over its specified time interval. I do this for both quartical and cubical.
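
A minimal sketch of that expansion (shapes assumed), using np.repeat to hold each interval solution constant in time:

```python
import numpy as np

n_time, tint = 64, 16
n_int = n_time // tint

# assumed per-interval solutions, e.g. one gain per antenna per interval
gains_int = np.ones((n_int, 7), dtype=np.complex128)

# repeat each solution tint times along the time axis -> full time resolution
gains_full = np.repeat(gains_int, tint, axis=0)
assert gains_full.shape == (n_time, 7)
```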

ii) Why are the kalcal results so different for the average % error between the two runs (6.1. and 6.2)? Are you not using the same parameters for kalcal between the two runs? The results for the second run look more like I expect.

They use the same parameters. The only difference would be the arrays used (so VLA-B or MeerKAT). I'm not sure what is happening for sigma_f < 1e-5 in 6.1 with MeerKAT. The only other notable differences would be cell-size, image-size and simulated data.

iii) I don't think the weighted sum is a good metric to show in your thesis because it does not provide information on individual sources but rather some kind of global average. I think this can be misleading. You can put this in your thesis as a motivation for looking at the average % error instead

I am aware of this as we discussed before. I'm using it as a test metric only. I understand how the weighted average should look and so if it deviates from this then I know there is an issue and most probably it is affecting the other results as well.

landmanbester commented 2 years ago

I think the consensus is that something must still be going wrong. I am not sure what it is but I can suggest a few things to try:

kwazzi-jack commented 2 years ago
  • Set P_0 to Q and initialise your solution at t0 using the maximum likelihood solution (you can use that time only maximum likelihood calibration script I gave you a while ago to get the solution only at time 0).

With the maximum likelihood script, are we solving for that specific time-step or just over the whole time-axis and then selecting the solution for the first time-step as m_0 for the filter?

  • Write more detailed tests for your JHr and JHJ operators. How are you testing them atm? I know we tested J with the seemingly linear model but what about JHr? If you are using the sparse matrix representation you can simply use .conj() and .transpose() to get it explicitly. Is that what you are doing? If not you need to check that JH is consistent with J. Do you know how to do that?

Currently, I am using the explicit implementation for the calibration, as it is faster than the sparse implementation, so that shouldn't be an issue. I haven't done detailed tests on the operators. I don't particularly know what I could test JHr and JHJ against in this sense, let alone how to check that JH is consistent with J (I am using JH = J.conjugate().transpose()).
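
For the JH consistency question, I believe the standard check is a dot test: for random vectors x and y, <y, J x> must equal <J^H y, x> up to floating-point error. A generic sketch with an explicit matrix (toy sizes; with JH built via .conjugate().transpose() it passes by construction, but the same test applies when J and JH are separate implicit implementations):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 30
J = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))
JH = J.conjugate().transpose()

x = rng.normal(size=n) + 1j * rng.normal(size=n)
y = rng.normal(size=m) + 1j * rng.normal(size=m)

# np.vdot conjugates its first argument, so these are <y, Jx> and <JH y, x>
lhs = np.vdot(y, J @ x)
rhs = np.vdot(JH @ y, x)
assert np.isclose(lhs, rhs), (lhs, rhs)
```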

  • For your sanity checks, simulate some gains with non-unity mean functions. That should result in much worse flux suppression if you do the flux extraction using only unity gains.
  • Make your phase variations far more extreme than your amplitude variations and confirm that things get much worse in this case.

I'm not quite sure what you meant by mean functions in this case. What I have tried to do is to separate gains creation by:

  1. Simulate gains for amplitude on large time length-scales,
  2. Simulate gains for phase on small time length-scales,
  3. Form the full-complex gains by AMPLITUDE_GAINS.real * np.exp(1.0j * PHASE_GAINS)
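
As a rough sketch of what I mean by steps 1-3 (my own stand-in construction using smoothed noise; length-scales and shapes assumed):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(7)
n_time, n_ant = 512, 7

# 1. amplitude gains: smooth variations on a long time length-scale
amp = 1.0 + 0.2 * gaussian_filter1d(rng.normal(size=(n_time, n_ant)),
                                    sigma=100.0, axis=0)

# 2. phase gains: rougher variations on a short time length-scale
phase = 0.5 * gaussian_filter1d(rng.normal(size=(n_time, n_ant)),
                                sigma=10.0, axis=0)

# 3. full complex gains
gains = amp * np.exp(1.0j * phase)
```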

I haven't made any of the other changes you suggested yet (like setting P_0 to Q), in order to test this first. I tested unity-gains vs true-gains imaging with the fluxtractor on the newly simulated data columns, and the suppression now seems quite severe if we use unity gains, with flux ratios of sources ranging from 1.2 to 1.8.

I found the optimal solutions for kalcal and quartical, and plotted them in a gains-product plot for VLA-B with 100 sources (100% modelled). Here you can see what I try to achieve with my new gains, and how quartical and kalcal behave as we have been hoping for. Kalcal is able to track a lot better despite the erratic data, whereas quartical has to average out to reduce the overfitting of noise. A final point to note is that sigma_f = 0.1 and tint = 16 here.

gp-quartical-vs-kalcal

The only issue is that the resulting pipeline plots are now far harder to understand, and quartical still seems to be doing well despite the terrible solutions it produces. Let me know which plot you would like to see.

I would like a meeting to go through all of this, possibly tomorrow?

kwazzi-jack commented 2 years ago

I'm busy running the kalcal side where m_0 is known and P_0 is Q. I made preliminary plots comparing the behaviour of the original start (m_0 = 1.0 + 0.0j, P_0 = 1.0) with our new one. All of these are VLA-B with 100 sources, 100% modelled visibilities and various sigma_f.

1.1 Original priors, Sigma_f = 0.1 (ideal sigma_f)

gp-vlab-src-100-sf-0 1-normal-start

1.2 New priors, Sigma_f = 0.1 (ideal sigma_f)

gp-vlab-src-100-sf-0 1-ml-start

2.1 Original priors, Sigma_f = 0.00001 (sigma_f really small, lots of averaging)

gp-vlab-src-100-sf-0 00001-normal-start

2.2 New priors, Sigma_f = 0.00001 (sigma_f really small, lots of averaging)

gp-vlab-src-100-sf-0 00001-ml-start

3.1 Original priors, Sigma_f = 10.0 (sigma_f really large, fits noise more, can't track as well since steps are too large)

gp-vlab-src-100-sf-10 0-normal-start

3.2 New priors, Sigma_f = 10.0 (sigma_f really large, fits noise more, can't track as well since steps are too large)

gp-vlab-src-100-sf-10 0-ml-start

landmanbester commented 2 years ago

Ok, this is what I expected to see. With the original prior, your starting estimate for the prior covariance was totally over-estimated and doesn't have enough steps to contract when using a large value for sigma_f. I think the new priors are more sensible. Can you check what the posterior covariance looks like now? I remember it was totally over-estimated; I wonder if the prior could have been the cause of that. Also, maybe the hyper-parameter formulae work better with the new prior. We won't have time to check all of that now, but it's definitely on the list for your PhD.

kwazzi-jack commented 2 years ago

Results from the chi-sq on residual visibilities:

chisq = w * (V_pq - g_p * M_pq * g_q^H)^H * (V_pq - g_p * M_pq * g_q^H)

where w = 1 / (2 * sigma_n^2) and sigma_n = sqrt(2)
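
A sketch of how this is evaluated (array shapes assumed; np.vdot conjugates its first argument, giving the res^H res sum directly):

```python
import numpy as np

def chi_squared(vis, model_vis, g_p, g_q, sigma_n=np.sqrt(2)):
    # residuals V_pq - g_p * M_pq * g_q^H, broadcast over rows and channels
    res = vis - g_p * model_vis * g_q.conjugate()
    w = 1.0 / (2.0 * sigma_n**2)
    # w * res^H res, summed over all visibilities
    return (w * np.vdot(res, res)).real
```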

1. When gains are set to true-gains

| Array | # of Sources | % Modelled Flux | Chi-Sq |
| --- | --- | --- | --- |
| kat7 | 1 | 100 | 15226.2 |
| kat7 | 2 | 100 | 15134.2 |
| kat7 | 2 | 70 | 17733.1 |
| meerkat | 1 | 100 | 1.45186e+06 |
| meerkat | 2 | 100 | 1.45198e+06 |
| meerkat | 2 | 70 | 1.75963e+06 |
| meerkat | 100 | 100 | 1.45072e+06 |
| meerkat | 100 | 90 | 1.48772e+06 |
| meerkat | 100 | 80 | 1.55824e+06 |
| meerkat | 100 | 70 | 1.67628e+06 |
| vlab | 1 | 100 | 251985 |
| vlab | 2 | 100 | 252323 |
| vlab | 2 | 70 | 300224 |
| vlab | 100 | 100 | 253325 |
| vlab | 100 | 90 | 258168 |
| vlab | 100 | 80 | 266088 |
| vlab | 100 | 70 | 278133 |

2. When gains are set to kalcal with sigma_f = 0.1

| Array | # of Sources | % Modelled Flux | Chi-Sq |
| --- | --- | --- | --- |
| kat7 | 1 | 100 | 14699 |
| kat7 | 2 | 100 | 14639.7 |
| kat7 | 2 | 70 | 16092 |
| meerkat | 1 | 100 | 1.44566e+06 |
| meerkat | 2 | 100 | 1.44639e+06 |
| meerkat | 2 | 70 | 2.64533e+06 |
| meerkat | 100 | 100 | 1.4484e+06 |
| meerkat | 100 | 90 | 1.48315e+06 |
| meerkat | 100 | 80 | 1.54872e+06 |
| meerkat | 100 | 70 | 1.68022e+06 |
| vlab | 1 | 100 | 249203 |
| vlab | 2 | 100 | 249412 |
| vlab | 2 | 70 | 425598 |
| vlab | 100 | 100 | 250845 |
| vlab | 100 | 90 | 254977 |
| vlab | 100 | 80 | 262016 |
| vlab | 100 | 70 | 272525 |

3. When gains are set to quartical with tint = 16

| Array | # of Sources | % Modelled Flux | Chi-Sq |
| --- | --- | --- | --- |
| kat7 | 1 | 100 | 15545.1 |
| kat7 | 2 | 100 | 15630.5 |
| kat7 | 2 | 70 | 16447.2 |
| meerkat | 1 | 100 | 1.51988e+06 |
| meerkat | 2 | 100 | 1.53598e+06 |
| meerkat | 2 | 70 | 1.8162e+06 |
| meerkat | 100 | 100 | 1.63426e+06 |
| meerkat | 100 | 90 | 1.66772e+06 |
| meerkat | 100 | 80 | 1.73098e+06 |
| meerkat | 100 | 70 | 1.83863e+06 |
| vlab | 1 | 100 | 266645 |
| vlab | 2 | 100 | 264034 |
| vlab | 2 | 70 | 302914 |
| vlab | 100 | 100 | 278359 |
| vlab | 100 | 90 | 282050 |
| vlab | 100 | 80 | 288977 |
| vlab | 100 | 70 | 299301 |

Not sure what to make of the numbers.

kwazzi-jack commented 2 years ago

Let me know if you want me to post the reduced chi-squared rather.

landmanbester commented 2 years ago

Can you just compute np.vdot(res, res) to make sure nothing is going wrong in your chisq formula? Also, compare it to what you get from doing this with QuartiCal. That should tell us if something is going wrong.

landmanbester commented 2 years ago

The chisq values look reasonable; they are increasing with decreasing modelled flux as expected. Once you have confirmed that you get the same as qcal, can you plot the chisq as a function of solution interval for one of the 100% modelled flux simulations and check whether the minimum coincides with the minimum of the MSE?
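
For that check, a trivial helper (my own sketch; the metric arrays per solution interval are assumed to exist already):

```python
import numpy as np

def best_interval(tints, metric):
    """Return the solution interval at which the given metric is minimised."""
    tints = np.asarray(tints)
    return int(tints[np.argmin(metric)])

# e.g. best_interval([1, 4, 8, 16, 32, 64], chisq_per_tint) should match
# best_interval([1, 4, 8, 16, 32, 64], mse_per_tint) if chisq tracks the MSE
```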

kwazzi-jack commented 2 years ago

Here it is with np.vdot(res, res).

1. Noise-free Visibilities, with true-gains

| Array | # of Sources | % Modelled Flux | Chi-Sq |
| --- | --- | --- | --- |
| kat7 | 1 | 100 | 9.73429e-11 |
| kat7 | 2 | 100 | 1.2532e-10 |
| kat7 | 2 | 70 | 9983.55 |
| meerkat | 1 | 100 | 1.16988e-08 |
| meerkat | 2 | 100 | 1.48559e-08 |
| meerkat | 2 | 70 | 1.23031e+06 |
| meerkat | 100 | 100 | 3.24956e-08 |
| meerkat | 100 | 90 | 148786 |
| meerkat | 100 | 80 | 431005 |
| meerkat | 100 | 70 | 902145 |
| vlab | 1 | 100 | 3.37272e-10 |
| vlab | 2 | 100 | 2.29302e-09 |
| vlab | 2 | 70 | 192015 |
| vlab | 100 | 100 | 4.35415e-09 |
| vlab | 100 | 90 | 19478.3 |
| vlab | 100 | 80 | 51001.8 |
| vlab | 100 | 70 | 99211.8 |

2. Noisy Visibilities, with true-gains

| Array | # of Sources | % Modelled Flux | Chi-Sq |
| --- | --- | --- | --- |
| kat7 | 1 | 100 | 60904.7 |
| kat7 | 2 | 100 | 60536.9 |
| kat7 | 2 | 70 | 70932.4 |
| meerkat | 1 | 100 | 5.80743e+06 |
| meerkat | 2 | 100 | 5.80794e+06 |
| meerkat | 2 | 70 | 7.03852e+06 |
| meerkat | 100 | 100 | 5.8029e+06 |
| meerkat | 100 | 90 | 5.95088e+06 |
| meerkat | 100 | 80 | 6.23298e+06 |
| meerkat | 100 | 70 | 6.70512e+06 |
| vlab | 1 | 100 | 1.00794e+06 |
| vlab | 2 | 100 | 1.00929e+06 |
| vlab | 2 | 70 | 1.2009e+06 |
| vlab | 100 | 100 | 1.0133e+06 |
| vlab | 100 | 90 | 1.03267e+06 |
| vlab | 100 | 80 | 1.06435e+06 |
| vlab | 100 | 70 | 1.11253e+06 |

3. Noisy Visibilities, with kalcal-gains

| Array | # of Sources | % Modelled Flux | Chi-Sq |
| --- | --- | --- | --- |
| kat7 | 1 | 100 | 58796.2 |
| kat7 | 2 | 100 | 58559 |
| kat7 | 2 | 70 | 64367.9 |
| meerkat | 1 | 100 | 5.78264e+06 |
| meerkat | 2 | 100 | 5.78555e+06 |
| meerkat | 2 | 70 | 1.05813e+07 |
| meerkat | 100 | 100 | 5.7936e+06 |
| meerkat | 100 | 90 | 5.9326e+06 |
| meerkat | 100 | 80 | 6.1949e+06 |
| meerkat | 100 | 70 | 6.72089e+06 |
| vlab | 1 | 100 | 996813 |
| vlab | 2 | 100 | 997646 |
| vlab | 2 | 70 | 1.70239e+06 |
| vlab | 100 | 100 | 1.00338e+06 |
| vlab | 100 | 90 | 1.01991e+06 |
| vlab | 100 | 80 | 1.04806e+06 |
| vlab | 100 | 70 | 1.0901e+06 |

4. Noisy Visibilities, with quartical-gains

| Array | # of Sources | % Modelled Flux | Chi-Sq |
| --- | --- | --- | --- |
| kat7 | 1 | 100 | 62180.5 |
| kat7 | 2 | 100 | 62521.9 |
| kat7 | 2 | 70 | 65788.9 |
| meerkat | 1 | 100 | 6.07953e+06 |
| meerkat | 2 | 100 | 6.14392e+06 |
| meerkat | 2 | 70 | 7.26482e+06 |
| meerkat | 100 | 100 | 6.53704e+06 |
| meerkat | 100 | 90 | 6.67087e+06 |
| meerkat | 100 | 80 | 6.92393e+06 |
| meerkat | 100 | 70 | 7.35454e+06 |
| vlab | 1 | 100 | 1.06658e+06 |
| vlab | 2 | 100 | 1.05614e+06 |
| vlab | 2 | 70 | 1.21166e+06 |
| vlab | 100 | 100 | 1.11344e+06 |
| vlab | 100 | 90 | 1.1282e+06 |
| vlab | 100 | 80 | 1.15591e+06 |
| vlab | 100 | 70 | 1.1972e+06 |

landmanbester commented 2 years ago

Looks like kalcal is doing better at the level of the visibilities as well, so it still doesn't make sense that qcal should do better with flux suppression. I really want to see how the chisq compares when computed with the residuals written out by qcal. Have you looked at that?

kwazzi-jack commented 2 years ago

Okay, that does bring a bit of relief. Here it is with QuartiCal's residual product only:

| Array | # of Sources | % Modelled Flux | tint-1 | tint-4 | tint-8 | tint-16 | tint-32 | tint-64 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| kat7 | 1 | 100 | 40174.5 | 56554.4 | 59759.6 | 62180.5 | 63669.2 | 65147.5 |
| kat7 | 2 | 100 | 40312.6 | 56352 | 59803.4 | 62521.9 | 64103.6 | 65899.5 |
| kat7 | 2 | 70 | 43325.9 | 59575.6 | 63108.7 | 65788.9 | 67345.1 | 69049.6 |
| meerkat | 1 | 100 | 5.62304e+06 | 5.80754e+06 | 5.92706e+06 | 6.07953e+06 | 6.2185e+06 | 6.3873e+06 |
| meerkat | 2 | 100 | 5.62459e+06 | 5.81812e+06 | 5.9571e+06 | 6.14392e+06 | 6.30207e+06 | 6.5162e+06 |
| meerkat | 2 | 70 | 6.79127e+06 | 6.97803e+06 | 7.10142e+06 | 7.26482e+06 | 7.40097e+06 | 7.58521e+06 |
| meerkat | 100 | 100 | 5.61982e+06 | 5.88091e+06 | 6.16587e+06 | 6.53704e+06 | 6.90957e+06 | 7.39515e+06 |
| meerkat | 100 | 90 | 5.75912e+06 | 6.01911e+06 | 6.30226e+06 | 6.67087e+06 | 7.04123e+06 | 7.52233e+06 |
| meerkat | 100 | 80 | 6.02102e+06 | 6.27984e+06 | 6.55968e+06 | 6.92393e+06 | 7.29123e+06 | 7.76842e+06 |
| meerkat | 100 | 70 | 6.46727e+06 | 6.7236e+06 | 6.99803e+06 | 7.35454e+06 | 7.71673e+06 | 8.18398e+06 |
| vlab | 1 | 100 | 930409 | 998386 | 1.02976e+06 | 1.06658e+06 | 1.09611e+06 | 1.13019e+06 |
| vlab | 2 | 100 | 931919 | 998640 | 1.0255e+06 | 1.05614e+06 | 1.08163e+06 | 1.11554e+06 |
| vlab | 2 | 70 | 1.0902e+06 | 1.1568e+06 | 1.18265e+06 | 1.21166e+06 | 1.23714e+06 | 1.27369e+06 |
| vlab | 100 | 100 | 935598 | 1.01095e+06 | 1.05619e+06 | 1.11344e+06 | 1.16822e+06 | 1.23948e+06 |
| vlab | 100 | 90 | 952387 | 1.02781e+06 | 1.07243e+06 | 1.1282e+06 | 1.18413e+06 | 1.25478e+06 |
| vlab | 100 | 80 | 980112 | 1.0564e+06 | 1.10074e+06 | 1.15591e+06 | 1.21247e+06 | 1.28243e+06 |
| vlab | 100 | 70 | 1.02253e+06 | 1.09757e+06 | 1.14208e+06 | 1.1972e+06 | 1.25403e+06 | 1.32463e+06 |

kwazzi-jack commented 2 years ago

It does not seem right that tint-1 is far lower than the larger tints. So if I am correct, this means there is an issue with the simulated data?

kwazzi-jack commented 2 years ago

Also, the column for tint-16 in the previous table matches this table. So they are producing the same outcome.

JSKenyon commented 2 years ago

Also, the column for tint-16 in the previous table matches this table. So they are producing the same outcome.

That is comforting.

Does not seem right that tint-1 seems far less compared to tint larger than it. So if I am correct, this means there is an issue with the simulated data?

It makes sense that it is low - you have a large number of degrees of freedom. You can and will over-fit to some extent. I would suggest running the tint-2 and tint-3 cases just to be sure. Again, plotting the tint-1 gains on the same plot as the tint-16 and KalCal gains may be illuminating.

kwazzi-jack commented 2 years ago

Again, plotting the tint-1 gains on the same plot as the tint-16 and KalCal gains may be illuminating.

gp-quartical-compare

EDIT: Complex line = True gains line

kwazzi-jack commented 2 years ago

I imaged the residual columns using wsclean:

1. Residuals - VLA-B @ 100 sources, with quartical (tint = 1)

wsclean-vlab-tint-1-residual

2. Residuals - VLA-B @ 100 sources, with quartical (tint = 16)

wsclean-vlab-tint-16-residual

kwazzi-jack commented 2 years ago

3. Noise-free corrupted visibilities - VLA-B @ 100 sources

wsclean-vlab-true

JSKenyon commented 2 years ago

Ok, that is good.

Again, plotting the tint-1 gains on the same plot as the tint-16 and KalCal gains may be illuminating.

gp-quartical-compare

EDIT: Complex line = True gains line

This is good, but it seriously suggests that something is wrong with the gain MSE plots for QuartiCal. By eye, tint-1 is way closer to the truth than any of the others. I would actually expect a slightly higher tint, between 2 and 6, to fit even better, as it will overfit slightly less.

The residual images are not informative without a locked colour scale/axes.

Edit: This is progress though!

kwazzi-jack commented 2 years ago

So to increase overfitting, I should increase noise, which in turn would raise the optimal tint to something a few steps higher?

JSKenyon commented 2 years ago

The tint-1 solution is already overfitting. What I am saying is that it is surprising that you find an optimal tint of 16 - it is clearly underfitting. Try producing the gains-product plots with tint-1 through tint-8 using QuartiCal. You should clearly see it move from overfitting to underfitting, and if you don't, then we have to consider a bug in QuartiCal.

Edit: Ultimately, I think that the QuartiCal solutions look as you would expect. The issue was that the flux suppression plots didn't seem consistent with the gains. If the gains look right, that suggests the problem is elsewhere.

kwazzi-jack commented 2 years ago

I think that is what would happen if I plotted tint-1 to tint-8. The MSE plots are probably just as you said: whichever happens to be generally closer. That is why I think it has something to do with the simulated data, which I am going to image using the corrected data from quartical to see the effect of tint-1 and up. I'm also going to have a look at simulating a dataset with meqtrees to see if things change.

kwazzi-jack commented 2 years ago

I've set up a single branch of my pipeline as a Jupyter notebook, as suggested. It goes from the start, creating the ms, through to a few plots. You can find it on tina at /net/tina/home/welman/masters/projects/flux_suppression/debug/, in the notebook called flux_suppression.ipynb. There is a virtual environment in /net/tina/home/welman/masters/projects/flux_suppression/ called kal-env that runs the notebook server and has all the necessary libraries to run it.

Please could you have a look as it is still producing similar results that make no sense. Thank you!

kwazzi-jack commented 2 years ago

So here are the results from the smaller pipeline, running on MeerKAT with 100 sources, peak-flux = 0.1 Jy and epsilon ~ CN(0, 8). Calibration was done on 100%, 90%, 80%, and 70% modelled sources:

1. Mean Square Error of Gains

gains-mse

2. Weighted Average Absolute Percentage Error

flux-wape

3. Flux Ratio on first 3 brightest sources

flux-ratio

landmanbester commented 2 years ago

The kalcal plots are the same regardless of modelled flux percentage. Are you sure you didn't mess something up?

kwazzi-jack commented 2 years ago

Well, that's what I don't get. The models are definitely different, and hence the model visibilities differ too.

In the dirty2ms step, the input model changes depending on the % modelled sources.

landmanbester commented 2 years ago

There is definitely something weird going on. How can the minimum MSE for qcal be at a solution interval close to 50? Have you plotted it to see what that looks like?

kwazzi-jack commented 2 years ago

Quoting @landmanbester from Google chat:

If you sum the amplitudes in vis space it'll [% unmodeled source flux] be 30% off, not sure about phases.

So I did this to double-check if it was actually translating through when I generated my model visibilities with dirty2ms. All I did was calculate the amplitude for the 100% model visibilities, and then compared it to the 90% to 70% cases.

amp_100 = np.sum(np.abs(model_vis_100))
amp_90 = np.sum(np.abs(model_vis_90))
amp_80 = np.sum(np.abs(model_vis_80))
amp_70 = np.sum(np.abs(model_vis_70))

print("100% ->", amp_100 / amp_100 * 100, "%")
print("90% ->", amp_90 / amp_100 * 100, "%")
print("80% ->", amp_80 / amp_100 * 100, "%")
print("70% ->", amp_70 / amp_100 * 100, "%")

Output:

100% -> 100.0 %
90% -> 99.6713399887085 %
80% -> 99.0151047706604 %
70% -> 98.065584897995 %

So, based on this, that would mean there is effectively no decrease in % modelled flux, which could explain why the plots do not change in the end?

landmanbester commented 2 years ago

Yep, that's not right. Can you just try taking fraction * model, where fraction is the fraction of modelled flux? Also, I tried running your notebook again and noticed that kal-env is Python 2.7. Please update the venv to be > Python 3.7.

kwazzi-jack commented 2 years ago

I had already figured that out and ran with fraction * model as you mentioned. The outcome did not seem to change much; I will post plots soon. I am confused as to what you were looking at with kal-env, because:

welman@tina:~/masters/projects/flux_suppression$ source kal-env/bin/activate
(kal-env) welman@tina:~/masters/projects/flux_suppression$ python --version
Python 3.9.4

I'm busy compiling all the plots, and plots you asked to see before. Will post soon.

kwazzi-jack commented 2 years ago

(Quoting my amplitude check from the previous comment: summing np.abs(model_vis) gave the 90%, 80% and 70% models at 99.67%, 99.02% and 98.07% of the 100% model respectively.)

So with this, I re-calculated it with the fraction * (100% model) change, and I get the expected:

Successful readonly open of default-locked table meerkat.ms: 41 columns, 1451520 rows
100% -> 100.0 %
90% -> 90.0000274181366 %
80% -> 80.00003099441528 %
70% -> 70.00002264976501 %
landmanbester commented 2 years ago

I had already figured that out and ran with fraction * model as you mentioned. The outcome did not seem to change much; I will post plots soon. I am confused as to what you were looking at with kal-env, because:

welman@tina:~/masters/projects/flux_suppression$ source kal-env/bin/activate
(kal-env) welman@tina:~/masters/projects/flux_suppression$ python --version
Python 3.9.4

I'm busy compiling all the plots, and plots you asked to see before. Will post soon.

Whereas for me:

(kal-env) ╭─bester@oates /net/tina/home/welman/masters/projects/flux_suppression
╰─➤  python --version
Python 2.7.17

But I am not sure if this is even supposed to work. I sourced your venv while logged in from a different machine

kwazzi-jack commented 2 years ago

And if you run python3 --version instead? The main python link could be hooked to python2.

kwazzi-jack commented 2 years ago

Here is what I get with python2:

(kal-env) welman@tina:~/masters/projects/flux_suppression$ python2 --version
Python 2.7.17
landmanbester commented 2 years ago
(kal-env) ╭─bester@oates /net/tina/home/welman/masters/projects/flux_suppression
╰─➤  python3 --version
Python 3.6.9

I suspect the venv is not really activated somehow

kwazzi-jack commented 2 years ago

Possibly because of personal profile links. No worries, you can make your own if you like. You just need the following:

cd /net/tina/home/welman/masters/projects/flux_suppression/
virtualenv -p python3.9 lb-env
source lb-env/bin/activate
pip install -e ../../packages/kal-cal ../../packages/pfb-clean
pip install quartical tqdm PyYaml

I think that should be it. Let me know if there are any errors with dependencies.

kwazzi-jack commented 2 years ago

So I've run it with the modified fraction * model and the corrections mentioned during our meeting. Here I set peak-flux = 1.0 Jy, noise ~ CN(0.0, 2), and I've extended the upper bound on the process-noise search from 0.1 to 1.0. Here are the plots from the notebook:

1. MSE on calibrated gains, quartical vs kalcal

gains-mse

2. Flux Ratio for first 3 brightest sources, quartical vs kalcal

flux-ratio

3. Flux Weighted Absolute Percentage Error, quartical vs kalcal

flux-wape

4. Gains-product plot, comparing time-interval solutions for quartical

quartical-gp-compare-tint

kwazzi-jack commented 2 years ago

I reran the simulation to test the outcome of unmodelled sources with reduced noise. Here I set peak-flux = 1.0 Jy, noise ~ CN(0.0, 0.02). Here are the plots from the notebook:

  1. MSE on calibrated gains, quartical vs kalcal gains-mse

  2. Flux Ratio for first 3 brightest sources, quartical vs kalcal flux-ratio

  3. Flux Weighted Absolute Percentage Error, quartical vs kalcal flux-wape

  4. Gains-product plot, comparing time-interval solutions for quartical quartical-gp-compare-tint

  5. Gains-product plot, comparing process-noise solutions for kalcal kalcal-gp-compare-sigma_f

We can see the standard drift of the gains solutions in the gains-product plots. It also seems to appear in the flux-ratio plots for kalcal.

Still not sure what is happening with quartical. At least it shows that when there is little noise, the smaller time intervals perform better, which correlates with the MSE and Weighted Absolute Percentage Error plots too.

To add: each time I rerun the simulation, all generated items (ms, gains, fluxes, etc.) are deleted and the jupyter-notebook outputs are cleared.

kwazzi-jack commented 2 years ago

I reran the simulation to test the outcome of unmodelled sources with reduced noise. Here I set peak-flux = 1.0 Jy, noise ~ CN(0.0, 0.02). Here are the plots from the notebook:

1. MSE on calibrated gains, quartical vs kalcal
   ![gains-mse](https://user-images.githubusercontent.com/20638068/168057922-73463aa5-6c97-44eb-a679-4d9ac6ac458d.png)

2. Flux Ratio for first 3 brightest sources, quartical vs kalcal
   ![flux-ratio](https://user-images.githubusercontent.com/20638068/168057959-1f990152-5e85-455d-84d1-96f2a68b04f2.png)

3. Flux Weighted Absolute Percentage Error, quartical vs kalcal
   ![flux-wape](https://user-images.githubusercontent.com/20638068/168058002-07099a32-56c6-4f94-99e3-537bdbac23a8.png)

4. Gains-product plot, comparing time-interval solutions for quartical
   ![quartical-gp-compare-tint](https://user-images.githubusercontent.com/20638068/168060897-43459bb4-4977-49f8-9918-3651a6a2a5d7.png)

5. Gains-product plot, comparing process-noise solutions for kalcal
   ![kalcal-gp-compare-sigma_f](https://user-images.githubusercontent.com/20638068/168061456-81a009d1-0e7a-4708-8f90-bd8edfc49800.png)

We can see the standard drift of the gains solutions in the gains-product plots. It also seems to appear in the flux-ratio plots for kalcal.

Still not sure what is happening with quartical. At least it shows that when there is little noise, the smaller time intervals perform better, which correlates with the MSE and Weighted Absolute Percentage Error plots too.

To add: each time I rerun the simulation, all generated items (ms, gains, fluxes, etc.) are deleted and the jupyter-notebook outputs are cleared.

So I've just noticed that I did not set sigma_n correctly for the kalcal calibration run, so I'm redoing these plots and will edit in the correct ones when it's done. quartical should be roughly the same.

kwazzi-jack commented 2 years ago

I have a question @landmanbester for this section in the imaging script:

...
# V = Jp int I kpq dl dm/n Jq.H
# V = G R mask x   G = Mueller term,  G = Jp Jq.H,  G.H G = Jq Jq.H Jp.H Jp
G = corrupt_vis(tbin_indices, tbin_counts, ANT1, ANT2,
                gains, np.ones_like(vis[..., None, None]))[:, :, 0]

# Sigma Inverse and Weights
S = 1/sigma_n**2
W = (S * G.conj() * G).real

# Sum of Weights
wsum = W.sum()  # scalar sum over all rows and channels

# x = (R.H G.H G R)inv R.H G.H V
dirty = ms2dirty(uvw=UVW, freq=FREQ, ms=S * G.conj() * vis,
                 npix_x=nx, npix_y=ny,
                 pixsize_x=cell_rad, pixsize_y=cell_rad,
                 epsilon=tol, nthreads=nthreads, do_wstacking=True)

def hess(x):
    tmp = dirty2ms(uvw=UVW, freq=FREQ, dirty=mask * x,
                   pixsize_x=cell_rad, pixsize_y=cell_rad,
                   epsilon=tol, nthreads=nthreads, do_wstacking=True)

    res = ms2dirty(uvw=UVW, freq=FREQ, ms=tmp, wgt=W,
                   npix_x=nx, npix_y=ny,
                   pixsize_x=cell_rad, pixsize_y=cell_rad,
                   epsilon=tol, nthreads=nthreads, do_wstacking=True)

    return mask * res

# Run pcg function to get recovered image
model_rec = pcg(hess, mask * dirty, x0=np.ones_like(dirty), tol=tol)
...

So I realized that I am not applying wsum to the output image. In particular, would it not have to be coupled with the ms2dirty calls? I.e.

dirty /= wsum

and:

return mask * res / wsum

Another question is about how the gradient descent is being done. So the initial x0 is a dirty image that has the conjugate of G, divided by sigma_n squared, applied as weights to the visibilities, i.e. S * G.conj() <=> |g_p^* * g_q| / sigma_n^2. Is this to represent the G^H V part of the expression?

So why does the hessian function then not do this in its ms2dirty step, but only apply W, which is W = S * G.conj() * G <=> |g_p^* * g_q|^2 / sigma_n^2?