jjtobin / auto_selfcal

MIT License

Calculate better reffreq for nterms=2 images #40

Closed psheehan closed 5 months ago

psheehan commented 6 months ago

When there are uneven distributions of SPW bandwidths (e.g. one wide window at the extremes of the band plus a collection of narrow windows), nterms=2 imaging can degrade the quality of the image when it uses its default reference frequency for the Taylor expansion (the average of the SPW centers). This PR fixes that by calculating the per-SPW total weight of the unflagged data and using those total weights to do a weighted average of the SPW centers. This way, wider-bandwidth SPWs naturally get more weight, but it also accounts for cases where the noise in wide-bandwidth SPWs degrades due to poor atmospheric transmission.
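For illustration, a minimal sketch of the weighting idea (a hypothetical helper, not the PR's actual code; it assumes DATA_DESC_ID maps one-to-one onto spw id and, for brevity, only consults row-level flags):

```python
import numpy as np
from casatools import table

def sumwt_weighted_reffreq(vis, spws):
    """Weighted mean of the SPW center frequencies, weighting each SPW by the
    summed visibility weights of its unflagged rows."""
    tb = table()

    # Per-SPW center frequencies from the SPECTRAL_WINDOW subtable
    tb.open(vis + '/SPECTRAL_WINDOW')
    centers = np.array([np.mean(tb.getcell('CHAN_FREQ', spw)) for spw in spws])
    tb.close()

    # Per-SPW sum of the weights over unflagged rows
    sumwt = []
    tb.open(vis)
    for spw in spws:
        sub = tb.query('DATA_DESC_ID==%d AND NOT FLAG_ROW' % spw)
        sumwt.append(sub.getcol('WEIGHT').sum() if sub.nrows() > 0 else 0.0)
        sub.close()
    tb.close()

    sumwt = np.array(sumwt)
    return np.sum(centers * sumwt) / np.sum(sumwt)
```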

@jjtobin - I've tested this on the few problematic cases that led to this ticket, but do you think it would be worth running tests on a few normal datasets before merging? I forget how many of our regression tests end up with nterms = 2, but may be worth running a round of that?

jjtobin commented 6 months ago

So the verdict on PR40 is that I sometimes like it, and sometimes don't. The largest differences are at low frequencies, where I think what could be happening is that, despite the sum of the weights being lower, there is a lot of amplitude in the edge spws, so the shift of the reffreq can cause issues. I see the largest decrement in m-L-Q_standalone_selfcal_PR40... versus the same run in tests_v1.5; but m-S-to-A and M82 also show drops in their S/N.

So we might consider making this an ALMA-only thing, or at least something that is only used for VLA with non-uniform/non-contiguous spectral configurations (which would possibly require a couple of extra helper functions to check for).

psheehan commented 6 months ago

Remind me - how do the weights get set for VLA data? Is it statwt? I haven't looked at m-S-to-A or M82, but in m-L-Q the most significant/concerning change was for the L-band observations. I took a look at the sum of weights for each SPW and there's a huge range that doesn't necessarily have a pattern:

CASA <21>: np.array(weights)/max(weights)
Out[21]: 
array([0.27339041, 0.08677045, 0.05032375, 0.03123116, 0.19412216,
       0.8965507 , 1.        , 0.93821213, 0.        , 0.04529565,
       0.66470916, 0.71035158, 0.87999871, 0.91706853, 0.94047366,
       0.41297236])

A number of the windows have very low weights, but it's not necessarily obvious to me why that would be the case. It seems at least that flagging can explain some of it, but not necessarily all:

CASA <31>: nflags / (nflags + nunflagged)
Out[31]: 
array([0.25383698, 0.293065  , 0.64590348, 0.77787463, 0.58799804,
       0.23758256, 0.18842107, 0.19937948, 1.        , 0.89353498,
       0.29706224, 0.23526316, 0.13052345, 0.12826044, 0.15386504,
       0.26169575])

In (properly calibrated) ALMA data, I think all of the weights with a common channel width should be the same, modulo the Tsys calibration. So the sum of the weights should more or less be related to the total bandwidth, amount of flagging that has been done in that SPW, and the Tsys calibration. I'm wondering whether this breaks down once you run statwt, which to some degree removes the theoretical underpinnings of this?
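(For reference, this is just the standard radiometer-equation argument: each visibility weight is w = 1/sigma^2 with sigma^2 proportional to Tsys^2 / (2 * delta_nu_chan * delta_t), so the per-SPW sum of weights scales roughly as (unflagged bandwidth x time) / Tsys^2, i.e. the bandwidth / flagging / Tsys dependence described above, provided the weights really do come from Tsys rather than from statwt.)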

In which case... we could possibly limit this to ALMA? This is also making me wonder: would we be better off following PIPE-1838 and just doing the bandwidth-weighted mean frequency, since Ryan has presumably tested that better, and leave further research of this to CASR-716 (even if it ends up being us doing that)? It's only one dataset that runs into the atmospheric transmission issue at this point... so it might make sense to take the better-tested fix for the time being?

psheehan commented 6 months ago

Continuing to look into this, I'm now running a test where the reffreq for the VLA is calculated using the (bandwidth)*(fraction of unflagged data) as the weights for the weighted average. I'll see how this does on the m-L-Q dataset and then go from there.
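A toy version of that weighting, just to be concrete (illustrative names only; the per-SPW unflagged fractions could come from, e.g., the flagdata(vis, mode='summary') counts):

```python
import numpy as np

def bw_flag_weighted_reffreq(centers, bandwidths, unflagged_fraction):
    """All inputs are per-SPW arrays; returns the weighted-mean reference frequency."""
    centers = np.asarray(centers)
    weights = np.asarray(bandwidths) * np.asarray(unflagged_fraction)
    return float(np.sum(centers * weights) / np.sum(weights))
```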

psheehan commented 6 months ago

Another thought I just had: examining the different options that I've tried a little more closely, the peak SNR changes from reffreq to reffreq, but the RMS does not. I think this actually makes sense. The problem with ALMA was that the RMS changed dramatically for different reffreqs (likely due to the weird configurations with lots of narrow spws, etc.), which is why picking a better reffreq did a better job. For the VLA, the RMS doesn't change much because the spws all have about the same bandwidth. But with the wide bandwidths we're much more likely to have non-zero spectral indices, and so the peak flux is going to depend a lot more on the reffreq that is chosen.
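(In other words, the tt0 image is the flux density evaluated at the reference frequency, so for a source with spectral index alpha the peak scales roughly as S(nu_ref) = S(nu_0) * (nu_ref/nu_0)^alpha; shifting the reffreq moves the peak up or down even though the noise is unchanged.)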

That is to say, I'm not sure the changing SNR is necessarily a "bad" thing, because we are just making an image at a different place in the spectrum. The RMS doesn't change, which was the main symptom in the ALMA data.

With that being said... assuming that statwt is used for the VLA weights rather than a more theoretical underpinning, I might still be inclined to do something like use (bandwidths)*(fraction of unflagged data) as the weighting factor for VLA data.

Thoughts?

jjtobin commented 6 months ago

You're correct that the VLA is using statwt to calculate the weights rather than Tsys as ALMA does. The thing affecting the weights more in some spws than others is most likely RFI; those spws presumably have more flagging than other spws as well. But the RFI is also likely causing a lot of scatter in the visibilities, leading to reduced weights relative to spws that might not have as much RFI.

You could be correct that maybe it's not an issue, given a non-zero spectral index and the RMS not changing. For some of these sources the spectral index is going to be negative, so the reffreq going to a higher frequency would cause the peak in tt0 to go down; so maybe this is not a problem as we think about it more deeply.

jjtobin commented 5 months ago

A minor concern that I have with the bandwidth weighting as implemented is that it does not properly take into account the loss of bandwidth due to regions flagged out by the cont.dat. The loss of bandwidth from the cont.dat selection will get masked a bit when doing the spectral averaging (i.e., the flagging fractions will not catch all of this) and will then only be reflected in the weights for a particular spw. I think we should ideally be making use of the more detailed 'effective' bandwidth, which you should be able to get from the get_spw_bandwidth(vis,spwsarray_dict,target,vislist) function or just get_spw_eff_bandwidths.

psheehan commented 5 months ago

Ah, that's a good point. Since we flag the lines, though, do you think that would be covered by the fraction of unflagged data? I.e., if we used both the fraction of unflagged data and the effective bandwidth, would we be double counting?

jjtobin commented 5 months ago

I don't think it's double counting; I'm thinking that it's not counting enough. If we were looking at the full-resolution MSes, I think the native bandwidth + flagging fraction would be fine. However, since the flagging is on a per-channel basis, when we do channel averaging the number of flagged channels should be less, so I think the flagging stats in the spectrally-averaged MS will not accurately reflect the loss of sensitivity. So I think we need to use both the effective bandwidths and the flagging fractions.

psheehan commented 5 months ago

I'll look into it some - the number of flagged channels goes down... but so does the total number of channels. So the fraction of flagged channels should stay (more or less) the same?

psheehan commented 5 months ago

Looking into the question of effective bandwidth, it turns out that one of the test datasets for this is, I think, a good example. It must have a lot of line emission, because these are the effective bandwidths/bandwidths:

{25: {'effective_bandwidth': 0.6406657048, 'bandwidth': 1.875},
 27: {'effective_bandwidth': 0.8281776819999891, 'bandwidth': 1.875},
 29: {'effective_bandwidth': 0.562536043199998, 'bandwidth': 1.875},
 31: {'effective_bandwidth': 0.7031701170000133, 'bandwidth': 1.875}}

Or, in terms of unflagged fraction:

[0.3416883758933333, 0.4416947637333275, 0.3000192230399989, 0.3750240624000071]

If I look at the flags in the _targets.selfcal.ms file, I get an unflagged fraction of:

[0.24380569, 0.31516346, 0.21407329, 0.25955955]

Consistent with the line flagging plus a little extra, on the order of 10%. And if I look at the _targets.ms file without any line flagging, I see an unflagged fraction of

array([0.7430673 , 0.74289553, 0.76319591, 0.72122622])

So ~25% of the data is flagged prior to flagging lines. That's more than the 10% extra, but some of that 25% prior to line flagging probably overlaps with the line region (if it's evenly distributed across channels, we'd expect 50-70% of it to overlap, leaving ~10-12% additional flagging on top of the line flags). If I then flag lines in _targets.ms and repeat the unflagged fraction calculation, I get

array([0.25080603, 0.32511892, 0.22260395, 0.26444961])

which is in fairly good agreement with what comes out of the _targets.selfcal.ms file.
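(As a rough cross-check: if the ~26% of pre-existing flags is distributed independently of the line channels, the expected unflagged fractions after line flagging are just the products of the two sets of numbers above: 0.743 x 0.342 ≈ 0.254, 0.743 x 0.442 ≈ 0.328, 0.763 x 0.300 ≈ 0.229, 0.721 x 0.375 ≈ 0.270, which indeed lands close to the measured values.)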

So it seems like the flags from the data should take care of the effective bandwidth?

jjtobin commented 5 months ago

I suspect that at least some of the 25% flagged is probably edge channel flagging in each spw.

But yes, it does look like the flagging might mostly take care of the lost continuum bandwidth. I would sort of prefer using the effective bandwidths + unflagged fraction, but it might not be worth the effort to test enough data to find where a substantive difference would occur. However, even the effective bandwidth has its faults, since that would effectively assume a narrower bandwidth at the central frequency when in reality it would be more complicated. So I guess we shouldn't try to do too much.

It looks like, for the criterion, if the reffreq is different by the width of the largest spw for ALMA, you will change to the sumwt method. I think that is fine.

psheehan commented 5 months ago

Yes, flagging at the edge is probably some of that 25%, though I forget how it is done for FDM windows...

For what it's worth, either way we are assuming a narrower bandwidth at the central frequency, because that is the easiest thing to do; I don't actually get the unflagged channels. So either way has the same drawback.

I'd be a bit hesitant to do both effective bandwidth + unflagged fraction, because separating the unflagged fraction from the "good" continuum is a bit challenging, and as it is done now there's a risk of significant overlap. In the above example it wouldn't be a huge deal because it's fairly even across the spws... but if one spw had more continuum, then the others would be downweighted by (fraction)^2 rather than (fraction) (i.e. more heavily).

For now I put it in as the bandwidth weighting for everything, with the option to go to sumwt in the event that the shift is significant (significant being defined as the width of the widest spw, though thoughts on that are very welcome). Per our discussion yesterday, I figure we can test this on some VLA datasets to get a sense of how default vs. sumwt vs. bandwidth impacts VLA data, and then we can decide whether we want to adjust for the VLA?
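A rough sketch of that logic (names are illustrative, and which quantity the shift is measured against is my assumption, not necessarily what the code does):

```python
import numpy as np

def choose_reffreq(centers, bandwidths, sumwts, baseline_reffreq):
    """Default to the bandwidth-weighted reffreq, but fall back to the
    sum-of-weights version if the shift from baseline_reffreq exceeds the
    width of the widest spw. The baseline used by the real code is not
    spelled out here, so it is left as an argument."""
    centers, bandwidths, sumwts = map(np.asarray, (centers, bandwidths, sumwts))
    reffreq_bw = np.sum(centers * bandwidths) / np.sum(bandwidths)
    reffreq_sumwt = np.sum(centers * sumwts) / np.sum(sumwts)
    if abs(reffreq_bw - baseline_reffreq) > bandwidths.max():
        return reffreq_sumwt
    return reffreq_bw
```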