Add DI gains during imaging

landmanbester commented 3 years ago

As explained in this overleaf doc, "correcting" the data by applying the inverse of the gains is not statistically correct if the gains have non-unity amplitudes. As also shown there, this can easily (at least in the DI diagonal case) be overcome by including the gains in the definition of the dirty image and PSF. The easiest way to do this during imaging is to write the product of gp*gq.conj() to a separate column in the MS so I had a go at it here. For the time being, the script just loads in separate K, G and B tables and then linearly interpolates (and extrapolates to fill in missing data) the gains to the times and frequencies corresponding to a supplied ms. Now I plan do add some SNR dependent smoothing here eventually but I thought I better test that I am reading the tables correctly first.

To test the script, I reduced some data with caracal and wrote out both the original and corrected data for the calibrator field at full resolution. Then, if the script works as expected, we should recover the original data by corrupting the corrected data with the interpolated gains. I can recover the visibility amplitudes to within machine precision in regions where we have valid data (no extrapolation). However, the phases are quite far off (not a phase wrap) and I am not sure what is going wrong so I thought I would open the discussion here. I have assumed that the K term is parametrised as

K = exp(i delay(t) nu)

where delay is taken from the 'FPARAM' column in the K table. Anybody know if this is correct? Also, what does the number in the ANTENNA2 column in the gain tables represent? Is that the reference antenna or something? Tagging @o-smirnov @JSKenyon @bennahugo

JSKenyon commented 3 years ago

You might be a victim of the delays not being denormalized before being written. This would cause the resulting values to be way off if you used the wrong formula. See this issue on CubiCal.

JSKenyon commented 3 years ago

If you are talking about casa tables, I am not sure though.

landmanbester commented 3 years ago

Ah ok, thanks. But does casa also solve for an offset (i.e. c in your diagram)? If so where/how is it stored. I was under the impression that the FPARAM table contains the slope for both correlations. Also, what is v1 and v0 in your diagram?

JSKenyon commented 3 years ago

v0 is the smallest frequency, v1 is the largest. Bear in mind that issue pertains to CubiCal delays specifically.

I am not 100% sure if casa solves for a peculiar phase in addition to the delay. Ben will likely have a better idea. Check the values - delays will likely be < 1e-9. If the numbers are larger than it might be a unit thing.

landmanbester commented 3 years ago

If you are talking about casa tables, I am not sure though.

Sigh, they are. I would reproduce them with cubical if I knew how to set up the caracal config files to do that. Not sure how difficult that would be though

JSKenyon commented 3 years ago

You can send me the table and I might be able to be a little more helpful. The delay estimate does have to be w.r.t a reference antenna.

JSKenyon commented 3 years ago

By the look of it, only a delay is in the table, almost certainly using the expression you gave above. The issue, I suspect, is units. Values in the cal table are too large to be in seconds. Struggling to find it in the docs by I would not be surprised if they were values in nanoseconds.

landmanbester commented 3 years ago

Hmmm, scaling the units to nanoseconds get's me a lot closer but still not identical (10-20% differences). I don't think it is an issue with the reference antenna because, at least for the diagonal terms, it should cancel out in the product gp*gq.conj(). I see differences at all four correlations (in phases only). I must be missing something. Is PA rotation turned off by default? I doubt this could explain the 10-20% differences in the phases on the diagonals but its worth a shot...

landmanbester commented 3 years ago

Ok, its not that. This is what applycal.last contains

taskname           = "applycal"  
vis                =  "/stimela_mount/msdir/1557433849_sdp_l0-cal.ms" 
field              =  ""
spw                =  ""
intent             =  ""                                                                                                                                                                                                                     selectdata         =  True
timerange          =  ""
uvrange            =  ""
antenna            =  ""
scan               =  ""
observation        =  ""
msselect           =  ""
docallib           =  False
callib             =  ""
gaintable          =  ['/stimela_mount/output/e137-1557433849_sdp_l0-1gc1_primary.G1', '/stimela_mount/output/e137-1557433849_sdp_l0-1gc1_primary.B1', '/stimela_mount/output/e137-1557433849_sdp_l0-1gc1_primary.K1']
gainfield          =  ['nearest', 'nearest', 'nearest']
interp             =  ['nearest', 'linear', 'nearest']
spwmap             =  []
calwt              =  [False]
parang             =  False
applymode          =  "calflag"
flagbackup         =  False
#applycal(vis="/stimela_mount/msdir/1557433849_sdp_l0-cal.ms",field="",spw="",intent="",selectdata=True,timerange="",uvrange="",antenna="",scan="",observation="",msselect="",docallib=False,callib="",gaintable=['/stimela_mount/output/e137-1557433849_sdp_l0-1gc1_primary.G1', '/stimela_mount/output/e137-1557433849_sdp_l0-1gc1_primary.B1', '/stimela_mount/output/e137-1557433849_sdp_l0-1gc1_primary.K1'],gainfield=['nearest', 'nearest', 'nearest'],interp=['nearest', 'linear', 'nearest'],spwmap=[],calwt=[False],parang=False,applymode="calflag",flagbackup=False)

Maybe it has something to do with my interpolation strategy. Still doesn't explain why it only affects the phases though

landmanbester commented 3 years ago

Changing the sign on the exponent of the K term gets me oh so bloody close. Phase differences between DATA and UNCORRECTED_DATA are now sitting anywhere between 1e-4 and 5e-2. This is probably good enough for government work, especially since I am just trying to interpolate the 1GC gains onto the target field. Would still be great to get to the bottom of this though. Suggestions welcome. I have some questions:

Does applymode ="calflag" mean apply calibration solutions and flags?
Also, why do we set interp = ['nearest', 'linear', 'nearest']? Doesn't that mean that the G and K solutions will be will be snapped to the closest available calibrator scan and doesn't it make more sense to interpolate across calibrator scans?

o-smirnov commented 3 years ago

According to https://casa.nrao.edu/docs/taskref/applycal-task.html:

’’=’calflag’ (default) calibrate data and apply flags from solutions

You interpret the meaning of interp correctly, but applycal.last is not about the target. Note the name of the MS -- it's the MS where the calibrator scans are. Caracal only uses applycal to correct calibrator data (in which case nearest is appropriate). To correct targets, it uses the mstransform task in "OTF cal" mode -- check the logs to see what the settings are there, but it ought to be linear for all cases...

landmanbester commented 3 years ago

Alright, thanks @o-smirnov. I finally found the log entry and I am not sure that is what it's doing but I might be reading it wrong. Here is the mstransform log for one of the measurement sets (the other uses the same options)

# 2021-01-06 00:01:45   INFO    mstransform::::+        ##########################################
# 2021-01-06 00:01:45   INFO    mstransform::::+        ##### Begin Task: mstransform        #####
# 2021-01-06 00:01:45   INFO    mstransform:::: mstransform(vis="/stimela_mount/input/1557433849_sdp_l0.ms",outputvis="/stimela_mount/msdir/1557433849_sdp_l0-ESO137_001-corr.ms",createmms=False,separationaxis="auto",numsubms="auto",
# 2021-01-06 00:01:45   INFO    mstransform::::+                tileshape=[0],field="ESO137-001",spw="",scan="",antenna="",
# 2021-01-06 00:01:45   INFO    mstransform::::+                correlation="",timerange="",intent="",array="",uvrange="",
# 2021-01-06 00:01:45   INFO    mstransform::::+                observation="",feed="",datacolumn="corrected",realmodelcol=False,keepflags=True,
# 2021-01-06 00:01:45   INFO    mstransform::::+                usewtspectrum=True,combinespws=False,chanaverage=False,chanbin=1,hanning=False,
# 2021-01-06 00:01:45   INFO    mstransform::::+                regridms=False,mode="channel",nchan=-1,start=0,width=1,
# 2021-01-06 00:01:45   INFO    mstransform::::+                nspw=1,interpolation="linear",phasecenter="",restfreq="",outframe="",
# 2021-01-06 00:01:45   INFO    mstransform::::+                veltype="radio",preaverage=False,timeaverage=False,timebin="",timespan="",
# 2021-01-06 00:01:45   INFO    mstransform::::+                maxuvwdistance=0.0,docallib=True,callib="/stimela_mount/output/caltables/callib-e137-1557433849_sdp_l0-1gc1-transform__target.txt",douvcontsub=False,fitspw="",
# 2021-01-06 00:01:45   INFO    mstransform::::+                fitorder=0,want_cont=False,denoising_lib=True,nthreads=1,niter=1,
# 2021-01-06 00:01:45   INFO    mstransform::::+                disableparallel=False,ddistart=-1,taql="",monolithic_processing=False,reindex=True)
# 2021-01-06 00:01:45   INFO    mstransform:::: Parse docallib parameters
# 2021-01-06 00:01:45   INFO    MSTransformManager::parseMsSpecParams   Input file name is /stimela_mount/input/1557433849_sdp_l0.ms
# 2021-01-06 00:01:45   INFO    MSTransformManager::parseMsSpecParams   Data column is CORRECTED
# 2021-01-06 00:01:45   INFO    MSTransformManager::parseMsSpecParams   Output file name is /stimela_mount/msdir/1557433849_sdp_l0-ESO137_001-corr.ms
# 2021-01-06 00:01:45   INFO    MSTransformManager::parseMsSpecParams   Re-index is enabled
# 2021-01-06 00:01:45   INFO    MSTransformManager::parseMsSpecParams   WEIGHT_SPECTRUM will be written in output MS
# 2021-01-06 00:01:45   INFO    MSTransformManager::parseMsSpecParams   Tile shape is [0]
# 2021-01-06 00:01:45   INFO    MSTransformManager::parseDataSelParams  field selection is ESO137-001
# 2021-01-06 00:01:45   INFO    MSTransformManager::parseCalParams      Calibration is activated
# 2021-01-06 00:01:45   INFO    MSTransformManager::colCheckInfo        Adding DATA column to output MS from input virtual CORRECTED_DATA column
# 2021-01-06 00:01:45   INFO    MSTransformManager::initDataSelectionParams     Selected Fields Ids are [1]
# 2021-01-06 00:01:45   INFO    MSTransformManager::open        Select data
# 2021-01-06 00:01:45   INFO    MSTransformManager::createOutputMSStructure     Create output MS structure
# 2021-01-06 00:01:55   INFO    MSTransformDataHandler::makeSelection   5954759 out of 6741415 rows are going to be considered due to the selection criteria.
# 2021-01-06 00:01:56   INFO    MSTransformManager::checkFillWeightSpectrum     Optional column WEIGHT_SPECTRUM found in input MS will be written to output MS
# 2021-01-06 00:01:56   INFO    MSTransformManager::generateIterator    OTF calibration activated, using calibration record spec to generate iterator
# 2021-01-06 00:02:01   INFO    Calibrater::    Arranging to calibrate MS: /stimela_mount/input/1557433849_sdp_l0.ms
# 2021-01-06 00:02:01   INFO    OldCalibrater::setcallib2(callib)       Arranging to APPLY:
# 2021-01-06 00:02:01   INFO            .   B Jones: table=/stimela_mount/output/caltables/e137-1557433849_sdp_l0-1gc1_primary.B1 (by cal library) calWt=false
# 2021-01-06 00:02:02   INFO            .   0:
# 2021-01-06 00:02:02   INFO    +            MS: obs= fld=ESO137-001 intent= spw=
# 2021-01-06 00:02:02   INFO    +            CT: tinterp=linear finterp=linear
# 2021-01-06 00:02:02   INFO    +                obsmap=[]         fldmap=[0, 0]
# 2021-01-06 00:02:02   INFO    +                spwmap=[]         antmap=[]
# 2021-01-06 00:02:02   INFO    OldCalibrater::setcallib2(callib)       Arranging to APPLY:
# 2021-01-06 00:02:02   INFO            .   G Jones: table=/stimela_mount/output/caltables/e137-1557433849_sdp_l0-1gc1_primary.G1 (by cal library) calWt=false
# 2021-01-06 00:02:03   INFO            .   0:
# 2021-01-06 00:02:03   INFO    +            MS: obs= fld=ESO137-001 intent= spw=
# 2021-01-06 00:02:03   INFO    +            CT: tinterp=nearest finterp=linear
# 2021-01-06 00:02:03   INFO    +                obsmap=[]         fldmap=[0, 0]
# 2021-01-06 00:02:03   INFO    +                spwmap=[]         antmap=[]
# 2021-01-06 00:02:03   INFO    OldCalibrater::setcallib2(callib)       Arranging to APPLY:
# 2021-01-06 00:02:03   INFO            .   K Jones: table=/stimela_mount/output/caltables/e137-1557433849_sdp_l0-1gc1_primary.K1 (by cal library) calWt=false
# 2021-01-06 00:02:04   INFO            .   0:
# 2021-01-06 00:02:04   INFO    +            MS: obs= fld=ESO137-001 intent= spw=
# 2021-01-06 00:02:04   INFO    +            CT: tinterp=nearest finterp=linear
# 2021-01-06 00:02:04   INFO    +                obsmap=[]         fldmap=[0, 0]
# 2021-01-06 00:02:04   INFO    +                spwmap=[]         antmap=[]
# 2021-01-06 00:02:05   INFO    OldCalibrater::setcallib2(callib)       The following calibration terms are arranged for apply:
# 2021-01-06 00:02:05   INFO    OldCalibrater::setcallib2(callib)       .   B Jones: table=/stimela_mount/output/caltables/e137-1557433849_sdp_l0-1gc1_primary.B1 (by cal library) calWt=false
# 2021-01-06 00:02:05   INFO    OldCalibrater::setcallib2(callib)       .   K Jones: table=/stimela_mount/output/caltables/e137-1557433849_sdp_l0-1gc1_primary.K1 (by cal library) calWt=false
# 2021-01-06 00:02:05   INFO    OldCalibrater::setcallib2(callib)       .   G Jones: table=/stimela_mount/output/caltables/e137-1557433849_sdp_l0-1gc1_primary.G1 (by cal library) calWt=false
# 2021-01-06 00:02:05   INFO    mstransform:::: Apply the transformations
# 2021-01-06 03:54:31   INFO    mstransform:::: CASA Version 5.6.1-8
# 2021-01-06 03:54:31   INFO    mstransform::::
# 2021-01-06 03:54:31   INFO    mstransform:::: ##### End Task: mstransform          #####
# 2021-01-06 03:54:31   INFO    mstransform::::+        ##########################################

So finterp is always linear but tinterp is set to nearest for both K and G. Does that mean what I think it does? I can't seem to find settings in the caracal parset to control this. Any ideas?

o-smirnov commented 3 years ago

Ouch! Now that I think is a Caracal bug. Well spotted!

landmanbester commented 3 years ago

It seems that the interpolation keyword in mstransform only applies to the freq axis. From the docs

interpolation |   | Spectral interpolation method.
-- | -- | --
  |   | allowed: | string
  |   | Default: | linear

Let me test what gets me closer to the original target data i.e. transferring gains using nearest or linear interpolation.

o-smirnov commented 3 years ago

Well, Caracal uses OTF mode with callibs (https://casa.nrao.edu/casadocs/casa-5.4.1/uv-manipulation/on-the-fly-calibration), and looking at the callibs, they all say tinterp=nearest. Hold my beer, I've got two dozen-odd pipelines to rerun...

landmanbester commented 3 years ago

Oh shit, ouch! Well selfcal should have removed some of it but changing that should allow you to go deeper in the first pass. I see many ,docallib=False,callib="" in my caracal log, none actually have entries. I assume it uses some default settings. Where do I find and modify these callib files?

landmanbester commented 3 years ago

Ah, nvm, I found them

o-smirnov commented 3 years ago

The callibs are auto-generated (under output/caltables), so I'll have to fix some code to get them to be generated right.

I see many ,docallib=False,callib="" in my caracal log

Well mstransform has many uses in the pipeline, including simply splitting out subsets of data unmodified, so you might be looking at those entries. It's only when splitting out the target that you should see docallib=True.

landmanbester commented 3 years ago

Yeah, I just had to scroll down to find the docallib=True entry. Just realised I don't actually need this since I won't be using CORRECTED_DATA but would be good to test once you have that fix in

landmanbester commented 3 years ago

I spoke too soon. I was just testing against a chunk of data and I must have gotten "lucky" because I have now tried to parametrise K as

K = exp( factor i delta * nu)

with factor set to all of [-1e-9, 1e-9, -1e-9pi, 1e-9pi, -2e-9pi, 2e-9pi]. I get the closest with factor = -1e-9 where about 4% of the data is equal to within a relative tolerance of 1e-3. The initial chunk I was testing on must have fallen in or close to the 4% where it is approximately correct. To avoid 2pi ambiguities (which can be problematic because of numerical precision) I am now comparing cos(phi) and sin(phi) (i.e. real and imaginary parts of exp(i*phi)) where phi is the visibility phase. Actually, the full test script is

from numpy.testing import assert_allclose
from pyrap.tables import table
import numpy as np

ms = table('/home/bester/projects/ESO137/testdir/1557347448_sdp_l0-cal.ms')
# ms = table('/home/landman/Data/pfb-testing/MS/point_gauss_nb.MS_p0')
flag_rows = ms.getcol('FLAG_ROW')
nrows = ms.nrows()
row_chunk = 100000
for istart in np.arange(0, nrows, row_chunk):
    print(istart)
    data = ms.getcolslice('DATA', (0, 0), (-1, -1), startrow=istart, nrow=row_chunk)
    cdata = ms.getcolslice('UNCORRECTED_DATA', (0, 0), (-1, -1), startrow=istart, nrow=row_chunk)
    flag = ms.getcolslice('FLAG', (0, 0), (-1, -1), startrow=istart, nrow=row_chunk)
    flag_row = flag_rows[istart:istart+row_chunk]

    f = np.logical_or(flag, flag_row[:, None, None])
    print(np.sum(f)/f.size)

    # check amplitudes
    try:
        assert_allclose(np.abs(data[~flag]), np.abs(cdata[~flag]), rtol=1e-3)
    except Exception as e:
        print("In abs:")
        print(e)

    # check phases (we need to check real and imaginary parts because of numerical limits and 2pi phase ambiguity)
    phase = np.angle(data[~f])
    cphase = np.angle(cdata[~f])
    try:
        assert_allclose(np.cos(phase), np.cos(cphase), rtol=1e-3)
    except Exception as e:
        print("In real:")
        print(e)
    try:
        assert_allclose(np.sin(phase), np.sin(cphase), rtol=1e-3)
    except Exception as e:
        print("In imag:")
        print(e)
ms.close()

I could use some help getting to the bottom of this. It feels like a really stupid problem to be stuck on. Is there any chance that both .K0 and .K1 tables are applied? For reference, the script to interpolate and apply gains is here

landmanbester commented 2 years ago

Done in https://github.com/ratt-ru/pfb-clean/pull/53

ratt-ru / pfb-imaging

Add DI gains during imaging #38