sjperkins opened this issue 6 years ago
In theory, using TIME_CENTROID is the most precise way. In practice, most applications will not see a difference in the result (for things like PA and DIEs, it's certainly down in the noise and/or machine precision). So I would think really hard before we (a) pour Simon's time into this, or (b) give up any significant performance.
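For reference, a quick way to check how much TIME and TIME_CENTROID actually differ in a given MS (the path below is a placeholder; column names are per the MS v2 spec):

```python
# Compare the TIME and TIME_CENTROID columns of a Measurement Set.
# For un-averaged data the two columns are typically identical;
# after averaging they can diverge per row.
import numpy as np
from casacore.tables import table

ms = table('observation.ms', ack=False)  # placeholder MS path
time = ms.getcol('TIME')
centroid = ms.getcol('TIME_CENTROID')
ms.close()

print('max |TIME - TIME_CENTROID| = %.3e s' % np.abs(time - centroid).max())
```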
Arguing the other side: at some point we need to support BDA datasets, in which case many things become per-baseline anyway...
I completely agree with OMS on this issue.
@o-smirnov @landmanbester @saiyanprince @JSKenyon @MichelleLochner @bennahugo @rdeane @twillis449.
I've written up a short LaTeX document describing the accuracy (DDFacet, CubiCal, Bayesian inference) vs. performance (Bayesian inference) issues facing Montblanc. I'd appreciate your input on this, because it seems we're going to have to sacrifice some performance in order to obtain accuracy when data is flagged.
/cc @joshvstaden @ianheywood too
Updated document to indicate that TIME_CENTROID usually differs from TIME when data has been averaged...
Two comments:

1) You should always use the UVW coordinates provided in the Measurement Set rather than computing the values yourself from the differences in the XYZ positions of the antennas making up the baseline. In real-world data the observatory making the observations may have 'adjusted' the UVW values due to some obscure local conditions. @o-smirnov and I had an extensive discussion about this issue many years ago with Wim Brouw, author of the CASA Measures package.

2) When you do data averaging, you are doing so on shorter baselines, where the fringe rate is much slower than on long baselines. For example, the SKA will have a longest baseline of somewhere around 150 km, a shortest baseline of 29 metres, and a 'standard' integration period of 0.14 sec (if the SKA has continued to shrink, my values may no longer be quite right). At a wavelength of 21 cm you can average 512 data points on your 29 m baseline and not lose any information content in your field of view (see the sketch below for the arithmetic). And I suspect any differences between TIME and TIME_CENTROID would not be noticeable in the averaged data.
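For concreteness, here is the arithmetic behind the 29 m / 512-sample claim as a small Python sketch. The fringe-rate expression is the usual worst-case estimate for a source at offset `l` from the phase centre; the 0.01 rad offset is an assumed field-of-view edge, not an official SKA figure:

```python
# Order-of-magnitude check of the time-smearing argument above, using
# the numbers quoted in the comment (29 m baseline, 21 cm wavelength,
# 512 x 0.14 s averaging interval).
import numpy as np

omega_e = 7.2921e-5   # Earth rotation rate [rad/s]
baseline = 29.0       # shortest baseline [m]
wavelength = 0.21     # [m]
tau = 512 * 0.14      # averaging interval [s]
l = 0.01              # assumed source offset from phase centre [rad]

# Worst-case visibility phase drift over the averaging interval
dphi = 2 * np.pi * omega_e * (baseline / wavelength) * l * tau

# Amplitude attenuation of the averaged visibility: sin(x)/x at x = dphi/2.
# np.sinc(x) = sin(pi*x)/(pi*x), hence the 2*pi rescaling.
loss = 1.0 - np.sinc(dphi / (2 * np.pi))
print('phase drift = %.3f rad, fractional amplitude loss = %.2e' % (dphi, loss))
```

With these numbers the phase drift over the ~72 s averaging interval is a few hundredths of a radian, i.e. a fractional amplitude loss of order 1e-4, which supports the point that TIME vs TIME_CENTROID differences are down in the noise here.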
Seems like the per-baseline complex phase (from UVW coordinates) is going to be the way forward for the purposes of correctness. However, I still aim to support the antenna decomposition, so I've been putting switches into the TensorFlow kernels in the dask branch to allow plugging terms in at various places.
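A minimal sketch of what that per-baseline term looks like when computed directly from the UVW column (array shapes and the sign convention are assumptions here, and the inputs are made up; the actual kernels live in TensorFlow):

```python
# Per-baseline complex phase (K term) computed straight from UVW rows,
# rather than reconstructed from per-antenna decompositions.
import numpy as np

def phase_delay(uvw, lm, freq):
    """exp(-2*pi*i*(u*l + v*m + w*(n-1))*freq/c), shape (row, source, chan)."""
    c = 299792458.0
    l, m = lm[:, 0], lm[:, 1]
    n = np.sqrt(1.0 - l**2 - m**2)
    # (row, source) path-length difference in metres
    real_phase = (uvw[:, None, 0] * l[None, :] +
                  uvw[:, None, 1] * m[None, :] +
                  uvw[:, None, 2] * (n[None, :] - 1.0))
    # scale per channel to radians and exponentiate
    return np.exp(-2j * np.pi * real_phase[:, :, None] * freq[None, None, :] / c)

uvw = np.random.randn(10, 3) * 100.0           # (row, 3) baseline coordinates [m]
lm = np.deg2rad(np.random.randn(4, 2) * 0.1)   # (source, 2) direction cosines
freq = np.linspace(1.4e9, 1.5e9, 8)            # (chan,) [Hz]
K = phase_delay(uvw, lm, freq)                 # (row, source, chan)
```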
The Measurement Set specification says that UVW coordinates are calculated w.r.t. TIME_CENTROID.
This raises questions as to whether we should calculate other quantities w.r.t. TIME_CENTROID as well. I can think of parallactic angles and other DIE terms, for instance.
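As a concrete example of the distinction, here is a sketch of evaluating parallactic angles against TIME_CENTROID rather than TIME via python-casacore measures (the antenna position, field centre, and epoch values below are placeholders, not from a real MS):

```python
# Sketch: parallactic angle evaluated at a TIME_CENTROID value rather
# than a TIME value, using casacore measures.
import numpy as np
from casacore.measures import measures
from casacore.quanta import quantity

dm = measures()

# Observatory position (ITRF metres) and field centre (J2000 radians)
dm.do_frame(dm.position('ITRF', quantity(5109224.0, 'm'),
                        quantity(2006790.0, 'm'), quantity(-3239100.0, 'm')))
field = dm.direction('J2000', quantity(0.0, 'rad'), quantity(-0.5, 'rad'))
zenith = dm.direction('AZELGEO', quantity(0.0, 'rad'), quantity(np.pi / 2, 'rad'))

def parallactic_angle(t):
    """PA [rad] at MS epoch t (seconds, UTC), e.g. a TIME_CENTROID value."""
    dm.do_frame(dm.epoch('UTC', quantity(t, 's')))
    return dm.posangle(field, zenith).get_value('rad')

time_centroid = 4.857e9  # a single made-up TIME_CENTROID value [s]
print(parallactic_angle(time_centroid))
```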
Related to #248 /cc @JSKenyon @o-smirnov @landmanbester