rdeane / MeqSilhouette

A synthetic data simulation package for the Event Horizon Telescope
GNU General Public License v2.0
1 stars 4 forks source link

Ability to read an existing MS #18

Open iniyannatarajan opened 3 years ago

iniyannatarajan commented 3 years ago

@mjanssen2308 @freekroelofs @rdeane I have added a new driver script with the ability to read in an existing MS (usage explained here). This MS will be "regularized" i.e., if it does not contain entries for all baselines for each timestamp, they will be added. Any antenna present in the ANTENNA table but contains no entries in the MAIN table will also be identified and all corresponding baselines will be added. Any row added during regularization contains the following values: DATA=0, WEIGHT=1, SIGMA=1, FLAG=True, FLAG_ROW=True.

One caveat with using an existing MS: If the input MS has some timestamps missing (e.g. between scans), then tropospheric turbulence cannot be added, since the Cholesky decomposition of the covariance matrix will fail. One workaround is to split the input MS into multiple MSes each containing no missing timestamps. The solution might lie in doing an LU decomposition instead of Cholesky, which I am looking at.

rdeane commented 3 years ago

Sounds good @saiyanprince, great stuff. A couple of questions/comments:

iniyannatarajan commented 3 years ago

Ah, I forgot to mention this. The ANTENNA1, ANTENNA2 columns in MAIN are populated with the indices of the stations in the ANTENNA table. Deleting some entries from the ANTENNA table will leave the station indices inconsistent between MAIN and ANTENNA. We could modify the values of the ANTENNA{1/2} columns in MAIN, but cross-MS comparisons (or splitting an MS into sub-MSes) will require more care. Currently, the ANTENNA table is left intact and it also remains possible to use other input files such as station_info and bandpass_info containing all the stations to be reused if simulating from multiple existing datasets which can have different stations missing. One way to circumvent this last issue is by reading in all input files into dictionaries instead of lists or arrays, which will actually be the better implementation overall. For now, I just decided to include all stations in the MAIN table so that the feature is out for testing.

Regarding the second point, I do see that there are timestamps missing intra-scan (for a few minutes) in the scan-averaged M87 datasets I've been using to test this. If the decomposition can be made to work either by fiddling with the sampling or by using a different method, we can avoid splitting the MSes. I'll look into this.

rdeane commented 3 years ago

For point 1: fully agreed with your pragmatic approach - I didn't know they were actually present in MAIN. Your comment on dictionaries is an old sore point for me - I should have done this from the very start of the project! I never had the time/nerve to redo it all, but perhaps MeqSv3.0 is the opportunity (let's discuss in our weekly meeting).

For point 2: yes, basically if the missing samples are mostly due to flagging, that's going to be a recurring issue and interpolation makes sense. If it's major time gaps due to scheduling, technical issues, elevation limits, etc, then the split/concat approach is better, in my view.