PeterKamphuis closed this issue 4 years ago.
@o-smirnov This seems familiar - is it one of the things you fixed in your branch?
Very possible. It rings quite a few bells. @PeterKamphuis, would it be easy for you to repeat the test with the issue-326-chisq-jsk branch?
@o-smirnov Depends how easily I can install the branch but I think it should not be a problem as I installed from the GitHub distribution already. I'll let you know tomorrow.
If you did pip install -e, then you only need to git checkout the branch.
Cheers, Oleg
@o-smirnov Is that branch supposed to be python3 compatible? I got an error:
Initially, trying to apply:

INFO 11:37:10 - main [0.1/0.1 0.6/0.6 0.2Gb] Exiting with exception: TypeError(list indices must be integers or slices, not str)

which I figured might be a mismatch between the calibration database and the version, or some such. Trying to remake the calibration resulted in this error:
INFO 11:40:27 - main [io] [0.2/0.3 0.9/0.9 0.3Gb] I/O handler for load 0 save None failed with exception: '>' not supported between instances of 'method' and 'int'
Both of which seem exactly the kind of thing Python 3 is more asinine about?
Using Python 3.6.9 BTW
It's supposed to be py3 compatible, but I suppose I could always have f*cked it up...
Can you rerun it serially for me please (--dist-ncpu 1)? It ought to give a more informative stack trace then.
Applying gives the same error, but I'll post the full traceback:
```
INFO 12:00:15 - main [0.1/0.1 0.6/0.6 0.2Gb] Exiting with exception: TypeError(list indices must be integers or slices, not str)
Traceback (most recent call last):
  File "CubiCal/cubical/main.py", line 480, in main
    global_options=GD, jones_options=jones_opts)
  File "CubiCal/cubical/machines/abstract_machine.py", line 645, in create_factory
    return machine_cls.Factory(machine_cls, *args, **kw)
  File "CubiCal/cubical/machines/abstract_machine.py", line 770, in __init__
    self.init_solutions()
  File "CubiCal/cubical/machines/abstract_machine.py", line 790, in init_solutions
    self.machine_class.exportable_solutions())
  File "/home/peter/GitHub/CubiCal/cubical/machines/abstract_machine.py", line 838, in _init_solutions
    self._init_sols[label] = param_db.load(filename), prefix, interpolate
  File "CubiCal/cubical/param_db.py", line 50, in load
    db._load(filename)
  File "CubiCal/cubical/database/pickled_db.py", line 267, in _load
    parm._paste_slice(item)
  File "CubiCal/cubical/database/parameter.py", line 293, in _paste_slice
    grid_index = self.grid_index[axis]
TypeError: list indices must be integers or slices, not str
```
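For what it's worth, the error itself is easy to reproduce in isolation (a toy sketch with invented values, not CubiCal's actual data structures): `_paste_slice` indexes `grid_index` with an axis name, which only works if `grid_index` is a dict rather than a list, so the loaded database apparently stores it in the older list form.

```python
# Toy reproduction of the TypeError above (grid_index contents are invented):
# indexing a list with a string axis name raises exactly this error.
grid_index = [0, 1, 2]                  # a list, as apparently loaded from the old DB
try:
    grid_index["time"]                  # what the failing line effectively does
except TypeError as e:
    print(e)                            # list indices must be integers or slices, not str

grid_index = {"time": 0, "freq": 1}     # a dict keyed by axis name works
print(grid_index["time"])               # 0
```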
Creating the database, I get the same error as well:
```
INFO 12:03:07 - main [0.3/0.3 0.9/0.9 0.3Gb] Exiting with exception: TypeError('>' not supported between instances of 'method' and 'int')
Traceback (most recent call last):
  File "CubiCal/cubical/main.py", line 546, in main
    stats_dict = workers.run_process_loop(ms, tile_list, load_model, single_chunk, solver_type, solver_opts, debug_opts, out_opts)
  File "CubiCal/cubical/workers.py", line 216, in run_process_loop
    return _run_single_process_loop(ms, load_model, single_chunk, solver_type, solver_opts, debug_opts, out_opts)
  File "CubiCal/cubical/workers.py", line 347, in _run_single_process_loop
    tile.load(load_model=load_model)
  File "CubiCal/cubical/data_handler/ms_tile.py", line 928, in load
    angles = self.dh.parallactic_machine.rotation_angles(subset.time_col)
  File "CubiCal/cubical/machines/parallactic_machine.py", line 108, in rotation_angles
    if log.verbosity > 1:
TypeError: '>' not supported between instances of 'method' and 'int'
```
I checked that it is only using a single cpu now. Sorry for not posting the full traceback right away.
Ok, after changing log.verbosity to log.verbosity() here https://github.com/ratt-ru/CubiCal/blob/dfc1f393c05cf06d5c13e090b45cf096ffa84e26/cubical/machines/parallactic_machine.py#L108, the branch runs and creates a calibration DB, which can then be applied. However, the chi2 stats are suddenly all around 2, whereas they are around 1 with the master branch.
In any case applying the new calibration in this branch results in exactly the same 0 padding in the corrected visibilities.
Running everything on a single cpu now, just to make sure there is no issue there.
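The log.verbosity crash is a classic Python 2 vs 3 difference: Python 2 let you order-compare a bound method against an int (with a meaningless result), while Python 3 raises TypeError unless the method is actually called. A minimal sketch with a hypothetical Log class (not CubiCal's actual logger):

```python
# Hypothetical stand-in for the logger; only the method-vs-attribute point matters.
class Log:
    def __init__(self, level):
        self._level = level

    def verbosity(self):        # verbosity is a method, not a plain attribute
        return self._level

log = Log(2)
try:
    log.verbosity > 1           # the buggy comparison from parallactic_machine.py
except TypeError as e:
    print(type(e).__name__)     # TypeError

print(log.verbosity() > 1)      # the fix: call the method first -> True
```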
@o-smirnov Any progress on this? I am not as well versed in the interpolation code. With the exception of the heavily zeroed case (I think that this is likely solutions which were bad in the first time interval being applied to all time intervals) it looks like we are failing to raise flags for corrected data where we had failed solutions. It is debatable what should be placed in corrected data if calibration has failed. @SpheMakh advocates simply writing out the uncorrected data. Currently I believe we take the simpler approach of just multiplying in the zero gains, producing these zeros in the output. I am not against this, but it does seem like we are failing to flag the data appropriately.
@JSKenyon Correct me if I am wrong, but when using xfer-from, the bad solutions should be inferred/interpolated from the correct ones, right - even at the start or end of the observation? So after interpolation there should be no zero gains left, I thought?
Additionally, as the frequency solutions are only split in two (64 channels in the averaged case, 256 in the apply), I am very surprised to see the padding behaviour occur when interpolating, as it does not follow the calibration interval. I don't see why a bad solution at the start would affect half of the calibration interval but not the other half. My first guess would be that it has more to do with the interpolation and the flagged edges.
I think that when there are no solutions the data should just be flagged, also when merely applying the table. The logic being that if you had used CubiCal to calibrate, rather than just apply, such solutions would also be flagged.
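The interpolation behaviour expected here can be sketched in one dimension (my own illustration using np.interp, not CubiCal's actual xfer-from routine): flagged slots are filled from valid neighbours, and edge slots take the nearest valid value, so no zero gains should survive interpolation.

```python
import numpy as np

# 1-D sketch of interpolating over flagged solution slots (values are invented):
t = np.arange(8, dtype=float)
gains = np.array([0.0, 1.0, 1.1, 0.0, 1.3, 1.4, 0.0, 0.0])  # 0 = flagged slot
valid = gains != 0
# np.interp holds the nearest valid value at the edges, so the flagged
# start/end slots are filled rather than left at zero.
filled = np.interp(t, t[valid], gains[valid])
print(np.count_nonzero(filled == 0))   # 0 -> no zero gains remain
```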
This may (and I stress may) have been the casacore reading bug which read 0's and NaNs from the data and model columns. Please try master and see whether the problem is solved.
(alternatively try stimela master)
I will check tomorrow.
I am getting the same problem when I apply the solutions to the same dataset that I solved for.
(Images omitted: pre-CubiCal DATA column vs post-CubiCal DATA column.)
I suspect cubical is not writing flags properly. I am using the master branch. My parset is:

```
[data]
ms = 1july-v2.ms
column = DATA
time-chunk = 1000
freq-chunk = 262
chunk-by = SCAN_NUMBER
chunk-by-jump = 0

[model]
list = continuum_tagged.lsm.html@dE

[montblanc]
dtype = double
feed-type = circular
mem-budget = 4096

[sol]
jones = G,dE
term-iters = 30,20

[out]
column = CORRECTED_DATA
overwrite = True
mode = sr
subtract-dirs = :

[g]
time-int = 40
update-type = phase-diag

[de]
time-int = 1000
freq-int = 262
dd-term = True
update-type = diag

[dist]
ncpu = 6
```
I do not see how it could be, but maybe we should check that it is not a GMRT-only dataset issue, as I am also using GMRT data.
Yes, I agree that this could be the case. One thing I did notice was that cubical seemed to be seeing 32 stations, e.g.
D0T1F0 Stations 30, 31 (2/32) fully flagged due to low SNR.
However there are only 30 stations in the array
@TariqBlecher Yes, GMRT data has two ghost telescopes. It is the same in CASA. If I remember correctly, they are for testing correlator noise and such. In any case, I only use the inner core (14 antennas) for my test set, and the same thing happens.
I want to ask: are the ANTENNA1 and ANTENNA2 columns correctly labelled? I know HERA does (or used to) not adhere to the Standard v2.0 specification (CASA Memo 229) and used to store antenna numbers in that column instead of foreign keys into the ::ANTENNA table. That we cannot and will not support - it is an observatory mistake. Also note that when you merge datasets with CASA, funny things happen: you may have more antennas if the ECEF positions were updated, or have antennas that were not common to both datasets (e.g. 16 + 14 will show 16 in the concat dataset if the positions are the same).
I'm not sure what you mean? That memo says:
| Column | Type | Description |
|---|---|---|
| ANTENNA1 | Int | First antenna |
| ANTENNA2 | Int | Second antenna |
And my dataset has that in those columns. But you are saying that they should not be used for the antenna numbers, but as foreign keys into the antenna table?
They are index keys (integers), so e.g. index 0 corresponds to row 0 of the ::ANTENNA table, from which you can read the station name.
Specification 229 (https://casa.nrao.edu/Memos/229.html):
ANTENNAn
Antenna number (≥ 0), and a direct index into the ANTENNA sub-table rownr. For n > 2, triple-product data are implied.
But if you confirm these are just ghost positions in the ANTENNA subtable then it explains your error - there is no data and therefore no SNR to solve.
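The foreign-key relationship can be sketched without casacore (antenna names and indices here are invented for illustration; with a real MS the names would come from the NAME column of the ::ANTENNA subtable):

```python
# ANTENNA1/ANTENNA2 hold row indices into the ::ANTENNA subtable, not
# station numbers; station names come from the subtable row they point at.
antenna_names = ["C00", "C01", "C02", "GHOST1", "GHOST2"]  # hypothetical ::ANTENNA rows
main_row = {"ANTENNA1": 0, "ANTENNA2": 2}                  # one main-table row
print(antenna_names[main_row["ANTENNA1"]],                 # C00
      antenna_names[main_row["ANTENNA2"]])                 # C02
# Rows referencing indices 3/4 would name the ghost antennas, even though
# no visibility data exists for them.
```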
So I checked, and there are ghost antennas in the antenna table (i.e. there are antennae in the antenna table which have no corresponding data rows). Should these antennas be deleted from the antenna table? Or is CubiCal assigning data rows to the wrong antennas?
It shouldn't (antenna indices 30 and 31 are flagged), and these correspond to your 2 ghost antennas, if I understand you correctly? Nothing to be worried about then - the warning is a red herring and the bug lies elsewhere. It is very likely the interpolation code.
In any case, I have split out those antennas. And yes, ANTENNA1 and ANTENNA2 are integer columns in which the numbers correspond as direct indices to the corresponding rows in the antenna table. That is how it should be, if I understand correctly.
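A quick way to confirm which ::ANTENNA rows are ghosts is to diff the subtable row indices against those actually referenced by the main table. A sketch with stand-in arrays (with a real MS, ant1/ant2 would be the ANTENNA1/ANTENNA2 columns read via python-casacore):

```python
import numpy as np

n_antenna_rows = 32                # rows in the ::ANTENNA subtable (example count)
ant1 = np.array([0, 1, 2, 0, 1])   # stand-ins for the real ANTENNA1/ANTENNA2 columns
ant2 = np.array([1, 2, 3, 3, 2])
used = np.union1d(ant1, ant2)      # antenna indices that actually have data rows
ghosts = np.setdiff1d(np.arange(n_antenna_rows), used)
print(ghosts.size)                 # indices with no data at all
```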
Ok, I retested this with the current master and the issue remains. Updating to the current master rolled back the casacore version (python-casacore 3.2.0 --> 3.0.0).
The issue remains exactly as indicated before, with the difference that for xfer-from with interpolation or without interpolation (chunks corresponding to the solution) the result is now the same, and is as indicated in the original post for xfer-from without interpolation. I.e., the large padding seen around the visibilities in the xfer-from with interpolation is no longer present.
Thanks for checking, Peter - I was hoping it was related to the 32-bit iterator indexer issue in CC 3.0, for which the workaround in the ms_tile reader fixed serious convergence issues on large tiles. To clarify: it should be rolled back to 3.0 - even though it is buggy, it is the only version that natively compiles against KERN-3 on Ubuntu 16.04 LTS.
Well, it does seem that something in the interpolation got fixed in the current master. It now looks like blocks that do not have a solution are simply set to zero instead of being flagged, as they would be when actually calibrating. I will see if I can confirm that. I'll also try to transfer the low-resolution model to the high resolution and see if the same blocks are solved and whether they are all flagged properly in that case; my impression is yes, because I have really only seen this when transferring the solutions.
It must be a bit more complicated than that though, as there are some differences between using load-from and xfer-from.
Ok, so this is not a GMRT issue. I ran the carate.sh suite for caracal with the minimal docker config on rawdata.tar. After successfully doing so, I took a single dataset (1524929477-circinus_p1_corr.ms) and reapplied the first phase-only calibration with xfer-from, interpolating to 1 and 1 solutions. I did this in a freshly installed standalone cubical venv with the current master.
The result is that practically every baseline has 0 bands at the start and end of the corrected visibilities. I show an example below (this is baseline 0-22). The black stripes at the start and the end are unflagged 0's.
OK, my naive attempts to reproduce this on my own MeerKAT MSs are failing. Clearly, I'm not driving it into the failing edge case.
@PeterKamphuis, could you please point me to a copy of the MS (GMRT, or a post-pipeline version of the circinus one) that I can reproduce this on? Preferably a simple recipe (MS + solutions DB + parset), so that all I need to do is run gocubical to reproduce the failure.
Otherwise, maybe if you (or @gigjozsa) could give me a step-by-step for running carate.sh and reproducing this problem, that'd also be good.
@o-smirnov I just emailed you the location of a tarball that you should be able to unpack; then in the directory simply run gocubical Apply_xfer_from.parset. The Python in my virtualenv is 3.6.9 and I am running from a bash shell.
OK, there are two problems here, and at least one is fixed.
First problem is zeroes in the output. It turns out that missing solutions (i.e. slots that the parameter table machinery was unable to interpolate) were flagged, but those flags weren't propagated into the output properly due to a logic error. This should now be fixed, at least for the case mentioned by @PeterKamphuis in https://github.com/ratt-ru/CubiCal/issues/336#issuecomment-601086600. Please check with branch issue-336.
@TariqBlecher, I hope this also fixes what you report in https://github.com/ratt-ru/CubiCal/issues/336#issuecomment-597575677. I hope you were using --load-from, in which case the failure makes sense, as the same logic error above would have caused flagged solutions (flagged for whatever reason, low-SNR, whatnot) to produce a 0 visibility rather than a flag in the output. Please check. And if you tell me you were using --xfer-from, I'm going to cry, because it's supposed to interpolate over small missing blocks like that.
Second problem is: why was it unable to interpolate? I'll open a separate issue for this to keep things neat.
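The first problem's logic error can be caricatured in a few lines (my own sketch with invented values, not CubiCal's internals): a missing solution multiplied into the data yields an unflagged zero, whereas the corresponding output flag should be raised instead.

```python
import numpy as np

vis = np.array([1+1j, 2+0j, 3-1j, 4+2j])
gains = np.array([1+0j, 0j, 1+0j, 0j])   # 0 marks a missing/flagged solution slot
corrected = gains.conj() * vis * gains    # buggy path: zeros leak into the output
print(np.count_nonzero(corrected == 0))   # 2 unflagged zeros

out_flags = (gains == 0)                  # fixed path: propagate flags instead
print(int(out_flags.sum()))               # the same 2 slots, now flagged
```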
I used neither --load-from nor --xfer-from. I was just solving and outputting subtracted residuals.
OK, that does make me cry more.
> I am getting the same problem when I apply the solutions to the same dataset that I solved for.
I thought you meant apply mode here, which implies --load-from or --xfer-from.
Can you give an MS/parset illustrating the problem please?
So I just tested this on the GMRT set as well, and there are no more zeros present with either --load-from, --xfer-from without interpolation, or --xfer-from with interpolation (see below) when using the issue-336 branch.
issue-336 branch load-from:
In the bottom left there is a small patch of additional flagging in the highly flagged area. In xfer-from without interpolation this is nicely interpolated over. For the rest, xfer-from without interpolation and load-from look the same.
When interpolating with xfer-from, the full bands around the visibilities are back. When creating the tables, the solutions are split in two in frequency (256 channels, 64 in the averaged dataset) and it seems that the outer half of the solution is not interpolated. I guess this should be treated in #357.
issue-336 branch xfer-from:
The new flags are somehow not stored properly, though. As I understand it, a new CubiCal run with flagset: -cubical should remove all previously applied CubiCal flags. However, if I rerun the load-from after the xfer-from run with interpolation, the result is the following:
I had to remake the calibration tables, as both master and issue-336 threw an error on the old tables. It is this remake that seems to have solved the problem of the gigantic blocks of zeros that were present in the initial test.
--load-from and --xfer-from without interpolation also do not show the bands of flagging at the start/end and sides, which also immediately explains why they are not present when directly applying the tables after solving.
@o-smirnov I hope that doesn't make you cry even more, as it is a bit of a mixed message. Let me know if you would like the test set.
The outer bands are quite clearly a failure to extrapolate, in either time or frequency. But this:

> The new flags are somehow not stored properly, though. As I understand it, a new CubiCal run with flagset: -cubical should remove all previously applied CubiCal flags. However, if I rerun the load-from after the xfer-from run with interpolation, the result is the following:

is not explained. So yes please, give me a "care package" with MS+DB+parset.
I have tried to apply a calibration table from an MS where the frequency channels are averaged to the un-averaged MS. However, this results in 0's in the corrected visibilities in areas where there seem to be no solutions due to flags in the averaged set (or simply no solutions). For more background see https://github.com/ska-sa/meerkathi/issues/627
I tested by taking an averaged set of GMRT GSB data in which 4 channels are binned together. This set is then additionally flagged and calibrated with CubiCal with the settings listed at the bottom of this post.
I have applied the calibration with load-from with the same solution intervals (scaled by 4 in frequency), with xfer-from with the same solution intervals, and with xfer-from interpolating to time-int 1 and freq-int 1. The latter is the worst, with large 0 padding all around the data (see below). I have tried in the meerkathi master as well as in my own branch and a standalone CubiCal installed from master today. All give the same result when not interpolating, but slightly different results when interpolating.
(Images: averaged-dataset corrected visibilities for baseline 0-1, and the non-averaged dataset with calibration applied via load-from, xfer-from without interpolation, and xfer-from with interpolation.)
Parsets to make the calibration table:
Parset for applying with load-from: