ratt-ru / CubiCal

A fast radio interferometric calibration suite.
GNU General Public License v2.0
18 stars 13 forks source link

Temporary fix for nan values in flag visibilities #408

Closed ulricharmel closed 3 years ago

ratt-priv-ci commented 4 years ago

Can one of the admins verify this patch?

JSKenyon commented 4 years ago

ok to test

JSKenyon commented 4 years ago

@ulricharmel Out of interest, are the NaN values in the weight column of the measurement set?

ulricharmel commented 4 years ago

Hi @JSKenyon Sorry for the late replay. Those visibilities are flagged and their weights are zero. I did not check from the measurement set. Maybe those weights have been set to zero by the data handler because they are flagged.

JSKenyon commented 4 years ago

The reason I ask is because this line cannot produce NaN values unless there are already NaN values present in its inputs. I guess my question is in which array are the NaN values living?

ulricharmel commented 4 years ago

I checked it interactively with pdb. The weights were zero. But the values were nan +nanj in the off diagonal. The multiplication line gives a warning about invalid value encounter in multiplication. I tested it and the output of the multiplication is nan+nanj. I also tried multiplying those values with zero and I got nan+nanj with the warning.

JSKenyon commented 4 years ago

Ok, but does self.obser_arr contain NaN values on its off-diagonals?

ulricharmel commented 4 years ago

Yes it does

JSKenyon commented 4 years ago

@ulricharmel Ah thanks. That is odd. And are those NaN values in the data column on disk?

@o-smirnov What is the expected behaviour here? While I am sure Ulrich's solution will work, I thought that we were cleaning up dodgy input values elsewhere.

bennahugo commented 4 years ago

I've made fixes that should prevent this using getcolnp. Is your branch in sync with master. You should not be getting nan values unless there are actually nan values on the data array on disk.

An upgrade to casacore 3.3 and python-casacore 3.3 should also fix this problem.

On Fri, Aug 14, 2020 at 10:55 AM JSKenyon notifications@github.com wrote:

@ulricharmel https://github.com/ulricharmel Ah thanks. That is odd. And are those NaN values in the data column on disk?

@o-smirnov https://github.com/o-smirnov What is the expected behaviour here? While I am sure Ulrich's solution will work, I thought that we were cleaning up dodgy input values elsewhere.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ratt-ru/CubiCal/pull/408#issuecomment-673969376, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB4RE6XJLHYGDFEMID34IJLSAT3ZLANCNFSM4PX3ILRA .

--

Benjamin Hugo

PhD. student, Centre for Radio Astronomy Techniques and Technologies Department of Physics and Electronics Rhodes University

Junior software developer Radio Astronomy Research Group South African Radio Astronomy Observatory Black River Business Park Observatory Cape Town

ulricharmel commented 4 years ago

@JSKenyon I did not check the MS itself, I checked for a specific time chunk the data that was being passed to the solver.

ulricharmel commented 4 years ago

@bennahugo I will upgrade to python-casacore 3.3 and test.

bennahugo commented 4 years ago

do make sure you compile your python-casacore 3.3 against casacore 3.3. The bug is in the C++ code in casacore itself.

On Fri, Aug 14, 2020 at 11:09 AM ulricharmel notifications@github.com wrote:

@bennahugo https://github.com/bennahugo I will upgrade to python-casacore 3.3 and test.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ratt-ru/CubiCal/pull/408#issuecomment-673975357, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB4RE6T5FFQ2RQXU4A3PZP3SAT5LVANCNFSM4PX3ILRA .

--

Benjamin Hugo

PhD. student, Centre for Radio Astronomy Techniques and Technologies Department of Physics and Electronics Rhodes University

Junior software developer Radio Astronomy Research Group South African Radio Astronomy Observatory Black River Business Park Observatory Cape Town

o-smirnov commented 3 years ago

Sounds like it's a workaround for NaN values in the input that should not have been appearing in the first place (especially from this comment). I have also made further fixes to MS reading in #424.

However let's keep this open for the moment, as looking at the code, I can't find where flagged data/model is actually nulled. Except at https://github.com/ratt-ru/CubiCal/blob/master/cubical/solver.py#L108, but this ignore FL.PRIOR. So I suppose it's possible to have a pre-flagged NaN value sneak in and cause some havoc. For the sake of being defensive, we probably need to null it in the ms_tile code -- I thought I did that, but I can't find the statement!

JSKenyon commented 3 years ago

Closing for now - please reopen if this issue persists.