ratt-ru / meqtrees-cattery

MeqTrees-based frameworks for simulation and calibration of radio interferometers

Stefcal crashes with a completely flagged slot #78

Open bennahugo opened 6 years ago

bennahugo commented 6 years ago

Sharmila reported this. On one of the particularly bad datasets, a previous round of selfcal flagged particularly aggressively, and the initial chisq for the B solve comes out as 0. Why is this an integer?

2022.12 47.3Gb gainopts(StefCal.py:752:get_result): ('00', '01') data type of model is complex128
2022.12 47.3Gb gainopts(StefCal.py:777:get_result): G: solvable 1 from major loop 0 (current 0)
2030.97 47.3Gb gainopts(StefCal.py:1276:run_gain_solution): solving for G, initial chisq is 51487312.2275
2060.32 47.3Gb gainopts(StefCal.py:1287:run_gain_solution): iter 1: 0.00% (0/62) conv, 0 gfs, max update 3.44259
2090.19 47.3Gb gainopts(StefCal.py:1287:run_gain_solution): iter 2: 0.00% (0/62) conv, 0 gfs, max update 0.196195
2119.80 47.3Gb gainopts(StefCal.py:1287:run_gain_solution): iter 3: 0.00% (0/62) conv, 0 gfs, max update 0.111077
2154.24 47.3Gb gainopts(StefCal.py:1287:run_gain_solution): iter 4: 100.00% (62/62) conv, 868 gfs, max update 0
2160.75 47.3Gb gainopts(StefCal.py:1323:run_gain_solution): G converged at chisq 51487312.2275 (last gain update 0) after 4 iterations and 129.77s
2160.75 47.3Gb gainopts(StefCal.py:1369:run_gain_solution):   delta-chisq were 
2160.75 47.3Gb gainopts(StefCal.py:1370:run_gain_solution):   convergence criteria were 3.4 0.2 0.11 0
2160.76 47.3Gb gainopts(StefCal.py:1393:run_gain_solution): flagged gains per antenna: 00 574200.00%, 01 574200.00%, 03 574200.00%, 06 574200.00%, 07 574200.00%, 08 574200.00%, 10 574200.00%, 12 574200.00%, 14 574200.00%, 15 574200.00%, 17 574200.00%, 18 574200.00%, 21 574200.00%, 31 574200.00%
2161.16 47.3Gb gainopts(StefCal.py:786:get_result): applying G-inverse to data
2163.35 47.8Gb gainopts(StefCal.py:790:get_result): done
2163.35 47.8Gb gainopts(StefCal.py:777:get_result): B: solvable 1 from major loop 0 (current 0)
2171.77 47.8Gb gainopts(StefCal.py:1276:run_gain_solution): solving for B, initial chisq is 0
2207.46 47.9Gb gainopts(StefCal.py:1287:run_gain_solution): iter 1: 1.01% (58/5742) conv, 0 gfs, max update 0.5353
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/Cattery/Calico/OMS/StefCal/StefCal.py", line 783, in get_result
    flagged |= self.run_gain_solution(opt,model,data,weight,bitflags,flag_null_gains=True,looptype=looptype);
  File "/usr/local/lib/python2.7/dist-packages/Cattery/Calico/OMS/StefCal/StefCal.py", line 1300, in run_gain_solution
    dchi = (chisq0-chisq)/chisq;
ZeroDivisionError: integer division or modulo by zero
1.7Gb meqserver(meqserver.py:288:stop_default_mqs): meqserver not exited yet, waiting another 10 seconds
/home/sharmila/output/1491862657-corr-cal2.gain.cp does not exist, so not trying to remove
/home/sharmila/output/1491862657-corr-cal2.gain1.cp does not exist, so not trying to remove
input flagmask is
### Job result: None
### No more commands
### meqserver reported 25 error(s) during the run:
###   000: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.0.1)
###   001: node 'VisDataMux': error processing tile 0.0.0.0.1.1
###   002: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.1.1)
###   003: node 'VisDataMux': error processing tile 0.0.0.0.1.2
###   004: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.2.1)
###   005: node 'VisDataMux': error processing tile 0.0.0.0.1.3
###   006: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.3.1)
###   007: node 'VisDataMux': error processing tile 0.0.0.0.1.4
###   008: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.4.1)
###   009: node 'VisDataMux': error processing tile 0.0.0.0.1.5
###   010: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.5.1)
###   011: node 'VisDataMux': error processing tile 0.0.0.0.1.6
###   012: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.6.1)
###   013: node 'VisDataMux': error processing tile 0.0.0.0.1.7
###   014: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.7.1)
###   015: node 'VisDataMux': error processing tile 0.0.0.0.1.8
###   016: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.8.1)
###   017: node 'VisDataMux': error processing tile 0.0.0.0.1.9
###   018: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.9.1)
###   019: node 'VisDataMux': error processing tile 0.0.0.0.1.10
###   020: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.10.1)
###   021: node 'VisDataMux': error processing tile 0.0.0.0.1.11
###   022: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.11.1)
###   023: node 'VisDataMux': error processing footer 0.0.0
###   024: node 'VisDataMux': execute() failed: integer division or modulo by zero (return code 0x810021)
### Stopping the meqserver
### All your batch are not belong to us, returning with error code
Traceback (most recent call last):
  File "/code/run.py", line 256, in <module>
    run_meqtrees(msname)
  File "/code/run.py", line 220, in run_meqtrees
    utils.xrun(cab['binary'], args + ['-s {}'.format(saveconf) if saveconf else ''])
  File "/utils/utils/__init__.py", line 74, in xrun
    raise SystemError('%s: returns errr code %d'%(command, process.returncode))
SystemError: /usr/bin/meqtree-pipeliner.py: returns errr code 1
IanHeywood commented 6 years ago

I see the division-by-zero message a lot and assumed it was down to fully flagged tiles, but it has never caused it to bail out. Then again, I've never solved for B.

/usr/local/lib/python2.7/dist-packages/Cattery/

Maybe it's been fixed in a more recent version than whatever repo this was installed from?

bennahugo commented 6 years ago

Hmm, yeah, I will have to dig. It is probably a B-Jones thing. We solve for G (so the DC term is taken out), then for B in chunks to build up SNR. This usually gives better results than just a single G term. This is as recent as it gets: KERN 3.
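
For reference, a rough NumPy sketch of that two-stage strategy: a single full-band G per antenna, divided out, followed by a B solved independently per frequency chunk. This is not the Cattery StefCal code; it is just the standard scalar StefCal update for V_pq ≈ g_p M_pq conj(g_q), and the function names, chunking scheme, and array shapes here are illustrative only.

```python
# Illustrative only -- not the Cattery StefCal implementation.
# Assumes vis/model are visibility/model matrices with autocorrelations zeroed.
import numpy as np

def stefcal(vis, model, niter=50, tol=1e-6):
    """Scalar StefCal on (nant, nant) matrices; returns per-antenna gains."""
    nant = vis.shape[0]
    g = np.ones(nant, dtype=complex)
    for _ in range(niter):
        g_old = g.copy()
        for p in range(nant):
            z = model[p] * np.conj(g_old)        # predicted row for fixed g_q
            den = np.vdot(z, z).real             # sum_q |z_pq|^2
            if den > 0:
                g[p] = np.vdot(z, vis[p]) / den  # sum_q conj(z_pq) V_pq / den
        # crude "max update" stop criterion, as in the log above; real
        # implementations typically average successive iterates to stabilise
        if np.max(np.abs(g - g_old)) < tol:
            break
    return g

def solve_g_then_b(vis, model, chunk=8):
    """vis, model: (nchan, nant, nant). Full-band G first, then B per chunk."""
    g = stefcal(vis.mean(axis=0), model.mean(axis=0))
    corrected = vis / (g[None, :, None] * np.conj(g)[None, None, :])
    b = [stefcal(corrected[c:c + chunk].mean(axis=0),
                 model[c:c + chunk].mean(axis=0))
         for c in range(0, vis.shape[0], chunk)]
    return g, np.array(b)
```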

o-smirnov commented 6 years ago

It's initialized as integer 0, then accumulated... so yeah, if an entire tile is flagged, it just remains integer 0.
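
That matches the traceback: with everything in the tile flagged, nothing is ever added to the accumulator, chisq0 stays the Python int 0, and the relative delta-chisq division blows up. A minimal standalone reproduction, assuming the accumulation only runs over unflagged visibilities (variable names are illustrative, not the actual StefCal ones):

```python
import numpy as np

def chisq_over_unflagged(resid, flags):
    """Accumulate |residual|^2 over unflagged visibilities only."""
    chisq = 0                           # integer zero, as in the report
    for r, flagged in zip(resid, flags):
        if not flagged:
            chisq += abs(r) ** 2        # never runs if the slot is fully flagged
    return chisq

resid = np.ones(10, dtype=complex)
flags = np.ones(10, dtype=bool)         # completely flagged slot
chisq0 = chisq = chisq_over_unflagged(resid, flags)
dchi = (chisq0 - chisq) / chisq         # ZeroDivisionError, as at StefCal.py:1300
                                        # (Python 2 reports "integer division or
                                        #  modulo by zero")
```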

o-smirnov commented 6 years ago

I need to add a check for this so it doesn't just fall over stupidly.
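
One possible shape for such a guard, sketched here only as an illustration and not as the actual fix: treat a zero chi-squared as "nothing to solve for" and short-circuit that chunk instead of computing the relative improvement.

```python
# Hypothetical guard around the delta-chisq test; the real StefCal code and
# variable names may differ.
if chisq == 0:
    # Entire slot flagged: nothing to fit, so mark the chunk as done/flagged
    # rather than dividing by zero.
    dchi = 0.0
    converged = True
else:
    dchi = (chisq0 - chisq) / chisq
```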

ludwigschwardt commented 6 years ago

We've recently done similar things to stefcal in our cal pipeline to avoid crashes on fully flagged data - might be useful to compare notes.
