ratt-ru / CubiCal

A fast radio interferometric calibration suite.
GNU General Public License v2.0

format string error when reporting flag percentages in gain machine #369

Open o-smirnov opened 4 years ago

o-smirnov commented 4 years ago

Reported by @SpheMakh. Probably due to some edge case when everything was flagged, and the reduction collapsed to an array where a scalar was expected.

```text
Traceback (most recent call last):
  File "/usr/lib/python3.6/concurrent/futures/process.py", line 175, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/local/lib/python3.6/dist-packages/cubical/solver.py", line 904, in run_solver
    corr_vis = solver_machine.run()
  File "/usr/local/lib/python3.6/dist-packages/cubical/solver.py", line 727, in run
    SolveOnly.run(self)
  File "/usr/local/lib/python3.6/dist-packages/cubical/solver.py", line 711, in run
    self.sol_opts, label=self.label)
  File "/usr/local/lib/python3.6/dist-packages/cubical/solver.py", line 101, in _solve_gains
    gm.precompute_attributes(obser_arr, model_arr, flags_arr, inv_var_chan)
  File "/usr/local/lib/python3.6/dist-packages/cubical/machines/complex_2x2_machine.py", line 188, in precompute_attributes
    super(Complex2x2Gains, self).precompute_attributes(data_arr, model_arr, flags_arr, noise)
  File "/usr/local/lib/python3.6/dist-packages/cubical/machines/interval_gain_machine.py", line 500, in precompute_attributes
    self._report_gain_flags(low_snr, "on low SNR", "your max-prior-error settings", self.low_snr_warn)
  File "/usr/local/lib/python3.6/dist-packages/cubical/machines/interval_gain_machine.py", line 556, in _report_gain_flags
    str(d), percflagged[d]) for d in bad_dirs]),
  File "/usr/local/lib/python3.6/dist-packages/cubical/machines/interval_gain_machine.py", line 556, in <listcomp>
    str(d), percflagged[d]) for d in bad_dirs]),
TypeError: unsupported format string passed to numpy.ndarray.__format__
"""
```

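The `TypeError` comes from Python's format machinery: a float format spec such as `{:.2f}` works on Python and NumPy scalars (and on 0-d arrays), but NumPy refuses it for an array with one or more dimensions, even a 1-element one. The sketch below reproduces that and shows one defensive way to format a percentage that might arrive as such an array. It assumes, without having checked CubiCal's code, that `percflagged[d]` occasionally comes back as a small array in the all-flagged edge case; `format_percent` and `demo_failure` are hypothetical helpers, not CubiCal functions.

```python
import numpy as np


def demo_failure():
    """Reproduce the error: a float format spec applied to a 1-element 1-D array."""
    value = np.array([100.0])  # degenerate "percentage" that is an array, not a scalar
    try:
        "{:.2f}%".format(value)
    except TypeError as exc:
        return str(exc)
    return None


def format_percent(value):
    """Format a flag percentage given as a float, NumPy scalar, or NumPy array."""
    arr = np.asarray(value, dtype=float)
    if arr.size == 1:
        # Collapse 0-d or single-element arrays to a plain Python float first.
        return "{:.2f}%".format(arr.item())
    # A float spec cannot format several values at once; print the array instead.
    return np.array2string(arr, precision=2) + "%"
```

For example, `format_percent(np.array([[7.0]]))` gives `"7.00%"` rather than raising. Whether collapsing is the right fix, or whether the reduction itself should be corrected to return a scalar, depends on what the upstream array is meant to hold.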
bennahugo commented 4 years ago

OK, this seems to be biting me. It hangs after about 3 hours of processing with multiple directions (all processes and threads are in a sleep state). I will need to fix it as a matter of urgency tomorrow.

bennahugo commented 4 years ago

It happens specifically at solver verbosity > 1. I'm going to try running at a lower verbosity right now to check whether the run at least goes through and that nothing else is wrong on top of this.
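Because the crash only shows up at higher solver verbosity, it lives on a diagnostics path rather than in the solve itself. One defensive option, which is an assumption here and not how CubiCal is currently structured, is to wrap reporting helpers so that an exception is logged instead of propagated. In this sketch, `non_fatal`, `report_gain_flags` and the logger name are all hypothetical.

```python
import functools
import logging

log = logging.getLogger("gain_flags_sketch")  # hypothetical logger name


def non_fatal(func):
    """Decorator: log, rather than raise, any exception from a diagnostic helper."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Records the full traceback, then lets the caller carry on.
            log.exception("diagnostic %s failed; continuing", func.__name__)
            return None
    return wrapper


@non_fatal
def report_gain_flags(percflagged, bad_dirs):
    """Hypothetical stand-in for a per-direction flag-percentage report."""
    return ", ".join("{}: {:.2f}%".format(d, percflagged[d]) for d in bad_dirs)
```

The trade-off is that a swallowed exception can hide a real bug, so this only makes sense for code whose output is purely informational, and the logged traceback should still be acted on.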

bennahugo commented 4 years ago

OK, I can confirm it runs at lower verbosity levels. However, it does point to a problem with collecting return codes from the subprocess pool.
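With `concurrent.futures` (which the traceback shows the solver workers run under), an exception raised inside a worker is only re-raised in the parent when something calls `Future.result()` on that job's future; if no caller does, the failure can go unnoticed. A minimal sketch of collecting every result and recording failures per job, with the hypothetical `solve_tile` standing in for a real solver job:

```python
import concurrent.futures as cf


def solve_tile(tile_id):
    """Hypothetical per-tile job; tile 2 fails with the error from the traceback."""
    if tile_id == 2:
        raise TypeError("unsupported format string passed to numpy.ndarray.__format__")
    return tile_id * 10


def run_tiles(tile_ids, executor_cls=cf.ProcessPoolExecutor, max_workers=2):
    """Run all tiles, keeping successful results and worker exceptions separately."""
    results, errors = {}, {}
    with executor_cls(max_workers=max_workers) as pool:
        futures = {pool.submit(solve_tile, t): t for t in tile_ids}
        for fut in cf.as_completed(futures):
            tile = futures[fut]
            try:
                # result() re-raises, in the parent, any exception raised in the worker.
                results[tile] = fut.result()
            except Exception as exc:
                errors[tile] = exc
    return results, errors
```

For example, `run_tiles([0, 1, 2, 3])` returns results for tiles 0, 1 and 3 and keeps the `TypeError` from tile 2 instead of losing it. `ThreadPoolExecutor` has the same `result()` semantics; with `ProcessPoolExecutor`, the script should guard its entry point with `if __name__ == "__main__":` on platforms that spawn workers.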

bennahugo commented 4 years ago

I spoke too soon. The process hung after two tiles, and there are no exceptions reported in the logs.
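A hang with nothing in the logs is hard to diagnose after the fact. One way to make such a stall visible, offered as an assumption rather than a description of CubiCal's runner, is to put a deadline on result collection: `concurrent.futures.as_completed` accepts a `timeout` and raises `TimeoutError` if jobs are still pending when it expires. `stall_tile` and `run_with_deadline` are hypothetical names.

```python
import concurrent.futures as cf
import time


def stall_tile(seconds):
    """Hypothetical worker that sleeps, imitating a tile blocked on some resource."""
    time.sleep(seconds)
    return seconds


def run_with_deadline(durations, timeout, executor_cls=cf.ProcessPoolExecutor):
    """Return (sorted finished results, True if any job missed the deadline)."""
    finished = []
    stalled = False
    pool = executor_cls(max_workers=len(durations))
    futures = [pool.submit(stall_tile, d) for d in durations]
    try:
        # Raises TimeoutError if any future is still pending `timeout` seconds
        # after as_completed() was called.
        for fut in cf.as_completed(futures, timeout=timeout):
            finished.append(fut.result())
    except cf.TimeoutError:
        stalled = True
    finally:
        # Don't block on stuck jobs here; note this does NOT kill running workers.
        pool.shutdown(wait=False)
    return sorted(finished), stalled
```

This only turns a silent hang into a reportable event; it does not release whatever the stuck worker is waiting on.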

bennahugo commented 4 years ago

Scratch that: on further inspection, it seems the previous containers didn't terminate properly and were holding on to a write lock. I will need to reset the flags and start again :(