lofar-astron / factor

Facet calibration for LOFAR
http://www.astron.nl/citt/facet-doc
GNU General Public License v2.0

error with peeling source #212

Closed rvweeren closed 7 years ago

rvweeren commented 7 years ago

Any idea? (posting for Federica)

This is the specification of the peeling direction: outlier 4h38m55.840s,21d53m10.40s empty empty 512 600 600 LD empty empty /lofar4/stnv039/A478/skymodels/outlier.skymodel True 0.1 4000

INFO - factor:directions - Adjusting facets to avoid sources...
INFO - factor:directions - Including target (04h13m25.6s, +10d28m01s) in facet adjustment
/home/lofar/opt/src/factor/factor/process.py:652: RuntimeWarning: invalid value encountered in double_scalars
  effective_flux_jy = peak_flux_jy_bm * (total_flux_jy / peak_flux_jy_bm)**0.667
INFO - factor - Direction outlier will be peeled using sky model: /lofar4/stnv039/A478/skymodels/outlier.skymodel
DEBUG - factor:directions - Processing each direction in series
INFO - factor - Peeling 1 direction(s)
Traceback (most recent call last):
  File "/home/lofar/opt/src/factor/bin/runfactor", line 72, in <module>
    reset_operations, stop_after=options.stop_after)
  File "/home/lofar/opt/src/factor/factor/process.py", line 113, in run
    op = OutlierPeel(parset, bands, d)
  File "/home/lofar/opt/src/factor/factor/operations/outlier_ops.py", line 29, in __init__
    self.direction.set_imcal_parameters(parset, bands)
  File "/home/lofar/opt/src/factor/factor/lib/direction.py", line 267, in set_imcal_parameters
    self.frac_bandwidth_selfcal_facet_image, padding)
  File "/home/lofar/opt/src/factor/factor/lib/direction.py", line 438, in set_imaging_parameters
    if any([s > large_size_arcmin for s in sizes_arcmin]):
TypeError: 'NoneType' object is not iterable

darafferty commented 7 years ago

This error probably indicates that the source does not appear in the highest-frequency sky model (which is used to estimate the source sizes). At the moment, Factor requires that a source appears in every sky model, but I guess this does not need to be enforced for peeling. I've asked Federica to test a change that will allow Factor to continue in this case.
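To illustrate the kind of guard described above, here is a minimal sketch. It is not the actual change made to Factor: the function name `any_size_exceeds` is hypothetical, and only the variable names `sizes_arcmin` and `large_size_arcmin` are taken from the traceback. It assumes that a missing size list (`None`) can safely be treated as "no large sources known".

```python
def any_size_exceeds(sizes_arcmin, large_size_arcmin):
    """Return True if any known source size exceeds the threshold.

    Hypothetical helper: a size list of None (source missing from the
    sky model used for size estimates) is treated as "no sizes known"
    instead of raising TypeError.
    """
    if sizes_arcmin is None:
        return False
    return any(s > large_size_arcmin for s in sizes_arcmin)
```

With this guard, a peeling direction whose source is missing from that sky model would skip the size check rather than abort the run.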

FedericaSavini commented 7 years ago

Unfortunately, although Factor got past the sky model step, DPPP failed:

INFO - factor - Direction outlier will be peeled using sky model: /lofar4/stnv039/A478/skymodels/outlier.skymodel
DEBUG - factor:directions - Processing each direction in series
INFO - factor - Peeling 1 direction(s)
DEBUG - factor:outlier - Calibrator is 1.0 deg across
DEBUG - factor:outlier - Target timewidth for selfcal is 19.9955879189 s
DEBUG - factor:outlier - Target bandwidth for selfcal is 0.23135107869 MHz
DEBUG - factor:outlier - Using averaging steps of 2 channels and 2 time slots for selfcal
DEBUG - factor:outlier - Facet image before padding is 2341 x 2341 pixels (0.975416666667 x 0.975416666667 deg)
DEBUG - factor:outlier - Target timewidth for facet imaging is 20.4995348165 s
DEBUG - factor:outlier - Target bandwidth for facet imaging is 0.23135107869 MHz
DEBUG - factor:outlier - Using averaging steps of 2 channels and 3 time slots for facet imaging
/home/lofar/opt/src/factor/factor/lib/direction.py:758: RuntimeWarning: invalid value encountered in double_scalars
  effective_flux_jy = peak_flux_jy_bm * (total_flux_jy / peak_flux_jy_bm)**0.667
DEBUG - factor:outlier - Total flux density of calibrator: 0.0 Jy
DEBUG - factor:outlier - Peak flux density of calibrator: 0.0 Jy/beam
DEBUG - factor:outlier - Effective flux density of calibrator: nan Jy
DEBUG - factor:outlier - Using solution intervals of 600 (fast) and 600 (slow) time slots
QPID support NOT enabled! Will NOT connect to any broker, and messages will be lost!
INFO - factor:scheduler - <-- Operation outlierpeel started (direction: outlier)
ERROR - factor:scheduler - Operation outlierpeel failed due to an error (direction: outlier)
ERROR - factor:scheduler - Caught an (Keyboard-)Interrupt, stopping all pipelines.
Process PoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
TypeError: set() takes exactly 1 argument (3 given)
DEBUG - factor:scheduler - Time for operation: 0:53:47.774926
ERROR - factor:scheduler - One or more operations failed due to an error. Exiting...

2017-07-20 12:50:59 ERROR node.node08.executable_args: Command '/home/lofar/opt2/lofar/bin/DPPP' returned non-zero exit status -6
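As a side note on the `nan` effective flux density in the log: with total and peak flux densities both at 0.0, the ratio inside the effective-flux expression is 0/0, which is why the RuntimeWarning appears. The sketch below shows one possible guard. It is not Factor's code: the function name is hypothetical, and returning 0.0 for a source with no flux is an assumption made only for illustration.

```python
def effective_flux_jy(peak_flux_jy_bm, total_flux_jy):
    """Effective flux density using the expression quoted in the warning.

    Assumption (not from Factor): when either flux density is zero or
    negative, return 0.0 instead of evaluating 0/0 and getting nan.
    """
    if peak_flux_jy_bm <= 0.0 or total_flux_jy <= 0.0:
        return 0.0
    return peak_flux_jy_bm * (total_flux_jy / peak_flux_jy_bm) ** 0.667
```

For a point-like source where the peak and total flux densities are equal, the ratio is 1 and the function simply returns the peak value.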

darafferty commented 7 years ago

The actual error seems to be related to insufficient memory:

Backtrace follows:
#0  0x2ba33abeec14 in LOFAR::Exception::terminate() at Exception.cc:89
#1  0x2ba33aed46b6 in std::rethrow_exception(std::__exception_ptr::exception_ptr) at ??:0
#2  0x2ba33aed4701 in std::terminate() at ??:0
#3  0x2ba33aed4919 in __cxa_throw at ??:0
#4  0x41ff9b in casa::Allocator_private::BulkAllocatorImpl<casa::casacore_allocator<std::complex<double>, 32ul> >::allocate(unsigned long, void const*) at Allocator.h:290
#5  0x2ba33a23a84a in casa::Block<std::complex<double> >::init(casa::ArrayInitPolicy) at Block.h:766
#6  0x2ba33a23ce89 in casa::Array<std::complex<double> >::Array(casa::IPosition const&, casa::ArrayInitPolicy, casa::Allocator_private::BulkAllocator<std::complex<double> >*) at Array.tcc:94
#7  0x2ba33a23e1a4 in casa::Array<std::complex<double> >::resize(casa::IPosition const&, bool, casa::ArrayInitPolicy) at Array.tcc:765
#8  0x2ba33a2ff5a4 in LOFAR::DPPP::StefCal::StefCal(unsigned int, unsigned int, LOFAR::DPPP::StefCal::StefCalMode, bool, double, unsigned int, bool, unsigned int) at StefCal.cc:72
#9  0x2ba33a2f2967 in LOFAR::DPPP::GainCal::updateInfo(LOFAR::DPPP::DPInfo const&) at GainCal.cc:260
#10 0x2ba33a1fe8df in LOFAR::DPPP::DPStep::setInfo(LOFAR::DPPP::DPInfo const&) at DPStep.cc:36
#11 0x2ba33a1d6129 in LOFAR::DPPP::DPRun::makeSteps(LOFAR::ParameterSet const&) at DPRun.cc:339
#12 0x2ba33a1d9375 in LOFAR::DPPP::DPRun::execute(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, char**) at DPRun.cc:132
#13 0x41c469 in main at NDPPP.cc:88
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

This occurred for a few chunks, but it worked for others. The problem might have been caused by using 600 time slots per solution interval -- can you try it again with more typical values? Something like 1 timeslot for the fast phase solve and 120 for the slow gain solve:

outlier 4h38m55.840s,21d53m10.40s empty empty 512 1 120 LD empty empty /lofar4/stnv039/A478/skymodels/outlier.skymodel True 0.1 4000

rvweeren commented 7 years ago

Yes, 600 is way too much. I suggest a solution interval of 10-20 min or so (which would be 75-150 time slots if the integration time is 8 s).
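The conversion from a solution interval in minutes to a number of time slots can be sketched as follows. The function name is hypothetical, and the 8 s integration time is the value assumed in the comment above, so check it against your own data.

```python
def interval_in_time_slots(interval_min, integration_time_s):
    """Convert a solution interval in minutes to a whole number of time slots."""
    return int(round(interval_min * 60.0 / integration_time_s))


# With an 8 s integration time, 10-20 min corresponds to 75-150 time slots.
print(interval_in_time_slots(10, 8.0))  # 75
print(interval_in_time_slots(20, 8.0))  # 150
```

By the same arithmetic, the 600 time slots used originally would be 80 min at 8 s per slot.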

FedericaSavini commented 7 years ago

Ok, it worked. Now it's running self-cal on the first facet! Thanks!

rvweeren commented 7 years ago

Ok, once David commits this change to GitHub, the issue can be closed.