ratt-ru / meqtrees

A library for implementing radio astronomical Measurement Equations
http://meqtrees.net

batch processing seems unstable #603

Closed: gijzelaerr closed this issue 10 years ago

gijzelaerr commented 10 years ago
at 2007-11-13 22:07:06 Tony Willis reported:

batch processing seems unstable

gijzelaerr commented 10 years ago

Original comment thread migrated from Bugzilla.

at 2007-11-13 22:07:06 Tony Willis replied:

I've been trying to run some MeqTrees scripts in batch mode with the `python xyz.py -run` option, but the failure rate is high: things fail in unexpected ways before processing completes. For example, I just had a crash with:

```
rqid ev.0.0.0.0.0
rqid ev.0.0.0.1.0
rqid ev.0.0.0.2.0
rqid ev.0.0.0.3.0
rqid ev.0.0.0.4.0
rqid ev.0.0.0.5.0
rqid ev.0.0.0.6.0
rqid ev.0.0.0.7.0
rqid ev.0.0.0.8.0
rqid ev.0.0.0.9.0
rqid ev.0.0.0.10.0
rqid ev.0.0.0.11.0
Traceback (most recent call last):
  File "MG_AGW_gauss_fit_obs.py", line 420, in ?
    mod._test_forest(mqs,None,wait=True);
  File "MG_AGW_gauss_fit_obs.py", line 358, in _test_forest
    mqs.meq('Node.Execute',record(name='req_seq',request=request),wait=wait);
  File "/home/twillis/Timba/install/current/libexec/python/Timba/Apps/meqserver.py", line 124, in meq
    msg = self.await(replyname,resume=True,timeout=wait);
  File "/home/twillis/Timba/install/current/libexec/python/Timba/Apps/app_proxy.py", line 418, in await
    res = self._pwp.await(self._rcv_prefix + what,timeout=await_timeout,resume=resume);
  File "/home/twillis/Timba/install/current/libexec/python/Timba/octopussy.py", line 420, in await
    self.resume_events();
  File "/home/twillis/Timba/install/current/libexec/python/Timba/octopussy.py", line 400, in resume_events
    self._lock.release();
  File "/usr/local/lib/python2.4/threading.py", line 113, in release
    assert self.__owner is me, "release() of un-acquire()d lock"
AssertionError: release() of un-acquire()d lock
```

Scripts and data sets for testing are available on request.
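For context on the error at the bottom of the traceback, here is a minimal illustrative sketch (not MeqTrees code): Python's `threading.RLock` records which thread owns it, so releasing it from a thread that never acquired it fails in exactly this way. Python 2.4's `threading.py` raised the `assert` seen above; modern Python raises a `RuntimeError` instead.

```python
import threading

# Minimal sketch (not MeqTrees code) of the failure mode in the
# traceback: an RLock records its owning thread, so releasing it
# from any other thread fails.
lock = threading.RLock()
lock.acquire()  # acquired (and owned) by the main thread

def release_from_wrong_thread():
    try:
        lock.release()
    except RuntimeError as exc:
        print("release failed:", exc)  # cannot release un-acquired lock

t = threading.Thread(target=release_from_wrong_thread)
t.start()
t.join()
lock.release()  # the owning thread may release normally
```

This suggests a race in the event-handling code: under batch load, the lock in `octopussy.py` ends up being released by a thread that does not hold it at that moment.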

at 2008-03-28 15:03:14 Tony Willis replied:

This bug 604 is really annoying if it happens 24 hours into a 3-day job! Luckily I'm writing the data I need into a meqlog.mql file, so I can resume from the point of failure.

```
Traceback (most recent call last):
  File "test_azel_obs.py", line 257, in ?
    mod._test_forest(mqs,None,wait=True);
  File "test_azel_obs.py", line 240, in _test_forest
    mqs.meq('Node.Execute',record(name='req_seq',request=request),wait=wait);
  File "/home/twillis/Timba/install/current/libexec/python/Timba/Apps/meqserver.py", line 119, in meq
    msg = self.await(replyname,resume=True,timeout=wait);
  File "/home/twillis/Timba/install/current/libexec/python/Timba/Apps/multiapp_proxy.py", line 515, in await
    res = self._pwp.await(self._rcv_prefix + what,timeout=await_timeout,resume=resume);
  File "/home/twillis/Timba/install/current/libexec/python/Timba/octopussy.py", line 427, in await
    self.resume_events();
  File "/home/twillis/Timba/install/current/libexec/python/Timba/octopussy.py", line 407, in resume_events
    self._lock.release();
  File "/usr/lib64/python2.4/threading.py", line 113, in release
    assert self.__owner is me, "release() of un-acquire()d lock"
AssertionError: release() of un-acquire()d lock
meqserver(meqserver.py:271:stop_default_mqs): stopping default meqserver
```
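We cannot see from this thread what fix, if any, was applied in `octopussy.py`, but the standard idiom that rules out this class of error is to pair every acquire with a release in the same thread, via `try`/`finally` or a `with` block. A hypothetical sketch:

```python
import threading

lock = threading.RLock()

def resume_events_safely():
    # Hypothetical sketch of the usual guard against this class of
    # bug: a with block both acquires and releases the lock in the
    # same thread, so an unmatched or cross-thread release cannot
    # occur even if event delivery raises.
    with lock:
        pass  # ... deliver queued events here ...

resume_events_safely()
```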

at 2009-02-11 12:52:01 Oleg Smirnov replied:

Assuming this has died of old age (Sarod uses batch processing with great regularity).