gijzelaerr closed 10 years ago
Original comment thread migrated from bugzilla
I've been trying to run some MeqTrees scripts in batch mode with the python xyz.py -run option, but I'm seeing a high failure rate: runs die in unexpected ways before processing is complete. For example, I just had a crash with the following output:
rqid ev.0.0.0.0.0
rqid ev.0.0.0.1.0
rqid ev.0.0.0.2.0
rqid ev.0.0.0.3.0
rqid ev.0.0.0.4.0
rqid ev.0.0.0.5.0
rqid ev.0.0.0.6.0
rqid ev.0.0.0.7.0
rqid ev.0.0.0.8.0
rqid ev.0.0.0.9.0
rqid ev.0.0.0.10.0
rqid ev.0.0.0.11.0
Traceback (most recent call last):
  File "MG_AGW_gauss_fit_obs.py", line 420, in ?
    mod._test_forest(mqs,None,wait=True);
  File "MG_AGW_gauss_fit_obs.py", line 358, in _test_forest
    mqs.meq('Node.Execute',record(name='req_seq',request=request),wait=wait);
  File "/home/twillis/Timba/install/current/libexec/python/Timba/Apps/meqserver.py", line 124, in meq
    msg = self.await(replyname,resume=True,timeout=wait);
  File "/home/twillis/Timba/install/current/libexec/python/Timba/Apps/app_proxy.py", line 418, in await
    res = self._pwp.await(self._rcv_prefix + what,timeout=await_timeout,resume=resume);
  File "/home/twillis/Timba/install/current/libexec/python/Timba/octopussy.py", line 420, in await
    self.resume_events();
  File "/home/twillis/Timba/install/current/libexec/python/Timba/octopussy.py", line 400, in resume_events
    self._lock.release();
  File "/usr/local/lib/python2.4/threading.py", line 113, in release
    assert self.__owner is me, "release() of un-acquire()d lock"
AssertionError: release() of un-acquire()d lock
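For anyone reading the traceback: the final assertion is the ownership check in threading.py's re-entrant lock, which fails when release() is called by a thread that does not currently hold the lock. The sketch below reproduces that failure mode in isolation, using only the standard library. It is not the octopussy.py code; the helper name release_without_owning and the two-thread setup are illustrative assumptions, and whether resume_events() actually hits this from a second thread has not been confirmed. On Python 2.4 the failure is an AssertionError, as in the traceback; on current Python it is a RuntimeError.

```python
import threading


def release_without_owning(lock):
    """Release `lock` from the calling thread; return the exception raised, or None."""
    try:
        lock.release()
    except (RuntimeError, AssertionError) as exc:
        # RuntimeError on current Python, AssertionError on old 2.x threading.py
        return exc
    return None


lock = threading.RLock()
result = {}


def worker():
    # This thread never acquired the lock, so the release is rejected.
    result["error"] = release_without_owning(lock)


lock.acquire()                      # the main thread owns the lock
t = threading.Thread(target=worker)
t.start()
t.join()
lock.release()                      # the owning thread can release normally

print(type(result["error"]).__name__, result["error"])
```

If the real cause is similar, the thing to look for would be a code path where the lock is released twice, or released by a thread other than the one that acquired it.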
Scripts and data sets for testing are available on request.
This bug 604 is really annoying if it happens 24 hrs into a 3-day job!! Luckily I'm writing the stuff I need into a meqlog.mql file so I can resume from the point of failure.
Traceback (most recent call last):
  File "test_azel_obs.py", line 257, in ?
    mod._test_forest(mqs,None,wait=True);
  File "test_azel_obs.py", line 240, in _test_forest
    mqs.meq('Node.Execute',record(name='req_seq',request=request),wait=wait);
  File "/home/twillis/Timba/install/current/libexec/python/Timba/Apps/meqserver.py", line 119, in meq
    msg = self.await(replyname,resume=True,timeout=wait);
  File "/home/twillis/Timba/install/current/libexec/python/Timba/Apps/multiapp_proxy.py", line 515, in await
    res = self._pwp.await(self._rcv_prefix + what,timeout=await_timeout,resume=resume);
  File "/home/twillis/Timba/install/current/libexec/python/Timba/octopussy.py", line 427, in await
    self.resume_events();
  File "/home/twillis/Timba/install/current/libexec/python/Timba/octopussy.py", line 407, in resume_events
    self._lock.release();
  File "/usr/lib64/python2.4/threading.py", line 113, in release
    assert self.__owner is me, "release() of un-acquire()d lock"
AssertionError: release() of un-acquire()d lock
meqserver(meqserver.py:271:stop_default_mqs): stopping default meqserver
Assuming this has died of old age (Sarod uses batch processing with great regularity).
at 2007-11-13 22:07:06 Tony Willis reported:
batch processing seems unstable