Open gijzelaerr opened 10 years ago
Original comment thread migrated from bugzilla
Occasionally, when running in batch mode, the meqserver dies or halts ungracefully. The Python traceback is given at the end of this report. A complete simulation is available for testing in the directory ~twillis/ASKAP_demo on birch.
Output from a failed run ...
meqserver(meqserver.py:289:stop_default_mqs): meqserver not exiting cleanly, killing it
8Python: =================== stopping OCTOPUSSY ========================
254 0.00037146 -0.00032291 0.0 10.642 1400000000.0
========== Running batch example sim
Stopping meqserver
Bye!
Traceback (most recent call last):
File "batch_sim_two.py", line 22, in
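For readers unfamiliar with the "not exiting cleanly, killing it" message in the log above: it suggests a stop-then-kill shutdown. What follows is a generic Python 3 sketch of that pattern, not the actual code of `stop_default_mqs` in meqserver.py. The function name `stop_child` and the `grace_seconds` parameter are hypothetical, chosen only for illustration.

```python
import subprocess
import sys


def stop_child(proc, grace_seconds=5.0):
    """Ask a child process to exit; kill it if it does not comply in time.

    Hypothetical helper for illustration only, not part of MeqTrees.
    Returns True if the child exited on its own (or had already exited),
    False if it had to be killed.
    """
    if proc.poll() is not None:
        # Child has already exited; nothing to do.
        return True
    proc.terminate()  # polite request (SIGTERM on POSIX)
    try:
        proc.wait(timeout=grace_seconds)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()   # forceful stop (SIGKILL on POSIX)
        proc.wait()   # reap the child so it does not linger as a zombie
        return False


if __name__ == "__main__":
    # Stand-in for a long-running server process.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    clean = stop_child(child, grace_seconds=2.0)
    print("exited cleanly" if clean else "not exiting cleanly, killed")
```

A batch driver that sees `False` here knows the server was force-killed and can log the fact before moving on, rather than failing at interpreter shutdown.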
Looks like it's related to (or the same thing as) bug 573, which ought to tell you how old this thing is!
As already discussed by e-mail: have you got a script that produces it consistently on, say, birch?
Yep, go to /home/twillis/ASKAP_demo on birch. There's a README there which gives you the command to do things in batch mode. You may want to change my sky model to something tigger-compatible. This failure happens sufficiently often that it's annoying, especially if it's near the end of a 6 hr job!
Created an attachment (id=73) a dump out of a batch processing failure
This random failure during or at the end of batch processing keeps creeping up!
That's interesting, because it resolutely refuses to happen with the sims I'm doing over here. I'm copying your data over to Cape Town to give it a try (I presume the same directory on birch is still good?), to see if it's perhaps also dependent on Linux variant. If not, then it must be related to some MeqTrees feature in your simulation that I'm not using in mine, which is at least a data point. I shall keep looking, anyway.
* Bug 764 has been marked as a duplicate of this bug. *
* Bug 573 has been marked as a duplicate of this bug. *
OK, I think I have this licked in the current version (r8286). Serious stress-testing of Tony's code has yet to yield a crash. Tony: please do some testing.
There is still an underlying problem (bug 576) that I have at best worked around, not properly fixed. But a real fix is too complicated at this stage, so we'll have to leave it until the next release cycle. I'm therefore downgrading this bug, and taking off the release milestone.
at 2011-02-21 23:31:48 Tony Willis reported:
Occasional failure of meqserver/python in batch mode