hoffmangroup / segway

Application for semi-automated genomic annotation.
http://segway.hoffmanlab.org/
GNU General Public License v2.0

Segway should not requeue jobs that failed for reasons other than out-of-memory #8

Closed EricR86 closed 8 years ago

EricR86 commented 10 years ago

Original report (BitBucket issue) by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


From Google Code Issue #13

Imported Labels: bug, imported, Priority-Low

From jay.hesselberth on August 18, 2010 12:29:09

I migrated some segway training runs from one file system to another; this changed the root path, and so the .inc files referenced by segway.inc and input.master changed. When I initiate an identify run, the gmtkTriangulate task raises an exception and exits the segway run cleanly:

/segway-projects/results/20100812/exp1/segway.str:1:70: error: /segway-results/results/20100812/exp1/auxiliary/segway.inc: No such file or directory
Parse Error in file '/segway-projects/results/20100812/exp1/segway.str': expecting variable type at or before line 7, near (TYPE_SEGCOUNTDOWN)
Exiting Program
Traceback (most recent call last):
  File "/segway-build/arch/Linux-x86_64/bin/segway", line 8, in load_entry_point('segway==0.2.0', 'console_scripts', 'segway')()
  File "/segway-build/arch/Linux-x86_64/lib/python2.6/segway-0.2.0-py2.6.egg/segway/run.py", line 3694, in main
    return runner()
  File "/segway-build/arch/Linux-x86_64/lib/python2.6/segway-0.2.0-py2.6.egg/segway/run.py", line 3514, in call
    self.run(*args, kwargs)
  File "/segway-build/arch/Linux-x86_64/lib/python2.6/segway-0.2.0-py2.6.egg/segway/run.py", line 3487, in run
    self.run_triangulate()
  File "/segway-build/arch/Linux-x86_64/lib/python2.6/segway-0.2.0-py2.6.egg/segway/run.py", line 3084, in run_triangulate
    self.run_triangulate_single(num_segs)
  File "/segway-build/arch/Linux-x86_64/lib/python2.6/segway-0.2.0-py2.6.egg/segway/run.py", line 3077, in run_triangulate_single
    prog(kwargs)
  File "build/bdist.linux-x86_64/egg/optbuild.py", line 78, in call
  File "build/bdist.linux-x86_64/egg/optbuild.py", line 209, in run
  File "build/bdist.linux-x86_64/egg/optbuild.py", line 191, in _getoutput
  File "build/bdist.linux-x86_64/egg/optbuild.py", line 154, in _popen
  File "build/bdist.linux-x86_64/egg/optbuild.py", line 54, in _returncode_error_factory
optbuild.ReturncodeError: /segway-build/arch/Linux-x86_64/bin/gmtkTriangulate returned 1

However, the gmtkViterbi task that uses input.master recognizes the error and raises an exception, but segway then (apparently) assumes the task exited because it ran out of memory and launches another job with increased memory requirements (see below; there are multiple such errors in a row as it keeps relaunching the same job). Instead, it should exit cleanly as above, because the model cannot be parsed correctly.

root@ip-10-204-151-58 /segway-projects/results/20100818/exp1/20100818.identify.20100818/output/e/identify

cat vit999.20100818.identify.20100818.196ffc5aaadf11dfbaf712313b0a94cc
/segway-build/arch/Linux-x86_64/lib/python2.6/path-2.2-py2.6.egg/path.py:32: DeprecationWarning: the md5 module is deprecated; use hashlib instead
/segway-build/arch/Linux-x86_64/lib/python2.6/optbuild-0.1.7-py2.6.egg/optbuild.py:294: DeprecationWarning: object.new() takes no parameters
/segway-build/arch/Linux-x86_64/lib/python2.6/tables/leaf.py:415: PerformanceWarning: The Leaf /supercontig_5/continuous is exceeding the maximum recommended rowsize (32000000 bytes); be ready to see PyTables asking for lots of memory and possibly slow I/O. You may want to reduce the rowsize by trimming the value of dimensions that are orthogonal (and preferably close) to the main dimension of this leave. Alternatively, in case you have specified a very small/large chunksize, you may want to increase/decrease it.
  PerformanceWarning)
/segway-projects/results/20100812/exp1/params/input.master:1:70: error: /segway-results/results/20100812/exp1/auxiliary/segway.inc: No such file or directory
ERROR: In file '/segway-projects/results/20100812/exp1/params/input.master' line 18, DT 'map_frameIndex_ruler', equation 'mod(p0, RULER_SCALE) == 0 ': Invalid symbol at 'ruler_scale)==0'
Traceback (most recent call last):
  File "/segway-build/arch/Linux-x86_64/bin/segway-task", line 8, in load_entry_point('segway==0.2.0', 'console_scripts', 'segway-task')()
  File "/segway-build/arch/Linux-x86_64/lib/python2.6/segway-0.2.0-py2.6.egg/segway/task.py", line 242, in main
    return task(args)
  File "/segway-build/arch/Linux-x86_64/lib/python2.6/segway-0.2.0-py2.6.egg/segway/task.py", line 234, in task
    TASKS[verb, kind]((chrom, start, end), resolution, outfilename, args)
  File "/segway-build/arch/Linux-x86_64/lib/python2.6/segway-0.2.0-py2.6.egg/segway/task.py", line 211, in run_viterbi_save_bed
    output = VITERBI_PROG.getoutput(*args)
  File "build/bdist.linux-x86_64/egg/optbuild.py", line 203, in getoutput
  File "build/bdist.linux-x86_64/egg/optbuild.py", line 191, in _getoutput
  File "build/bdist.linux-x86_64/egg/optbuild.py", line 154, in _popen
  File "build/bdist.linux-x86_64/egg/optbuild.py", line 54, in _returncode_error_factory
optbuild.ReturncodeError: /segway-build/arch/Linux-x86_64/bin/gmtkViterbi returned 1

Original issue: http://code.google.com/p/segway-genome/issues/detail?id=13

EricR86 commented 10 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


From hoffman...@gmail.com on April 12, 2011 13:46:16

There is currently no foolproof way to tell that GMTK ran out of memory, which I need for other fixes like this. I am discussing this with the GMTK people; it looks like one might exist soon.

Summary: Segway should not requeue jobs that failed for reasons other than out-of-memory
Status: Accepted
Labels: -Priority-Medium Priority-Low

EricR86 commented 8 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


Resolved in Pull Request #36. Currently, a job that fails for a reason other than a memory error is resubmitted only once more.
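
For readers following the thread, the policy described above can be sketched roughly as follows. This is a minimal illustration, not Segway's actual code: `JobResult`, `run_with_requeue`, the `out_of_memory` flag, and the memory steps are all hypothetical names. The sketch assumes the caller can already tell whether a failure was a memory error, which the thread notes was not reliably possible for GMTK at the time.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class JobResult:
    """Hypothetical summary of one finished job."""
    returncode: int
    out_of_memory: bool  # assumption: the caller can classify the failure


def run_with_requeue(
    submit: Callable[[int], JobResult],
    mem_steps: Sequence[int],
    max_other_retries: int = 1,
) -> JobResult:
    """Run a job, requeueing according to why it failed.

    Memory errors move to the next (larger) memory limit in mem_steps.
    Any other failure is retried at the same limit at most
    max_other_retries times, then reported as a hard error.
    """
    mem_index = 0
    other_failures = 0
    while True:
        result = submit(mem_steps[mem_index])
        if result.returncode == 0:
            return result
        if result.out_of_memory:
            # Escalate memory; give up once the largest limit has failed.
            mem_index += 1
            if mem_index >= len(mem_steps):
                raise RuntimeError("job ran out of memory at the largest limit")
        else:
            # Not a memory problem: more memory will not help.
            other_failures += 1
            if other_failures > max_other_retries:
                raise RuntimeError(
                    f"job failed with return code {result.returncode}"
                )
```

With the default `max_other_retries=1`, a job that fails on a parse error like the one in the original report would be submitted twice in total and then stop, instead of being relaunched repeatedly with larger memory limits.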