marbl / metAMOS

A metagenomic and isolate assembly and analysis pipeline built with AMOS
http://marbl.github.io/metAMOS

Help debugging failed run #45

Closed: dbrami closed this issue 12 years ago

dbrami commented 12 years ago

Hi folks,

I was happy to get the pipeline to run end to end on a subsample of my data (10M paired reads). I then attempted to run it on the entire data set of 76M paired reads. Unfortunately, it crashed at the FindORFS step.

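The error log is at the bottom of this message. Here are my additional questions:

  • Is there a flag for printing the list of commands to a file or to STDOUT / STDERR?
  • What does the --fastest flag do specifically?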

Here is the command used:

    ${metAMOS}/runPipeline -c amphora2 -d METAMOS_BS27FULL -g fraggenescan -k 43 -p 22 -a velvet 1> METAMOS_BS27FULL.run.out 2> METAMOS_BS27FULL.run.err &

Here is the STDERR log:

Job = [[SGI_BS27.1.fastq, SGI_BS27.2.fastq] -> preprocess.success] completed
Completed Task = preprocess.Preprocess
Job = [[lib1.seq] -> [proba.asm.contig]] completed
Completed Task = assemble.Assemble
Job = [proba.asm.contig -> proba.bout] completed
Completed Task = mapreads.MapReads
Traceback (most recent call last):
  File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/runPipeline", line 367, in <module>
    pipeline_run([preprocess.Preprocess,assemble.Assemble,findorfs.FindORFS, findreps.FindRepeats, annotate.Annotate, abundance.Abundance, scaffold.Scaffold, findscforfs.FindScaffoldORFS, propagate.Propagate, classify.Classify, postprocess.Postprocess], verbose = 1)
  File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/Utilities/ruffus/task.py", line 2680, in pipeline_run
    raise errt
ruffus.ruffus_exceptions.RethrownJobError:

Exceptions running jobs for

'def findorfs.FindORFS(...):'

Original exception:

Exception #1
exceptions.ValueError(need more than 1 value to unpack):
for findorfs.FindORFS.Job = [proba.asm.contig -> proba.faa]

Traceback (most recent call last):
  File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/Utilities/ruffus/task.py", line 524, in run_pooled_job_without_exceptions
    return t_job_result(task_name, JOB_COMPLETED, job_name, return_value, None)
  File "/bio_bin/python26/lib/python2.6/contextlib.py", line 34, in __exit__
    self.gen.throw(type, value, traceback)
  File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/Utilities/ruffus/task.py", line 232, in do_nothing_semaphore
    yield
  File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/Utilities/ruffus/task.py", line 517, in run_pooled_job_without_exceptions
    return_value =  job_wrapper(param, user_defined_work_func, register_cleanup, touch_files_only)
  File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/Utilities/ruffus/task.py", line 447, in job_wrapper_io_files
    ret_val = user_defined_work_func(*param)
  File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/src/findorfs.py", line 243, in FindORFS
    parse_fraggenescanout("%s/FindORFS/out/%s.orfs"%(_settings.rundir,_settings.PREFIX))
  File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/src/findorfs.py", line 191, in parse_fraggenescanout
    hdr,gene = seq.split("\n",1)
ValueError: need more than 1 value to unpack

Thanks

treangen commented 12 years ago

hi Daniel,

Looks like there is a bug in the parsing of FragGeneScan output. I haven't encountered this before; my best guess is that FragGeneScan crashed and metAMOS tried to parse an empty file and failed. Either way, we should handle this better. That said, could you please copy and paste the FINDORFS log file from the ./Logs directory? Thanks!

best,

Todd
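For anyone following along, here is a minimal sketch of how the failing line in the traceback, `hdr,gene = seq.split("\n",1)`, can raise this error, and one way a parser could tolerate it. This is not the actual metAMOS code: the function `split_records`, the assumption that records are delimited by `>`, and the sample IDs are all hypothetical and only for illustration. On Python 3 the message reads "not enough values to unpack" rather than "need more than 1 value to unpack".

```python
def split_records(text):
    """Split FASTA-like text on '>' into (header, body) pairs.

    Hypothetical helper, not part of metAMOS. Records that have no
    newline after the header are returned separately instead of
    raising ValueError during tuple unpacking.
    """
    good, malformed = [], []
    # Anything before the first '>' is skipped.
    for seq in text.split(">")[1:]:
        parts = seq.split("\n", 1)
        if len(parts) != 2:
            # A single-element list means the record has no newline at all.
            malformed.append(seq)
            continue
        hdr, gene = parts
        good.append((hdr, gene.strip()))
    return good, malformed


# The bare unpacking fails when a record contains no newline.
try:
    hdr, gene = "NODE_2_len".split("\n", 1)
    unpack_failed = False
except ValueError as err:
    unpack_failed = True
    print("unpack failed:", err)

# Made-up sample: one complete record followed by a cut-off one.
sample = (
    ">NODE_1_length_93_cov_1.0\n1\t135\t-\t2\t1.247610\tI:\tD:\n"
    ">NODE_2_len"
)
good, malformed = split_records(sample)
print(good)
print(malformed)
```

Collecting malformed records rather than silently dropping them lets the caller decide whether to abort or continue with a warning.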


dbrami commented 12 years ago

Thanks for the quick response. I have encountered this problem a few times before, and I do recall you warning me about the stability of FragGeneScan in this pipeline. Here is the content of the FINDORFS.log:

unlink: cannot unlink `/home/dbrami/tmp/metAmos/METAMOS_BS27FULL/FindORFS/in/proba.asm.contig': No such file or directory

Here is the content of the findORFS folder:

FindORFS/
|-- in
|   `-- proba.asm.contig -> /home/dbrami/tmp/metAmos/METAMOS_BS27FULL/Assemble/out/proba.asm.contig
`-- out
    |-- proba.gene.cvg
    |-- proba.orfs
    |-- proba.orfs.faa
    `-- proba.orfs.ffn

cmd-> ls -lstrh FindORFS/out/
total 821M
463M -rw-rw-r-- 1 dbrami employees 462M Apr 11 21:38 proba.orfs.ffn
221M -rw-rw-r-- 1 dbrami employees 221M Apr 11 21:38 proba.orfs.faa
139M -rw-rw-r-- 1 dbrami employees 139M Apr 11 21:38 proba.orfs
   0 -rw-rw-r-- 1 dbrami employees    0 Apr 11 21:38 proba.gene.cvg

And for good measure the tree of the Assembly folder:

Assemble/
|-- in
`-- out
    |-- Graph2
    |-- IDX.1.ebwt
    |-- IDX.2.ebwt
    |-- IDX.3.ebwt
    |-- IDX.4.ebwt
    |-- IDX.rev.1.ebwt
    |-- IDX.rev.2.ebwt
    |-- LastGraph
    |-- Log
    |-- PreGraph
    |-- Roadmaps
    |-- Sequences
    |-- contigs.fa
    |-- contigs_wo_location_info.txt
    |-- proba.afg -> /home/dbrami/tmp/metAmos/METAMOS_BS27FULL/Assemble/out/velvet_asm.afg
    |-- proba.asm.contig -> /home/dbrami/tmp/metAmos/METAMOS_BS27FULL/Assemble/out/contigs.fa
    |-- proba.asm.tigr
    |-- proba.bout
    |-- proba.contig.cvg
    |-- proba.lib1.badmates
    |-- proba.lib1.hdr
    |-- proba.lib1.mappedmates
    |-- proba.lib1.mates_in_diff_contigs
    |-- proba.seq100.contig
    |-- stats.txt
    `-- velvet_asm.afg

treangen commented 12 years ago

Thanks for the additional info. So it looks like a FragGeneScan parsing error, since the files are not empty. Would you mind sending me the files or making them available via FTP so that I can debug this? The files I need are the ones in:

FindORFS/out/

best,

Todd



dbrami commented 12 years ago

Thanks Todd,

Since I am working with sensitive data, the higher-ups are uneasy about me sending anything. Let me try re-running the pipeline starting from the FindORFS step.

Also, I don't have access to an FTP site where I can easily drop the data.

But I can say this: the crash seems to have been very abrupt. Notice that the final sequence ID has been truncated:

cmd-> tail proba.orfs
NODE_3889587_length_93_cov_1.021505
1 135 - 2 1.247610 I: D:
NODE_3889589_length_88_cov_1.340909
1 130 + 1 1.333019 I: D:
NODE_3889590_length_107_cov_1.102804
1 149 + 2 1.385543 I: D:
NODE_3889595_length_117_cov_1.008547
NODE_3889596_length_43_cov_1.651163
1 85 + 2 1.269308 I: D:
NODE_3889599_len

Same with the other two files:

cmd-> tail proba.orfs.faa
PDRKQASQIDRYRLVIVDECSMINEELW
NODE_3889557_length_43_cov_1.000000_185-
ASLMAPSLLDRVFLTRSKKRKADDEIQ
NODE_3889558_length_43_cov_1.000000_185-
DGKKKTPKSVCPDGWSDFKNSLWARFST
NODE_3889559_length_43_cov_1.000000_185+
TRQPLYNRQTIAHPGWTREAIRPSVRV
NODE_3889561_length_43_cov_1.000000_185-
EMCGNGIRCMAKFSEALETQDGQPPQA
NODE_3889562_length_43cov

cmd-> tail proba.orfs.ffn
NODE_3889575_length_43_cov_1.000000_185+
TGGGTTAAAAAACGATACATTGATCCTGCACCCCCCAACAAAGGCTTTCAAGGTGCTGATGGAATCTCGCGGAAATTCATC
NODE_3889576_length_43_cov_1.0
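A quick way to spot this kind of abrupt cut-off without inspecting the tail by hand is to check whether each output file ends with a newline. The sketch below is only a heuristic under that assumption (a file that ends with a newline can still be incomplete). The helper name `ends_with_newline` and the file contents are made up for illustration and are not part of metAMOS or FragGeneScan.

```python
import os
import tempfile


def ends_with_newline(path):
    """Return True if the file is non-empty and its last byte is a newline."""
    if os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as handle:
        # Jump to the final byte and read it.
        handle.seek(-1, os.SEEK_END)
        return handle.read(1) == b"\n"


# Demonstration on two small made-up files: one complete, one cut off mid-ID.
results = {}
with tempfile.TemporaryDirectory() as tmp:
    contents = {
        "complete.orfs": b">NODE_1\n1\t135\t-\n",
        "truncated.orfs": b">NODE_1\n1\t135\t-\n>NODE_2_len",
    }
    for name, data in contents.items():
        path = os.path.join(tmp, name)
        with open(path, "wb") as handle:
            handle.write(data)
        results[name] = ends_with_newline(path)

print(results)
```

Running a check like this over the files in FindORFS/out/ before parsing would flag a file that stops in the middle of a sequence ID.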

skoren commented 12 years ago

The latest code in the repository includes a verbose option (-v) for runPipeline that prints every command it runs to stdout. Additionally, a file named Logs/COMMANDS.txt lists every command run for each step of the pipeline. This issue should be addressed.