Closed dbrami closed 12 years ago
hi Daniel,
It looks like there is a bug in the parsing of FragGeneScan output. I haven't encountered this before; my best guess is that FragGeneScan crashed and metAMOS tried to parse an empty file and failed. Either way, we should handle this better. That said, could you please copy and paste the FINDORFS log file from the ./Logs directory? Thanks!
best,
Todd
Hi folks,
I was happy to get the pipeline to run end to end on sub-sampling of my data to 10M paired reads. I then attempted to run it on the entire data set of 76M paired reads. It unfortunately crashed at the findORFS step.
The error log is at bottom of this message. Here are my additional questions:
- Is there a flag for printing the list of commands to a file or to STDOUT / STDERR ?
- what does the --fastest flag do specifically?
Here is the command used: ${metAMOS}/runPipeline -c amphora2 -d METAMOS_BS27FULL -g fraggenescan -k 43 -p 22 -a velvet 1> METAMOS_BS27FULL.run.out 2> METAMOS_BS27FULL.run.err&
Here is the STDERR log:
Job = [[SGI_BS27.1.fastq, SGI_BS27.2.fastq] -> preprocess.success] completed
Completed Task = preprocess.Preprocess
Job = [[lib1.seq] -> [proba.asm.contig]] completed
Completed Task = assemble.Assemble
Job = [proba.asm.contig -> proba.bout] completed
Completed Task = mapreads.MapReads
Traceback (most recent call last):
  File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/runPipeline", line 367, in
    pipeline_run([preprocess.Preprocess,assemble.Assemble,findorfs.FindORFS, findreps.FindRepeats, annotate.Annotate, abundance.Abundance, scaffold.Scaffold, findscforfs.FindScaffoldORFS, propagate.Propagate, classify.Classify, postprocess.Postprocess], verbose = 1)
  File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/Utilities/ruffus/task.py", line 2680, in pipeline_run
    raise errt
ruffus.ruffus_exceptions.RethrownJobError:
    Exceptions running jobs for 'def findorfs.FindORFS(...):'
    Original exception:
    Exception #1
    exceptions.ValueError(need more than 1 value to unpack):
    for findorfs.FindORFS.Job = [proba.asm.contig -> proba.faa]
    Traceback (most recent call last):
      File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/Utilities/ruffus/task.py", line 524, in run_pooled_job_without_exceptions
        return t_job_result(task_name, JOB_COMPLETED, job_name, return_value, None)
      File "/bio_bin/python26/lib/python2.6/contextlib.py", line 34, in __exit__
        self.gen.throw(type, value, traceback)
      File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/Utilities/ruffus/task.py", line 232, in do_nothing_semaphore
        yield
      File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/Utilities/ruffus/task.py", line 517, in run_pooled_job_without_exceptions
        return_value = job_wrapper(param, user_defined_work_func, register_cleanup, touch_files_only)
      File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/Utilities/ruffus/task.py", line 447, in job_wrapper_io_files
        ret_val = user_defined_work_func(*param)
      File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/src/findorfs.py", line 243, in FindORFS
        parse_fraggenescanout("%s/FindORFS/out/%s.orfs"%(_settings.rundir,_settings.PREFIX))
      File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/src/findorfs.py", line 191, in parse_fraggenescanout
        hdr,gene = seq.split("\n",1)
    ValueError: need more than 1 value to unpack
Thanks
Thanks for the quick response. I have encountered this problem a few times before, and I do recall you warning me about the stability of FragGeneScan in this pipeline. Here is the content of the FINDORFS.log:
unlink: cannot unlink `/home/dbrami/tmp/metAmos/METAMOS_BS27FULL/FindORFS/in/proba.asm.contig': No such file or directory
Here is the content of the findORFS folder:
FindORFS/
|-- in
|   `-- proba.asm.contig -> /home/dbrami/tmp/metAmos/METAMOS_BS27FULL/Assemble/out/proba.asm.contig
`-- out
    |-- proba.gene.cvg
    |-- proba.orfs
    |-- proba.orfs.faa
    `-- proba.orfs.ffn
cmd-> ls -lstrh FindORFS/out/
total 821M
463M -rw-rw-r-- 1 dbrami employees 462M Apr 11 21:38 proba.orfs.ffn
221M -rw-rw-r-- 1 dbrami employees 221M Apr 11 21:38 proba.orfs.faa
139M -rw-rw-r-- 1 dbrami employees 139M Apr 11 21:38 proba.orfs
   0 -rw-rw-r-- 1 dbrami employees    0 Apr 11 21:38 proba.gene.cvg
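The listing shows large ORF files next to a zero-byte proba.gene.cvg. A quick sanity check before parsing could flag outputs that are missing, empty, or that do not end in a newline, which can hint at a writer that was interrupted. This is only a sketch and not part of metAMOS: the helper name check_output is hypothetical, and treating a missing trailing newline as a sign of truncation is an assumption about how the tool writes its output.

```python
import os


def check_output(path):
    """Return a list of problems found with one output file (empty list = none)."""
    if not os.path.exists(path):
        return ["missing"]
    if os.path.getsize(path) == 0:
        return ["empty"]
    problems = []
    with open(path, "rb") as handle:
        # Look at the last byte only, so large files are cheap to check.
        handle.seek(-1, os.SEEK_END)
        if handle.read(1) != b"\n":
            # Heuristic (assumption): a complete file ends with a newline.
            problems.append("no trailing newline (possibly truncated)")
    return problems
```

Running it over each file in FindORFS/out/ before parsing would turn a confusing unpacking error into a readable message about which file looks incomplete.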
And for good measure the tree of the Assembly folder:
Assemble/
|-- in
`-- out
    |-- Graph2
    |-- IDX.1.ebwt
    |-- IDX.2.ebwt
    |-- IDX.3.ebwt
    |-- IDX.4.ebwt
    |-- IDX.rev.1.ebwt
    |-- IDX.rev.2.ebwt
    |-- LastGraph
    |-- Log
    |-- PreGraph
    |-- Roadmaps
    |-- Sequences
    |-- contigs.fa
    |-- contigs_wo_location_info.txt
    |-- proba.afg -> /home/dbrami/tmp/metAmos/METAMOS_BS27FULL/Assemble/out/velvet_asm.afg
    |-- proba.asm.contig -> /home/dbrami/tmp/metAmos/METAMOS_BS27FULL/Assemble/out/contigs.fa
    |-- proba.asm.tigr
    |-- proba.bout
    |-- proba.contig.cvg
    |-- proba.lib1.badmates
    |-- proba.lib1.hdr
    |-- proba.lib1.mappedmates
    |-- proba.lib1.mates_in_diff_contigs
    |-- proba.seq100.contig
    |-- stats.txt
    `-- velvet_asm.afg
Thanks for the additional info. So it looks like a FragGeneScan parsing error, since the files are not empty. Would you mind sending me the files or making them available via FTP so that I can debug this? The files I need are the ones in:
FindORFS/out/
best,
Todd
Thanks Todd,
Since I am working with sensitive data, the higher-ups are uneasy about me sending anything. Let me attempt to re-run the pipeline starting from the FindORFS step.
Also, I don't have access to an FTP site where I can easily drop the data.
But I can say this, the crash seemed to have been very abrupt; notice the sequence ID has been truncated:
cmd-> tail proba.orfs
NODE_3889587_length_93_cov_1.021505
1 135 - 2 1.247610 I: D:
NODE_3889589_length_88_cov_1.340909
1 130 + 1 1.333019 I: D:
NODE_3889590_length_107_cov_1.102804
1 149 + 2 1.385543 I: D:
NODE_3889595_length_117_cov_1.008547
NODE_3889596_length_43_cov_1.651163
1 85 + 2 1.269308 I: D:
NODE_3889599_len
Same with the other two files:
cmd-> tail proba.orfs.faa
PDRKQASQIDRYRLVIVDECSMINEELW
NODE_3889557_length_43_cov_1.000000_185-
ASLMAPSLLDRVFLTRSKKRKADDEIQ
NODE_3889558_length_43_cov_1.000000_185-
DGKKKTPKSVCPDGWSDFKNSLWARFST
NODE_3889559_length_43_cov_1.000000_185+
TRQPLYNRQTIAHPGWTREAIRPSVRV
NODE_3889561_length_43_cov_1.000000_185-
EMCGNGIRCMAKFSEALETQDGQPPQA
NODE_3889562_length_43cov
cmd-> tail proba.orfs.ffn
NODE_3889575_length_43_cov_1.000000_185+
TGGGTTAAAAAACGATACATTGATCCTGCACCCCCCAACAAAGGCTTTCAAGGTGCTGATGGAATCTCGCGGAAATTCATC
NODE_3889576_length_43_cov_1.0
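The traceback points at `hdr,gene = seq.split("\n",1)` in parse_fraggenescanout, which raises ValueError whenever a record contains no newline, as a header cut off mid-write would. Below is a minimal sketch of a more tolerant parse. It assumes records are delimited by '>' as in FASTA (the markers are not visible in the pasted output), the function name is hypothetical, and this is not the actual metAMOS code or fix.

```python
def parse_orf_records(text):
    """Split '>'-delimited text into (header, body) pairs.

    Returns (records, skipped), where skipped lists the headers of
    records that had no body or were cut off before the newline.
    """
    records, skipped = [], []
    # Everything before the first '>' is not a record, so drop it.
    for chunk in text.split(">")[1:]:
        # partition() always returns three parts, so unlike
        # `hdr, gene = chunk.split("\n", 1)` it cannot fail to unpack.
        hdr, sep, body = chunk.partition("\n")
        if not sep or not body.strip():
            skipped.append(hdr.strip())
            continue
        records.append((hdr.strip(), body.strip()))
    return records, skipped


if __name__ == "__main__":
    sample = ">NODE_1\n1 135 - 2 1.24 I: D:\n>NODE_2\n>NODE_3_len"
    recs, bad = parse_orf_records(sample)
    print(recs)
    print(bad)
```

Reporting the skipped headers rather than silently dropping them keeps the truncation visible, since an abruptly cut file usually means the upstream tool itself needs to be re-run.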
The latest code in the repository includes a verbose option (-v) to runPipeline that prints every command run to stdout. Additionally, there is a file named Logs/COMMANDS.txt that lists every command run for each step of the pipeline. This issue should be addressed.