marbl / metAMOS

A metagenomic and isolate assembly and analysis pipeline built with AMOS
http://marbl.github.io/metAMOS
Other

FindORFs error #75

Closed ewilbanks closed 11 years ago

ewilbanks commented 11 years ago

Hi folks,

I'm having an issue running metAMOS on our 64-bit Linux machine. My run died after chugging away at FragGeneScan for several days. The command I ran was:

~/software/metAMOS/runPipeline -v -p 8 -n Assemble,FindRepeats -d /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1

Could the error be related to this previously noted issue? http://github.com/treangen/metAMOS/issues/53

The error output is below. Let me know if any other files or info would be helpful!

|2013-05-05 16:43:08| mv /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/FindORFS/out/proba.gene.cvg /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/FindORFS/out/proba.lib2.gene.cvg
|2013-05-05 16:43:24| mv /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/FindORFS/out/proba.gene.map /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/FindORFS/out/proba.lib2.gene.map
|2013-05-05 16:43:39| rm -r /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/FindORFS/out/proba.fna
|2013-05-05 16:49:37| cat /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/FindORFS/out/proba*.fna > /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/FindORFS/out/proba.fna
|2013-05-05 16:49:52| rm -r /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/FindORFS/out/proba.fna.bnk
|2013-05-05 16:50:10| /home/ewilbanks/software/metAMOS/AMOS/Linux-x86_64/bin/toAmos_new -s /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/FindORFS/out/proba.fna -b /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/FindORFS/out/proba.fna.bnk

Last 10 lines of output (/share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/Logs/FINDORFS.log)
no. of seqs: 17125850
no. of seqs: 169603267
rm: cannot remove `/share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/FindORFS/out/proba.fna': No such file or directory
rm: cannot remove `/share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/FindORFS/out/proba.fna.bnk': No such file or directory
Read Bank doesn't exist; creating
frag Bank doesn't exist; creating
lib Bank doesn't exist; creating
parsing fasta file
terminate called after throwing an instance of 'AMOS::ArgumentException_t'
what(): Cannot insert string key 'M01533:9:000000000-A20UG:1:1101:17058:1598_1260-' multiple times

Please veryify input data and restart MetAMOS. If the problem persists please contact the MetAMOS development team. ****ERROR******


rm: cannot remove `/share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/Logs/findorfs.ok': No such file or directory
ruffus.ruffus_exceptions.RethrownJobError:

Exception #1
  'exceptions.NameError(global name 'JobSignalledBreak' is not defined)' raised in ...
   Task = def findorfs.FindORFS(...):
   Job  = [proba.asm.contig -> proba.faa]

Traceback (most recent call last):
  File "/home/ewilbanks/software/metAMOS/Utilities/ruffus/task.py", line 616, in run_pooled_job_without_exceptions
    return_value =  job_wrapper(param, user_defined_work_func, register_cleanup, touch_files_only)
  File "/home/ewilbanks/software/metAMOS/Utilities/ruffus/task.py", line 486, in job_wrapper_io_files
    ret_val = user_defined_work_func(*param)
  File "/home/ewilbanks/software/metAMOS/src/findorfs.py", line 318, in FindORFS
    run_process(_settings, "%s/toAmos_new -s %s/FindORFS/out/%s.fna -b %s/FindORFS/out/%s.fna.bnk"%(_settings.AMOS, _settings.rundir, _settings.PREFIX, _settings.rundir, _settings.PREFIX), "FindORFS")   
  File "/home/ewilbanks/software/metAMOS/src/utils.py", line 608, in run_process
    raise (JobSignalledBreak)
NameError: global name 'JobSignalledBreak' is not defined
skoren commented 11 years ago

Hi,

This looks like an error with the naming of the sequences. If you enable filtering with the -t option to runPipeline, it should rename all the reads and eliminate the error. Unfortunately, this means you need to re-run the pipeline from scratch. As for FindORFS, metAMOS currently tries to call ORFs on all sequences that could not be mapped to your assembly, which is time-consuming for large datasets. We are therefore planning to update the code to disable gene calling on these sequences by default. The change should be available within the next week. In the meantime, you can disable ORF calling on unmapped sequences yourself by removing lines 310-312 in src/findorfs.py:

for lib in _readlibs:
  run_process(_settings, "ln -s %s/Assemble/out/lib%d.unaligned.fasta %s/FindORFS/in/"%(_settings.rundir,lib.id,_settings.rundir),"FindORFS")
  findFastaORFs(_orf, "%s/FindORFS/in/lib%d.unaligned.fasta"%(_settings.rundir, lib.id), "%s.lib%d.fna"%(_settings.PREFIX, lib.id), "%s.lib%d.faa"%(_settings.PREFIX, lib.id), "%s.lib%d.gene.cvg"%(_settings.PREFIX, lib.id), "%s.lib%d.gene.map"%(_settings.PREFIX, lib.id), 0, 1)
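To check whether read naming is the culprit before re-running with -t, you can look for read IDs that collide once everything after the first whitespace is dropped. This is a hypothetical helper, not part of metAMOS. It assumes plain 4-line FASTQ records and assumes, as the key in the error message suggests, that AMOS keeps only the text before the first space.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def fastq_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield the header line of each record (assumes plain 4-line FASTQ, no wrapping)."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield line.rstrip("\n")


def truncated_id(header: str) -> str:
    """Strip the leading '@' or '>' and keep only the text before the first whitespace.

    Assumption: this mimics how the name ends up as a key in the AMOS bank.
    """
    fields = header[1:].split()
    return fields[0] if fields else ""


def duplicate_ids(headers: Iterable[str]) -> List[str]:
    """Return the truncated IDs that occur more than once, sorted for stable output."""
    counts = Counter(truncated_id(h) for h in headers)
    return sorted(rid for rid, n in counts.items() if n > 1)
```

Calling duplicate_ids(fastq_headers(handle)) on an open input FASTQ file streams through it line by line. A non-empty result would be consistent with the naming problem described above.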

Sergey

ewilbanks commented 11 years ago

OK, thanks! What exactly does the -t option do? I couldn't tell from the documentation I read through.

ewilbanks commented 11 years ago

Re-ran with -t and got a similar error:

$ ~/software/metAMOS/runPipeline -v -t -p 14 -n Assemble,FindRepeats,FindORFS,Annotate,FunctionalAnnotation,Classify,Propagate -d /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1

*\ metAMOS running command: rm -rf /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/Scaffold/in/proba.bnk

*\ metAMOS running command: /home/ewilbanks/software/metAMOS/AMOS/Linux-x86_64/bin/toAmos_new -Q /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/Preprocess/out/lib1.seq -i --min 200 --max 1000 --libname lib1 -b /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/Scaffold/in/proba.bnk


****ERROR****** During scaffold, the following command failed with return code -6:

/home/ewilbanks/software/metAMOS/AMOS/Linux-x86_64/bin/toAmos_new -Q /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/Preprocess/out/lib1.seq -i --min 200 --max 1000 --libname lib1 -b /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/Scaffold/in/proba.bnk

****DETAILS****** Last 10 commands run before the error (/share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/Logs/COMMANDS.log)
|2013-05-07 10:59:45| touch /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/Logs/findrepeats.skip
|2013-05-07 10:59:46|# [ANNOTATE]
|2013-05-07 11:00:01| touch /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/Logs/annotate.skip
|2013-05-07 11:00:17| touch /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/Annotate/out/proba.hits
|2013-05-07 11:00:18|# [FUNCTIONALANNOTATION]
|2013-05-07 11:00:35| touch /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/Logs/functionalannotation.skip
|2013-05-07 11:00:54| touch /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/FunctionalAnnotation/out/blast.out
|2013-05-07 11:00:55|# [SCAFFOLD]
|2013-05-07 11:01:16| rm -rf /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/Scaffold/in/proba.bnk
|2013-05-07 11:01:38| /home/ewilbanks/software/metAMOS/AMOS/Linux-x86_64/bin/toAmos_new -Q /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/Preprocess/out/lib1.seq -i --min 200 --max 1000 --libname lib1 -b /share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/Scaffold/in/proba.bnk

Last 10 lines of output (/share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/Logs/SCAFFOLD.log)
Read Bank doesn't exist; creating
frag Bank doesn't exist; creating
lib Bank doesn't exist; creating
parsing fastq file
terminate called after throwing an instance of 'AMOS::ArgumentException_t'
what(): Cannot insert string key 'M01533:9:000000000-A20UG:1:1101:16050:1572' multiple times

Please veryify input data and restart MetAMOS. If the problem persists please contact the MetAMOS development team. ****ERROR******


rm: cannot remove `/share/eisen-z2/ewilbanks/Moleculo/metamos/mol.celera1/Logs/scaffold.ok': No such file or directory
ruffus.ruffus_exceptions.RethrownJobError:

Exception #1
  'exceptions.NameError(global name 'JobSignalledBreak' is not defined)' raised in ...
   Task = def scaffold.Scaffold(...):
   Job  = [[proba.asm.contig] -> proba.scaffolds.final]

Traceback (most recent call last):
  File "/home/ewilbanks/software/metAMOS/Utilities/ruffus/task.py", line 616, in run_pooled_job_without_exceptions
    return_value =  job_wrapper(param, user_defined_work_func, register_cleanup, touch_files_only)
  File "/home/ewilbanks/software/metAMOS/Utilities/ruffus/task.py", line 486, in job_wrapper_io_files
    ret_val = user_defined_work_func(*param)
  File "/home/ewilbanks/software/metAMOS/src/scaffold.py", line 71, in Scaffold
    run_process(_settings, "%s/toAmos_new -Q %s/Preprocess/out/lib%d.seq %s -b %s/Scaffold/in/%s.bnk "%(_settings.AMOS,_settings.rundir,lib.id,matedStr,_settings.rundir,_settings.PREFIX),"Scaffold")
  File "/home/ewilbanks/software/metAMOS/src/utils.py", line 608, in run_process
    raise (JobSignalledBreak)
NameError: global name 'JobSignalledBreak' is not defined
skoren commented 11 years ago

Hi,

Sorry I didn't specify this in my previous response. The -t option filters out sequences containing Ns and renames the remaining sequences to a standard naming convention. Many of the tools within metAMOS don't support arbitrary names, so this filtering helps avoid errors. My guess is that your read names have the form SEQUENCE_ID 1 or SEQUENCE_ID 2, depending on which end of the pair they are. AMOS does not understand spaces in sequence names, so SEQUENCE_ID 1 and SEQUENCE_ID 2 look like the same name, which is why toAmos_new complains that it cannot insert the same key multiple times. The filter step renames these to SEQUENCE_ID/1 and SEQUENCE_ID/2.
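The renaming described above can be sketched as follows. This is a hypothetical illustration, not metAMOS's actual filter code. It assumes plain 4-line FASTQ records and headers whose second field starts with the mate number (for example SEQUENCE_ID 1, or a CASAVA-style ID 1:N:0:...), drops any read containing an N, and rewrites the name to SEQUENCE_ID/1 or SEQUENCE_ID/2.

```python
from typing import Iterable, Iterator, Tuple

# (name without the leading '@', sequence, quality string)
Record = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Group lines into plain 4-line FASTQ records (wrapped records are not handled).

    zip() stops at the last complete record, so a truncated tail is ignored.
    """
    it = iter(lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n")[1:], seq.rstrip("\n"), qual.rstrip("\n")


def rename_header(name: str) -> str:
    """Map 'SEQUENCE_ID 1...' to 'SEQUENCE_ID/1'; otherwise keep only the first field."""
    fields = name.split()
    if not fields:
        return name
    if len(fields) >= 2 and fields[1][:1] in ("1", "2"):
        return "%s/%s" % (fields[0], fields[1][0])
    return fields[0]


def filter_and_rename(records: Iterable[Record]) -> Iterator[Record]:
    """Drop reads containing an N (case-insensitive) and rename the rest."""
    for name, seq, qual in records:
        if "N" in seq.upper():
            continue
        yield rename_header(name), seq, qual


def format_fastq(records: Iterable[Record]) -> Iterator[str]:
    """Serialise records back to 4-line FASTQ text."""
    for name, seq, qual in records:
        yield "@%s\n%s\n+\n%s\n" % (name, seq, qual)
```

Chained as format_fastq(filter_and_rename(parse_fastq(handle))), it streams one record at a time, so large read files need not fit in memory. Filtering each read on its own can leave a mate without its partner; a real filter for paired libraries would need to keep or drop both ends together.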

You got the error above because metAMOS resumes a previously started run, so the filtering never ran on your existing data. To make it run, either remove your working directory (mol.celera1) and re-create it with initPipeline, or force the Preprocess step with -f Preprocess. Either way, metAMOS will restart from scratch.

skoren commented 11 years ago

Closing due to inactivity