marbl / metAMOS

A metagenomic and isolate assembly and analysis pipeline built with AMOS
http://marbl.github.io/metAMOS

Error using test_ima.sh #246

Closed noellenoyes closed 8 years ago

noellenoyes commented 8 years ago

I am running into errors when using the test_ima.sh script to test my installation of iMetAMOS. The errors read:

Error, provided contig file does not exist: ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Candidatus_Carsonella_ruddii_uid58773/NC_008512.fna
project dir /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Test/test_ima does not exist!

I have successfully installed and tested metAMOS itself, but have not tested any of the other optional workflows. Note that the installation of iMetAMOS seemed to go smoothly after I followed the instructions on Issue #242 (https://github.com/marbl/metAMOS/issues/242).

Here is stdout when I run test_ima.sh

Warning: Celera Assembler is not found, some functionality will not be available
Warning: BLASR is not found, some functionality will not be available
Warning: Newbler is not found, some functionality will not be available
Warning: MetaGeneMark is not found, some functionality will not be available
Warning: SignalP+ is not found, some functionality will not be available
Warning: metaphylerClassify is not found, some functionality will not be available
Warning: PhyloSift was not found, will not be available

Warning: REAPR is not found, some functionality will not be available
Warning: FRCbam is not found, some functionality will not be available
Warning: MPI is not available, some functionality may not be available
Error, provided contig file does not exist: ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Candidatus_Carsonella_ruddii_uid58773/NC_008512.fna
project dir /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Test/test_ima does not exist!
usage: runPipeline [options] -d projectdir
   -h = : print help [this message]
   -j = : just output all of the programs and citations then exit (default = NO)
   -v = : verbose output? (default = NO)
   -d = : directory created by initPipeline (REQUIRED)

[options]: [pipeline_opts] [misc_opts]

[pipeline_opts]: options that affect the pipeline execution
Pipeline consists of the following steps:
  Preprocess, Assemble, FindORFS, MapReads, Abundance, Annotate, FunctionalAnnotation, Scaffold, Propagate, Classify, Postprocess
Each of these steps can be referred to by the following options:
   -f = : force this step to be run (default = NONE)
   -s = : start at this step in the pipeline (default = Preprocess)
   -e = : end at this step in the pipeline (default = Postprocess)
   -n = : step to skip in pipeline (default=NONE)

For each step you can fine-tune the execution as follows
[Preprocess]
   -t = : filter input reads? (default = metamos, supported = none,metamos,eautils,pbcr)
   -q = : produce FastQC quality report for reads with quality information (fastq or sff)? (default = NO)
[Assemble]
   -a = : genome assembler to use (default = soapdenovo, supported = newbler,soapdenovo,soapdenovo2,ca,velvet,velvet-sc,metavelvet,metaidba,sparseassembler,minimus,abyss,edena,spades,mira,sga,idba-ud,ray,masurca)
   -k = : k-mer size to be used for assembly (default = 31)
   -o = >: min overlap length
[MapReads]
   -m = : read mapper to use? (default = bowtie, supported = bowtie,bowtie2)
   -i = : save bowtie (i)ndex? (default = NO)
   -b = : create library specific per bp coverage of assembled contigs (default = NO)
[FindORFS]
   -g = : gene caller to use (default = fraggenescan, supported = fraggenescan,metagenemark,glimmermg)
   -l = : min contig length to use for ORF call (default = 300)
   -x = >: min contig coverage to use for ORF call (default = 3X)
[Validate]
   -X = : comma-separated list of validators to run on the assembly. (default = lap, supported = reapr,orf,lap,ale,quast,frcbam,freebayes,cgal,n50)
   -S = : comma-separated list of scores to use to select the winning assembly. By default, all validation tools specified by -X will be run. For each score, an optional weight can be specified as SCORE:WEIGHT. For example, LAP:1,CGAL:2 (supported = all,lap,ale,cgal,snp,frcbam,orf,reapr,n50)
[Annotate]
   -c = : classifier to use for annotation (default = kraken, supported = fcp,phylosift,phmmer,blast,metaphyler,phymm,kraken
   -u = : annotate unassembled reads? (default = NO)
[Classify]
   -z = : taxonomic level to categorize at (default = class)

[misc_opts]: Miscellaneous options
   -B = : blast DBs not available (default = NO)
   -r = : retain the AMOS bank? (default = NO)
   -p = : number of threads to use (be greedy!) (default=1)
   -4 = : 454 data? (default = NO)
   -L = : generate local Krona plots. Local Krona plots can only be viewed on the machine they are generated on but will work on a system with no internet connection (default = NO)

Perhaps this is linked to the ftp problem referenced in Issue #242? Any help you can provide would be much appreciated!

skoren commented 8 years ago

NCBI has completely changed their FTP site, invalidating that link. If you change the URL in test_ima.ini from ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Candidatus_Carsonella_ruddii_uid58773/NC_008512.fna to ftp://ftp.ncbi.nih.gov/genomes/all/GCA_000010365.1_ASM1036v1/GCA_000010365.1_ASM1036v1_genomic.fna.gz it should work.
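A minimal sketch of that edit, assuming test_ima.ini is a plain-text file that contains the old URL verbatim (its exact layout is not shown in this thread). The helper name `swap_url` is hypothetical, and the path you pass in depends on where your copy of test_ima.ini lives.

```python
import io

# URLs copied from the comment above.
OLD_URL = "ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Candidatus_Carsonella_ruddii_uid58773/NC_008512.fna"
NEW_URL = "ftp://ftp.ncbi.nih.gov/genomes/all/GCA_000010365.1_ASM1036v1/GCA_000010365.1_ASM1036v1_genomic.fna.gz"


def swap_url(ini_path, old=OLD_URL, new=NEW_URL):
    """Replace every occurrence of `old` with `new` in the file.

    Hypothetical helper, not part of metAMOS. Returns how many
    occurrences were replaced (0 means the file was left untouched).
    """
    with io.open(ini_path, "r", encoding="utf-8") as handle:
        text = handle.read()
    count = text.count(old)
    if count:
        with io.open(ini_path, "w", encoding="utf-8") as handle:
            handle.write(text.replace(old, new))
    return count
```

Calling `swap_url` with the path to test_ima.ini and checking that it returns a non-zero count is a quick way to confirm the old link was actually present before rerunning test_ima.sh.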

noellenoyes commented 8 years ago

Thanks for the quick response; that seemed to fix the original error. However, I think that changing the file name is causing an additional error:

Oops, MetAMOS finished with errors! see text in red above for details.
Traceback (most recent call last):
  File "../runPipeline", line 985, in
    verbose = 1)
  File "/home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/ruffus/task.py", line 2965, in pipeline_run
    raise job_errors
RethrownJobError:

Exception #1
  'exceptions.ValueError(too many values to unpack)' raised in ...
   Task = def assemble.Assemble(...):
   Job  = [GCA_000010365.1_ASM1036v1_genomic.fna.run -> GCA_000010365.1_ASM1036v1_genomic.fna.asm.contig]

Traceback (most recent call last):
  File "/home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/ruffus/task.py", line 625, in run_pooled_job_without_exceptions
    return_value =  job_wrapper(param, user_defined_work_func, register_cleanup, touch_files_only)
  File "/home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/ruffus/task.py", line 491, in job_wrapper_io_files
    ret_val = user_defined_work_func(*param)
  File "/home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/src/assemble.py", line 326, in Assemble
    (asmName, kmer) = asmName.split(".")
ValueError: too many values to unpack

Here is the entire output of the test_ima.sh run:

Warning: Celera Assembler is not found, some functionality will not be available
Warning: BLASR is not found, some functionality will not be available
Warning: Newbler is not found, some functionality will not be available
Warning: MetaGeneMark is not found, some functionality will not be available
Warning: SignalP+ is not found, some functionality will not be available
Warning: metaphylerClassify is not found, some functionality will not be available
Warning: PhyloSift was not found, will not be available

Warning: REAPR is not found, some functionality will not be available
Warning: FRCbam is not found, some functionality will not be available
Warning: MPI is not available, some functionality may not be available
Project directory already exists, please specify another
Alternatively, use runPipeline to run an existing project

Starting metAMOS pipeline
Found pysam in /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/python/lib/python/pysam-0.6-py2.7-linux-x86_64.egg/pysam/__init__.pyc
Found psutil in /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/python/lib/python/psutil-0.6.1-py2.7-linux-x86_64.egg/psutil/__init__.pyc
Error: cannot find BLAST DB directory, expected it in /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/DB/. Disabling blastdb dependent programs
Warning: Celera Assembler is not found, some functionality will not be available
Warning: BLASR is not found, some functionality will not be available
Warning: Newbler is not found, some functionality will not be available
Warning: MetaGeneMark is not found, some functionality will not be available
Warning: SignalP+ is not found, some functionality will not be available
Warning: metaphylerClassify is not found, some functionality will not be available
Warning: PhyloSift was not found, will not be available

Warning: REAPR is not found, some functionality will not be available
Warning: FRCbam is not found, some functionality will not be available
Warning: MPI is not available, some functionality may not be available
[Available RAM: 527 GB] ok
[Available CPUs: 64] ok


Tasks which will be run:

Task = assemble.Assemble
Task = assemble.CheckAsmResults
Task = assemble.SplitMappers
Task = mapreads.MapReads
Task = mapreads.CheckMapResults
Task = mapreads.SplitForORFs
Task = findorfs.FindORFS
Task = validate.Validate
Task = findreps.FindRepeats
Task = annotate.Annotate
Task = fannotate.FunctionalAnnotation
Task = scaffold.Scaffold
Task = findscforfs.FindScaffoldORFS
Task = abundance.Abundance
Task = propagate.Propagate
Task = classify.Classify
Task = postprocess.Postprocess


metAMOS configuration summary:
metAMOS Version: v1.5rc3 "Praline Brownie" workflows: core,imetamos
Time and Date: 2016-07-12
Working directory: /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Test/test_ima
Prefix: proba
K-Mer: 31
Threads: 8
Taxonomic level: phylum
Verbose: True
Steps to skip: FindScaffoldORFS, Scaffold, Propagate, FindRepeats
Steps to force: FunctionalAnnotation, Postprocess

[citation] MetAMOS Treangen, TJ ⇔ Koren, S, Sommer, DD, Liu, B, Astrovskaya, I, Ondov, B, Darling AE, Phillippy AM, Pop, M. MetAMOS: a modular and open source metagenomic assembly and analysis pipeline. Genome biology, 14(1), R2, 2013.

iMetAMOS Koren, S, Treangen, TJ, Hill, CM, Pop, M, Phillippy, AM. Automated ensemble assembly and validation of microbial genomes. BMC Bioinformatics 15:126, 2014.

Step-specific configuration: [abundance] MetaPhyler

Liu B, Gibbons T, Ghodsi M, Treangen T, Pop M. Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences. BMC Genomics. 2011;12 Suppl 2:S4. Epub 2011 Jul 27.

[multialign] M-GCAT /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/cpp/Linux-x86_64 Treangen TJ, Messeguer X. M-GCAT: interactively and efficiently constructing large-scale multiple genome comparison frameworks in closely related species. BMC Bioinformatics, 2006.

[fannotate] BLAST /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/cpp/Linux-x86_64 Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403-10.

[scaffold] Bambus 2 /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/AMOS/Linux-x86_64/bin Koren S, Treangen TJ, Pop M. Bambus 2: scaffolding metagenomes. Bioinformatics 27(21): 2964-2971 2011.

[findorfs] Prokka /usr/local/bin Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics, btu153. 2014.

[annotate] Kraken /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/cpp/Linux-x86_64/kraken/bin Wood DE, Salzberg SL: Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biology 2014, 15:R46.

[assemble] ABySS /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/cpp/Linux-x86_64/abyss/bin Simpson, JT, Wong, K, Jackman, SD, Schein, JE, Jones, SJ, Birol, İ. ABySS: a parallel assembler for short read sequence data. Genome research, 19(6), 1117-1123, 2009.

Edena /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/cpp/Linux-x86_64/edena/bin Hernandez D, Tewhey R, Veyrieras J, Farinelli L, Østerås M, François P, and Schrenzel J. De novo finished 2.8 Mbp Staphylococcus aureus genome assembly from 100 bp short and long range paired-end reads. Bioinformatics, btt590, 2013.

MetaVelvet /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/cpp/Linux-x86_64/MetaVelvet Namiki, T., Hachiya, T., Tanaka, H., & Sakakibara, Y. MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads. Nucleic acids research, 40(20), e155-e155, 2012.

SPAdes /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/cpp/Linux-x86_64/spades/bin Anton Bankevich, Sergey Nurk, Dmitry Antipov, Alexey A. Gurevich, Mikhail Dvorkin, Alexander S. Kulikov, Valery M. Lesin, Sergey I. Nikolenko, Son Pham, Andrey D. Prjibelski, Alexey V. Pyshkin, Alexander V. Sirotkin, Nikolay Vyahhi, Glenn Tesler, Max A. Alekseyev, and Pavel A. Pevzner. Journal of Computational Biology. May 2012, 19(5): 455-477. doi:10.1089/cmb.2012.0021.

MIRA /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/cpp/Linux-x86_64/mira/bin Chevreux, B, Wetter, T, Suhai, S. Genome Sequence Assembly Using Trace Signals and Additional Sequence Information. In German Conference on Bioinformatics (pp. 45-56), 1999.

SOAPdenovo2 /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/cpp/Linux-x86_64/soap2/bin Luo, R, Liu, B, Xie, Y, Li, Z, Huang, W, Yuan, J, He G, Chen Y, Pan Q, Liu Y, Tang J, Wu G, Zhang H, Shi Y, Liu Y, Yu C, Wang B, Lu Y, Han C, Cheung DW, Yiu S, Peng S, Xiaoqian Z, Liu G, Liao X, Li Y, Yang H, Wang J, Lam T, Wang J. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience, 1(1), 18, 2012.

Velvet /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/cpp/Linux-x86_64/velvet Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008 May;18(5):821-9.

SOAPdenovo /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/cpp/Linux-x86_64 Li Y, Hu Y, Bolund L, Wang J: State of the art de novo assembly of human genomes from massively parallel sequencing data.Human genomics 2010, 4:271-277.

SGA /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/cpp/Linux-x86_64/sga/bin Simpson, JT, Durbin, R. Efficient de novo assembly of large genomes using compressed data structures. Genome Research, 22(3), 549-556, 2012.

IDBA-UD /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/cpp/Linux-x86_64/idba/bin Peng, Y., Leung, H. C., Yiu, S. M., & Chin, F. Y. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics, 28(11), 1420-1428, 2012.

MaSuRCA /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/cpp/Linux-x86_64/MaSuRCA/bin Zimin, A, Marçais, G, Puiu, D, Roberts, M, Salzberg, SL, Yorke, JA. The MaSuRCA genome assembler. Bioinformatics, btt476, 2013.

Velvet-SC

Chitsaz H, Yee-Greenbaum JL, Tesler G, Lombardo MJ, Dupont CL, Badger JH, Novotny M, Rusch DB, Fraser LJ, Gormley NA, Schulz-Trieglaff O, Smith GP, Evers DJ, Pevzner PA, Lasken RL. Efficient de novo assembly of single-cell bacterial genomes from short-read data sets. Nature Biotechnology, vol. 29, no. 11, pp. 915-921 (2011)

Ray

Boisvert, S, Raymond, F, Godzaridis, É, Laviolette, F, Corbeil, J. Ray Meta: scalable de novo metagenome assembly and profiling. Genome biology, 13(12), R122, 2013.

[mapreads] Bowtie /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/cpp/Linux-x86_64 Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25. Epub 2009 Mar 4.

[preprocess] metAMOS built-in filtering N/A

[validate] FRCbam

Vezzi, F, Narzisi, G, Mishra, B. Reevaluating assembly evaluations with feature response curves: GAGE and assemblathons. PloS ONE, 7(12), e52210, 2013.

CGAL /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/cpp/Linux-x86_64/cgal Rahman, A, Pachter, L CGAL: computing genome assembly likelihoods. Genome biology, 14(1), R8, 2013.

ALE /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/cpp/Linux-x86_64/ALE/src Clark, SC, Egan, R, Frazier, PI, Wang, Z. ALE: a generic assembly likelihood evaluation framework for assessing the accuracy of genome and metagenome assemblies. Bioinformatics, 29(4) 435-443, 2013.

QUAST /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/quast Gurevich, A, Saveliev, V, Vyahhi, N, Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinformatics, 29(8), 1072-1075, 2013.

Prokka /usr/local/bin Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics, btu153. 2014.

FreeBayes /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/cpp/Linux-x86_64/freebayes/bin Garrison, E, Marth, G. Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907, 2012.

LAP /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/LAP Ghodsi M, Hill CM, Astrovskaya I, Lin H, Sommer DD, Koren S, Pop M. De novo likelihood-based measures for comparing genome assemblies. BMC research notes 6:334, 2013.

REAPR

Hunt, M, Kikuchi, T, Sanders, M, Newbold, C, Berriman, M, & Otto, TD. REAPR: a universal tool for genome assembly evaluation. Genome biology, 14(5), R47, 2013.

[other] Krona /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/KronaTools/bin Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011 Sep 30;12:385.

KmerGenie /home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/cpp/Linux-x86_64/kmergenie Chikhi, R, Medvedev, P. Informed and Automated k-Mer Size Selection for Genome Assembly. Bioinformatics btt310, 2013.

Oops, MetAMOS finished with errors! see text in red above for details.
Traceback (most recent call last):
  File "../runPipeline", line 985, in
    verbose = 1)
  File "/home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/ruffus/task.py", line 2965, in pipeline_run
    raise job_errors
RethrownJobError:

Exception #1
  'exceptions.ValueError(too many values to unpack)' raised in ...
   Task = def assemble.Assemble(...):
   Job  = [GCA_000010365.1_ASM1036v1_genomic.fna.run -> GCA_000010365.1_ASM1036v1_genomic.fna.asm.contig]

Traceback (most recent call last):
  File "/home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/ruffus/task.py", line 625, in run_pooled_job_without_exceptions
    return_value =  job_wrapper(param, user_defined_work_func, register_cleanup, touch_files_only)
  File "/home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/Utilities/ruffus/task.py", line 491, in job_wrapper_io_files
    ret_val = user_defined_work_func(*param)
  File "/home/wgs/iMetAMOS2.5/metAMOS-1-2.5rc3/src/assemble.py", line 326, in Assemble
    (asmName, kmer) = asmName.split(".")
ValueError: too many values to unpack

Any advice you can provide would be much appreciated, thanks!

skoren commented 8 years ago

Yes, there was a bug handling gz input assemblies. It should be fixed if you pull 1.5rc3 from the repo again.
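For readers hitting the same traceback: the `ValueError` comes from unpacking the result of `split(".")` into exactly two names, which only works when the string contains a single dot. The sketch below reproduces that failure mode with the file name from the job line and shows one tolerant alternative that splits on the last dot only. It illustrates the error and is not the change that was made in metAMOS; both function names are hypothetical.

```python
def split_strict(name):
    # Same pattern as the failing line in the traceback:
    # unpacking into two names requires exactly one "." in `name`.
    (prefix, suffix) = name.split(".")
    return prefix, suffix


def split_last_dot(name):
    # Tolerant variant (an assumption, not the metAMOS fix):
    # split on the last "." only, so earlier dots stay in the prefix.
    # If `name` has no dot at all, the prefix comes back empty.
    prefix, _, suffix = name.rpartition(".")
    return prefix, suffix


# File name taken from the job line in the traceback, minus the ".run" suffix.
name = "GCA_000010365.1_ASM1036v1_genomic.fna"

try:
    split_strict(name)
except ValueError as err:
    print("strict split failed:", err)

print(split_last_dot(name))
```

The strict version fails here because the name splits into three pieces rather than two, which is why a file name with extra dots triggers the error while a simpler one does not.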