marbl / metAMOS

A metagenomic and isolate assembly and analysis pipeline built with AMOS
http://marbl.github.io/metAMOS

REAPR #254

Open Dangoh opened 7 years ago

Dangoh commented 7 years ago

Hi,

I've been running the pipeline and get the following errors for FRCBAM and REAPR. I'm mostly interested in REAPR and can't work out what the problem is. It is installed in Utilities/cpp/Linux-x86_64/REAPR. Any ideas?

Job = [preprocess.success -> .run] completed
Completed Task = assemble.SplitAssemblers
Uptodate Task = assemble.Assemble
Uptodate Task = assemble.CheckAsmResults
Job = [[idba-ud.127.asm.contig, masurca.127.asm.contig, spades.127.asm.contig, velvet.127.asm.contig] -> .asm.contig] completed
Completed Task = assemble.SplitMappers
Uptodate Task = mapreads.MapReads
Uptodate Task = mapreads.CheckMapResults
Job = [[idba-ud.127.contig.cvg, masurca.127.contig.cvg, spades.127.contig.cvg, velvet.127.contig.cvg] -> *.contig.cvg] completed
Completed Task = mapreads.SplitForORFs
Uptodate Task = findorfs.FindORFS
Starting Task = validate.VALIDATE
metAMOS: Error, type FRCBAM is not available, skipping
metAMOS: Error, type REAPR is not available, skipping
Warning: selected score FRCBAM was not available, skipping it
Warning: selected score REAPR was not available, skipping it
recruiting genomes.. done! recruited 2 genomes!
Starting Task = findorfs.FINDORFS
Job = [[idba-ud.127.faa, masurca.127.faa, spades.127.faa, velvet.127.faa] -> [validate.ok]] completed
Completed Task = validate.Validate
Starting Task = findrepeats.FINDREPEATS
Job = [proba.fna -> proba.repeats] completed
Completed Task = findreps.FindRepeats
Starting Task = annotate.ANNOTATE
Job = [proba.faa -> proba.hits] completed
Completed Task = annotate.Annotate
Starting Task = functionalannotation.FUNCTIONALANNOTATION
Job = [proba.faa -> [blast.out, krona.ec.input]] completed
Completed Task = fannotate.FunctionalAnnotation
Starting Task = scaffold.SCAFFOLD
Job = [[proba.asm.contig] -> scaffold.ok] completed
Completed Task = scaffold.Scaffold
Starting Task = findscaffoldorfs.FINDSCAFFOLDORFS
Job = [proba.linearize.scaffolds.final -> proba.fna] completed
Completed Task = findscforfs.FindScaffoldORFS
Starting Task = abundance.ABUNDANCE
Job = [proba.asm.contig -> proba.taxprof.pct.txt] completed
Completed Task = abundance.Abundance
Starting Task = propagate.PROPAGATE
Job = [proba.annots -> propagate.ok] completed
Completed Task = propagate.Propagate
Starting Task = classify.CLASSIFY
Job = [proba.clusters -> classify.ok] completed
Completed Task = classify.Classify
Starting Task = postprocess.POSTPROCESS
Job = [proba.asm.contig -> proba.scf.fa] completed
Completed Task = postprocess.Postprocess
done! pipeline took 83.08 minutes

skoren commented 7 years ago

metAMOS cannot find the REAPR binaries. There should be a report at the beginning of the metAMOS run that tells you where it is pulling each program from. It looks for an executable named reapr in your path. Do you have REAPR installed on your system, or were you relying on metAMOS to install it? If you install REAPR, link it from Utilities/cpp/Linux-amd64/REAPR, and make sure there is a reapr file in that subfolder, it should work (see issue #249).
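To check the point above about where an executable named reapr can be found, a minimal Python sketch follows. It only tests the PATH and a list of candidate directories; it does not reproduce metAMOS's own lookup logic. The `METAMOS_ROOT` location is a hypothetical placeholder, and both the Linux-x86_64 and Linux-amd64 subfolders are listed because both names appear in this thread.

```python
import os
import shutil


def find_executable(name, extra_dirs=()):
    """Return every location where an executable called `name` is found.

    The PATH is checked first, then each directory in `extra_dirs`.
    """
    hits = []
    on_path = shutil.which(name)
    if on_path:
        hits.append(on_path)
    for directory in extra_dirs:
        candidate = os.path.join(directory, name)
        if (os.path.isfile(candidate) and os.access(candidate, os.X_OK)
                and candidate not in hits):
            hits.append(candidate)
    return hits


# Hypothetical install root (an assumption); replace with your metAMOS directory.
METAMOS_ROOT = os.path.expanduser("~/metAMOS")
candidates = [
    os.path.join(METAMOS_ROOT, "Utilities", "cpp", "Linux-x86_64", "REAPR"),
    os.path.join(METAMOS_ROOT, "Utilities", "cpp", "Linux-amd64", "REAPR"),
]
print(find_executable("reapr", candidates) or "reapr not found")
```

An empty result means no executable file called reapr was found in any of the checked places; a non-empty result only shows the file exists, not that REAPR itself runs correctly.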

Dangoh commented 7 years ago

Hi Skoren,

Thank you for the quick reply!

Well, it actually finds reapr. What I don't have, and don't need, are FRC, Newbler and CA (see below). Also, the executable is in the folder Utilities/cpp/Linux-amd64/REAPR, named reapr, and I've also put it in my PATH.

Starting Task = runpipeline.RUNPIPELINE
Starting metAMOS pipeline
Warning: Celera Assembler is not found, some functionality will not be available
Warning: Newbler is not found, some functionality will not be available
Warning: SignalP+ is not found, some functionality will not be available
Warning: FRCbam is not found, some functionality will not be available
[Available RAM: 255 GB] ok
[Available CPUs: 40] ok

Tasks which will be run:

Task = preprocess.Preprocess
Task = assemble.SplitAssemblers
Task = assemble.Assemble
Task = assemble.CheckAsmResults
Task = assemble.SplitMappers
Task = mapreads.MapReads
Task = mapreads.CheckMapResults
Task = mapreads.SplitForORFs
Task = findorfs.FindORFS
Task = validate.Validate
Task = findreps.FindRepeats
Task = annotate.Annotate
Task = fannotate.FunctionalAnnotation
Task = scaffold.Scaffold
Task = findscforfs.FindScaffoldORFS
Task = abundance.Abundance
Task = propagate.Propagate
Task = classify.Classify
Task = postprocess.Postprocess

metAMOS configuration summary:
metAMOS Version: v1.5rc3 "Praline Brownie"
workflows: core,imetamos,optional,deprecated
Time and Date: 2016-11-29
Working directory: /work/daniel/Vietnam/11-37
Prefix: proba
K-Mer: 31
Threads: 39
Taxonomic level: species
Verbose: False
Steps to skip: FindScaffoldORFS, Scaffold, Propagate
Steps to force: FindRepeats, FunctionalAnnotation, Propagate

[citation] MetAMOS Treangen, TJ ⇔ Koren, S, Sommer, DD, Liu, B, Astrovskaya, I, Ondov, B, Darling AE, Phillippy AM, Pop, M. MetAMOS: a modular and open source metagenomic assembly and analysis pipeline. Genome biology, 14(1), R2, 2013.

iMetAMOS Koren, S, Treangen, TJ, Hill, CM, Pop, M, Phillippy, AM. Automated ensemble assembly and validation of microbial genomes. BMC Bioinformatics 15:126, 2014.

Step-specific configuration: [abundance] MetaPhyler /home/daniel/Downloads/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64 Liu B, Gibbons T, Ghodsi M, Treangen T, Pop M. Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences. BMC Genomics. 2011;12 Suppl 2:S4. Epub 2011 Jul 27.

[multialign] M-GCAT /home/daniel/Downloads/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64 Treangen TJ, Messeguer X. M-GCAT: interactively and efficiently constructing large-scale multiple genome comparison frameworks in closely related species. BMC Bioinformatics, 2006.

[fannotate] BLAST /home/daniel/Downloads/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64 Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403-10.

[scaffold] Bambus 2 /home/daniel/Downloads/metAMOS-1.5rc3/AMOS/Linux-x86_64/bin Koren S, Treangen TJ, Pop M. Bambus 2: scaffolding metagenomes. Bioinformatics 27(21): 2964-2971 2011.

[findorfs] Prokka /home/daniel/Downloads/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64/prokka/bin Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics, btu153. 2014.

[annotate] Kraken /home/daniel/Downloads/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64/kraken/bin Wood DE, Salzberg SL: Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biology 2014, 15:R46.

[assemble] ABySS /home/daniel/Downloads/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64/abyss/bin Simpson, JT, Wong, K, Jackman, SD, Schein, JE, Jones, SJ, Birol, İ. ABySS: a parallel assembler for short read sequence data. Genome research, 19(6), 1117-1123, 2009.

SPAdes /home/daniel/Downloads/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64/spades/bin Anton Bankevich, Sergey Nurk, Dmitry Antipov, Alexey A. Gurevich, Mikhail Dvorkin, Alexander S. Kulikov, Valery M. Lesin, Sergey I. Nikolenko, Son Pham, Andrey D. Prjibelski, Alexey V. Pyshkin, Alexander V. Sirotkin, Nikolay Vyahhi, Glenn Tesler, Max A. Alekseyev, and Pavel A. Pevzner. Journal of Computational Biology. May 2012, 19(5): 455-477. doi:10.1089/cmb.2012.0021.

SOAPdenovo2 /home/daniel/Downloads/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64/soap2/bin Luo, R, Liu, B, Xie, Y, Li, Z, Huang, W, Yuan, J, He G, Chen Y, Pan Q, Liu Y, Tang J, Wu G, Zhang H, Shi Y, Liu Y, Yu C, Wang B, Lu Y, Han C, Cheung DW, Yiu S, Peng S, Xiaoqian Z, Liu G, Liao X, Li Y, Yang H, Wang J, Lam T, Wang J. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience, 1(1), 18, 2012.

Velvet /home/daniel/Downloads/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64/velvet Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008 May;18(5):821-9.

SGA

Simpson, JT, Durbin, R. Efficient de novo assembly of large genomes using compressed data structures. Genome Research, 22(3), 549-556, 2012.

IDBA-UD /home/daniel/Downloads/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64/idba/bin Peng, Y., Leung, H. C., Yiu, S. M., & Chin, F. Y. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics, 28(11), 1420-1428, 2012.

MaSuRCA /home/daniel/Downloads/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64/MaSuRCA/bin Zimin, A, Marçais, G, Puiu, D, Roberts, M, Salzberg, SL, Yorke, JA. The MaSuRCA genome assembler. Bioinformatics, btt476, 2013.

[mapreads] Bowtie /home/daniel/Downloads/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64 Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25. Epub 2009 Mar 4.

[preprocess] metAMOS built-in filtering N/A

[validate] FRCbam

Vezzi, F, Narzisi, G, Mishra, B. Reevaluating assembly evaluations with feature response curves: GAGE and assemblathons. PloS ONE, 7(12), e52210, 2013.

CGAL /home/daniel/Downloads/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64/cgal Rahman, A, Pachter, L CGAL: computing genome assembly likelihoods. Genome biology, 14(1), R8, 2013.

ALE /home/daniel/Downloads/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64/ALE/src Clark, SC, Egan, R, Frazier, PI, Wang, Z. ALE: a generic assembly likelihood evaluation framework for assessing the accuracy of genome and metagenome assemblies. Bioinformatics, 29(4) 435-443, 2013.

QUAST /home/daniel/Downloads/metAMOS-1.5rc3/quast Gurevich, A, Saveliev, V, Vyahhi, N, Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinformatics, 29(8), 1072-1075, 2013.

Prokka /home/daniel/Downloads/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64/prokka/bin Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics, btu153. 2014.

FreeBayes /home/daniel/Downloads/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64/freebayes/bin Garrison, E, Marth, G. Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907, 2012.

LAP /home/daniel/Downloads/metAMOS-1.5rc3/LAP Ghodsi M, Hill CM, Astrovskaya I, Lin H, Sommer DD, Koren S, Pop M. De novo likelihood-based measures for comparing genome assemblies. BMC research notes 6:334, 2013.

REAPR /home/daniel/Downloads/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64/REAPR Hunt, M, Kikuchi, T, Sanders, M, Newbold, C, Berriman, M, & Otto, TD. REAPR: a universal tool for genome assembly evaluation. Genome biology, 14(5), R47, 2013.

[other] Krona /home/daniel/Downloads/metAMOS-1.5rc3/KronaTools/bin Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011 Sep 30;12:385.

KmerGenie /home/daniel/Downloads/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64/kmergenie Chikhi, R, Medvedev, P. Informed and Automated k-Mer Size Selection for Genome Assembly. Bioinformatics btt310, 2013.

sh: 1: Syntax error: Bad fd number
Starting Task = preprocess.PREPROCESS
Job = [[11-37_1.fastq, 11-37_2.fastq] -> preprocess.success] completed
Completed Task = preprocess.Preprocess
Starting Task = assemble.ASSEMBLE
metAMOS: Selected kmer size 127
metAMOS: Estimated genome size 2165047 bp
Job = [preprocess.success -> *.run] completed
Completed Task = assemble.SplitAssemblers
Job = [abyss.127.run -> abyss.127.asm.contig] completed
Job = [idba-ud.127.run -> idba-ud.127.asm.contig] completed
Job = [masurca.127.run -> masurca.127.asm.contig] completed
Error: requested to run SGA (sga) but not available in specified location . Please check your specification and try again
Job = [sga.127.run -> sga.127.asm.contig] completed
Job = [soapdenovo2.127.run -> soapdenovo2.127.asm.contig] completed
Job = [spades.127.run -> spades.127.asm.contig] completed
Job = [velvet.127.run -> velvet.127.asm.contig] completed
Completed Task = assemble.Assemble
MetAMOS Warning: abyss assembler did not run successfully!
MetAMOS Warning: sga assembler did not run successfully!
MetAMOS Warning: soapdenovo2 assembler did not run successfully!
MetAMOS Warning: masurca assembler did not run successfully!
Job = [[abyss.127.asm.contig, idba-ud.127.asm.contig, masurca.127.asm.contig, sga.127.asm.contig, soapdenovo2.127.asm.contig, spades.127.asm.contig, velvet.127.asm.contig] -> [assemble.ok]] completed
Completed Task = assemble.CheckAsmResults
Job = [[idba-ud.127.asm.contig, spades.127.asm.contig, velvet.127.asm.contig] -> .asm.contig] completed
Completed Task = assemble.SplitMappers
Starting Task = mapreads.MAPREADS
Job = [idba-ud.127.asm.contig -> idba-ud.127.contig.cvg] completed
Job = [spades.127.asm.contig -> spades.127.contig.cvg] completed
Job = [velvet.127.asm.contig -> velvet.127.contig.cvg] completed
Completed Task = mapreads.MapReads
Job = [[idba-ud.127.contig.cvg, spades.127.contig.cvg, velvet.127.contig.cvg] -> [mapreads.ok]] completed
Completed Task = mapreads.CheckMapResults
Job = [[idba-ud.127.contig.cvg, spades.127.contig.cvg, velvet.127.contig.cvg] -> .contig.cvg] completed
Completed Task = mapreads.SplitForORFs
Starting Task = findorfs.FINDORFS

Job = [idba-ud.127.contig.cvg -> idba-ud.127.faa] completed

Job = [spades.127.contig.cvg -> spades.127.faa] completed

Job = [velvet.127.contig.cvg -> velvet.127.faa] completed

Completed Task = findorfs.FindORFS
Starting Task = validate.VALIDATE
metAMOS: Error, type FRCBAM is not available, skipping
metAMOS: Error, type REAPR is not available, skipping
Warning: selected score FRCBAM was not available, skipping it
Warning: selected score REAPR was not available, skipping it
recruiting genomes.. done! recruited 16 genomes!
Job = [[idba-ud.127.faa, spades.127.faa, velvet.127.faa] -> [validate.ok]] completed
Completed Task = validate.Validate
Starting Task = findrepeats.FINDREPEATS
Job = [proba.fna -> proba.repeats] completed
Completed Task = findreps.FindRepeats
Starting Task = annotate.ANNOTATE
Job = [proba.faa -> proba.hits] completed
Completed Task = annotate.Annotate
Starting Task = functionalannotation.FUNCTIONALANNOTATION
Job = [proba.faa -> [blast.out, krona.ec.input]] completed
Completed Task = fannotate.FunctionalAnnotation
Starting Task = scaffold.SCAFFOLD
Job = [[proba.asm.contig] -> scaffold.ok] completed
Completed Task = scaffold.Scaffold
Starting Task = findscaffoldorfs.FINDSCAFFOLDORFS
Job = [proba.linearize.scaffolds.final -> proba.fna] completed
Completed Task = findscforfs.FindScaffoldORFS
Starting Task = abundance.ABUNDANCE
Job = [proba.asm.contig -> proba.taxprof.pct.txt] completed
Completed Task = abundance.Abundance
Starting Task = propagate.PROPAGATE
Job = [proba.annots -> propagate.ok] completed
Completed Task = propagate.Propagate
Starting Task = classify.CLASSIFY
Job = [proba.clusters -> classify.ok] completed
Completed Task = classify.Classify
Starting Task = postprocess.POSTPROCESS
Job = [proba.asm.contig -> proba.scf.fa] completed
Completed Task = postprocess.Postprocess
done! pipeline took 160.69 minutes

skoren commented 7 years ago

In that case REAPR is returning an error on this sample. You should be able to get more info on the error from the logs/VALIDATE.log file.
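One way to pull the relevant lines out of a long validate log is sketched below in Python. The marker strings are an assumption about what failure lines look like, chosen to match the messages quoted elsewhere in this thread; the helper name `reapr_problem_lines` is made up for illustration and is not part of metAMOS or REAPR.

```python
def reapr_problem_lines(log_lines):
    """Return (line_number, text) pairs for lines that look like failures."""
    # Assumed markers, based on the messages quoted in this thread.
    markers = ("Error", "terminate called", "what():")
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if any(marker in line for marker in markers)
    ]


# Sample lines copied from the log excerpts posted in this thread.
sample = [
    "Setting read length to 301\n",
    "[REAPR preprocess] Error in system call: R CMD BATCH "
    "00.Sample/gc_vs_cov.R 00.Sample/gc_vs_cov.Rout\n",
    "Using binmask 01000000\n",
    "terminate called after throwing an instance of 'std::out_of_range'\n",
    "  what():  basic_string::substr\n",
]
for number, text in reapr_problem_lines(sample):
    print(number, text)

# On a real run, read the file instead (path relative to the working directory):
#   with open("logs/VALIDATE.log") as handle:
#       hits = reapr_problem_lines(handle)
```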

Dangoh commented 7 years ago

Hi Sergey,

Thank you for the reply!

The only error I seem to find is the following, after each assembly:

[REAPR preprocess] Error in system call: R CMD BATCH 00.Sample/gc_vs_cov.R 00.Sample/gc_vs_cov.Rout

I've attached the file, I would really appreciate it if you had some time to look at it.

Kind regards, Daniel


skoren commented 7 years ago

Your attachment didn't come through (you have to drag and drop it onto the comment box on the GitHub page, not attach it to the email).

The error indicates that REAPR failed to run because of an R script error. You can try running the R command by hand or look at the Rout file for the error, but this is a REAPR error rather than a metAMOS error, so there isn't much metAMOS can do to fix it. In this case, metAMOS is correctly ignoring the failed result and reporting that REAPR did not run.
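For the suggestion to look at the Rout file, the following Python sketch prints everything from the first line starting with "Error" to the end of the file, which is usually where R reports why a batch script stopped. The example text is illustrative only and was not taken from the attached log; `somePackage` is a placeholder name.

```python
def rout_failure(rout_text):
    """Return the lines of an .Rout file from the first 'Error' line onward."""
    lines = rout_text.splitlines()
    for index, line in enumerate(lines):
        if line.startswith("Error"):
            return lines[index:]
    return []


# Illustrative .Rout content (an assumption, not copied from this thread).
example = """\
> library(somePackage)
Error in library(somePackage) : there is no package called 'somePackage'
Execution halted
"""
for line in rout_failure(example):
    print(line)

# On a real run, read the file named in the failing command instead:
#   with open("00.Sample/gc_vs_cov.Rout") as handle:
#       print("\n".join(rout_failure(handle.read())))
```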

Dangoh commented 7 years ago

Sorry about that, so here's the log-file. I'll probably run the R command if you can't find anything obvious.

VALIDATE.zip

skoren commented 7 years ago

So I think there are two issues in the REAPR log. First, the failing R command: R CMD BATCH 00.Sample/gc_vs_cov.R 00.Sample/gc_vs_cov.Rout

Second, there is a failure earlier on in the pipeline auto-setting the read size:

Reading all chromosomes from /work/daniel/Vietnam/11-40/Validate/out/velvet.127.reapr.fa ... read 242 chromosomes.
Indexing 242 chromosomes ... sorting ... done.
Setting read length to 301
Using binmask 01000000
Reading solexa pair data from /work/daniel/Vietnam/11-40/Validate/out/velvet.127.reapr.perfect.tmp.reads_1.fq ... terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr

Based on the metAMOS version of the REAPR code, it auto-sets the read length from the first few reads in a FASTQ file. If later reads are longer than that, this may be causing the issue. You could modify the REAPR pipeline to set a larger read length, or try updating the REAPR version in metAMOS by manually installing a newer one.

Either way this seems like a REAPR bug and not an issue in metAMOS.
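To test the read-length hypothesis on your own data, a small Python sketch can compare the longest read among the first few records with the longest read in the whole file. It assumes a plain four-lines-per-record FASTQ, and the `leading` parameter is a guess at what "the first few reads" means; the actual number REAPR samples is not stated here.

```python
import io


def fastq_read_lengths(handle):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ."""
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:  # second line of each record is the sequence
            yield len(line.rstrip("\r\n"))


def compare_leading_to_max(handle, leading=10):
    """Return (longest among the first `leading` reads, longest overall)."""
    longest_leading = 0
    longest_overall = 0
    for index, length in enumerate(fastq_read_lengths(handle)):
        if index < leading:
            longest_leading = max(longest_leading, length)
        longest_overall = max(longest_overall, length)
    return longest_leading, longest_overall


# Tiny in-memory example: the second read is longer than the first.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n")
print(compare_leading_to_max(demo, leading=1))  # (4, 8)
```

If the second number is larger than the first for your input FASTQ, later reads are longer than the leading ones, which is the situation described above.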

fanavarro commented 4 years ago

Hi, we experienced the same issue with REAPR. After inspecting the file "gc_vs_cov.Rout", we found that we were missing an R library (KernSmooth). It worked after we installed it. We hope this helps.

Greetings.
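A quick way to check for the missing library before rerunning the pipeline is sketched below in Python. It asks `Rscript` whether the package can be loaded and returns `None` when `Rscript` is not on the PATH. This assumes the `Rscript` found on the PATH belongs to the same R installation that REAPR calls, which may not hold on every system.

```python
import shutil
import subprocess


def r_package_check_command(package):
    """Build an Rscript command that exits 0 only if `package` can be loaded."""
    expr = (
        'quit(save = "no", status = as.integer('
        '!requireNamespace("%s", quietly = TRUE)))' % package
    )
    return ["Rscript", "-e", expr]


def r_package_installed(package):
    """Return True/False for the package, or None if Rscript is unavailable."""
    if shutil.which("Rscript") is None:
        return None
    result = subprocess.run(
        r_package_check_command(package),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


print(r_package_installed("KernSmooth"))
```

If this prints `False`, installing the package from within R with `install.packages("KernSmooth")` is the usual fix.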