Different databases used in metAMOS

nalandaatmi commented 9 years ago

I see that the UniProt protein database is used in your pipeline.

1) Is the UniProt protein database used only for functional annotation, or also for other steps in the pipeline?

2) Which databases are used for the taxonomy and annotate steps: nucleotide (RefSeq), protein (UniProt), KEGG, COG, or KOG?

3) Can we select RefSeq protein to build a database?

4) In addition to the HTML output, do we get any OTU tables for the taxonomy?

nalandaatmi commented 9 years ago

Hi Treangen/Sergey,

Can I get some updates?

treangen commented 9 years ago

Hello,

> 1) Is the UniProt protein database used only for functional annotation, or also for other steps in the pipeline?

Yes, it is used only for functional annotation.

> 2) Which databases are used for the taxonomy and annotate steps: nucleotide (RefSeq), protein (UniProt), KEGG, COG, or KOG?

For this step, most tools use RefSeq microbial.

> 3) Can we select RefSeq protein to build a database?

Sorry, I do not fully understand the question. Can you please clarify?

> 4) In addition to the HTML output, do we get any OTU tables for the taxonomy?

No, OTU tables are not provided as output.

nalandaatmi commented 9 years ago

- Can we use protein sequences to build a reference database instead of nucleotide sequences? Will the pipeline support that option?

So all of these classifiers (FCP, BLAST, PHMMER, PHYMM, PhyloSift, MetaPhyler, Kraken) use RefSeq microbial?

In your manuscript, I saw an interesting section (Figures 8 and 9) on "Comparative analysis of multiple samples" between male and female. I have 50 samples with paired-end reads.

- Do I run the pipeline separately for each sample, or can I run multiple samples together?

- What are the options for performing a comparative analysis between different samples?

treangen commented 9 years ago

> - Can we use protein sequences to build a reference database instead of nucleotide sequences? Will the pipeline support that option?

This entirely depends on the tool you select for the annotate step. Whichever DB is supported/provided by the tool, that is the one that is used. For further clarification, I'd encourage you to contact the tool developer to inquire as to the suitability of replacing the DB the classifier uses by default.

> So all of these classifiers (FCP, BLAST, PHMMER, PHYMM, PhyloSift, MetaPhyler, Kraken) use RefSeq microbial?

Not exactly. The DBs range from RefSeq microbial genomes (FCP, Kraken), to RefSeq protein (BLAST), to marker genes (PhyloSift, MetaPhyler). To clarify this, I will create a table in the documentation specifically listing the different databases used by each tool in the annotate step.

> - Do I run the pipeline separately for each sample, or can I run multiple samples together?

You will need to run the samples individually if you'd like to compare the abundance profiles of multiple samples.
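For a batch such as the 50 paired-end samples mentioned above, the per-sample runs could be scripted. The following is only a minimal sketch: the `reads` directory, the `_R1.fastq`/`_R2.fastq` naming, the `metamos_runs` output root, and the `300:500` insert-size range are all hypothetical, and the `initPipeline`/`runPipeline` flags are assumed from the metAMOS quick-start, so please check them against `initPipeline -h` and `runPipeline -h` for your install. The script only prints the commands; it does not execute them.

```python
import shlex
from pathlib import Path


def pair_reads(read_dir, suffix1="_R1.fastq", suffix2="_R2.fastq"):
    """Return {sample: (r1, r2)} for every sample that has both mate files.

    The suffixes are an assumed naming convention; adjust to your files.
    """
    pairs = {}
    for r1 in sorted(Path(read_dir).glob(f"*{suffix1}")):
        sample = r1.name[: -len(suffix1)]
        r2 = r1.with_name(sample + suffix2)
        if r2.exists():  # skip samples whose second mate is missing
            pairs[sample] = (r1, r2)
    return pairs


def commands_for(sample, r1, r2, out_root="metamos_runs", insert="300:500"):
    """Build the two command lines for one sample, one project dir each.

    Flags are assumed from the metAMOS quick-start, not verified here.
    """
    project = Path(out_root) / sample
    init = ["initPipeline", "-q", "-1", str(r1), "-2", str(r2),
            "-d", str(project), "-i", insert]
    run = ["runPipeline", "-d", str(project)]
    return [init, run]


if __name__ == "__main__":
    # Dry run: print one init/run pair per sample found in ./reads
    for sample, (r1, r2) in pair_reads("reads").items():
        for cmd in commands_for(sample, r1, r2):
            print(shlex.join(cmd))
```

Keeping one project directory per sample means each run's abundance profile stays separate, which is what you need for comparing samples afterwards.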

nalandaatmi commented 9 years ago

Thanks, Treangen, for your helpful answers. They really helped me.

nalandaatmi commented 9 years ago

Hi Treangen/Sergey,

I need to use the non-redundant nucleotide (nt) database for my metagenomic analysis, especially for taxonomic classification and annotation. How do I incorporate it into your pipeline?

skoren commented 9 years ago

The database for the default classifier, Kraken, includes complete RefSeq genomes for the bacterial, archaeal, and viral domains, as well as H. sapiens. You can see this list in the Kraken manual: https://ccb.jhu.edu/software/kraken/MANUAL.html#standard-kraken-database

If you would like to include other sequences, you will need to build a custom Kraken database yourself, following the Kraken manual: https://ccb.jhu.edu/software/kraken/MANUAL.html#custom-databases

and then place it in /home/prabhakaranra/metAMOS-1.5rc3/Utilities/DB/kraken/
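As a rough illustration of the custom-database steps in the Kraken manual, the sketch below assembles the `kraken-build` command sequence (download taxonomy, add sequences to the library, build). The database directory name `custom_db`, the FASTA file name, and the thread count are hypothetical placeholders. The script only prints the commands rather than running them, and the manual remains the authoritative reference for the options.

```python
import shlex


def kraken_build_commands(db_dir, fasta_files, threads=4):
    """Return the kraken-build command lines for a custom database.

    Follows the three steps in the Kraken manual's custom-database
    section; db_dir, fasta_files, and threads are placeholders.
    """
    db = str(db_dir)
    # Step 1: fetch the NCBI taxonomy into the database directory
    cmds = [["kraken-build", "--download-taxonomy", "--db", db]]
    # Step 2: add each FASTA file to the database's library
    for fasta in fasta_files:
        cmds.append(["kraken-build", "--add-to-library", str(fasta), "--db", db])
    # Step 3: build the database from the taxonomy and library
    cmds.append(["kraken-build", "--build", "--threads", str(threads), "--db", db])
    return cmds


if __name__ == "__main__":
    # Hypothetical inputs; replace with your own directory and sequences
    for cmd in kraken_build_commands("custom_db", ["my_sequences.fa"]):
        print(shlex.join(cmd))
```

Once the build finishes, the resulting database directory is what would be placed in the `Utilities/DB/kraken/` location given above.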

Gene annotation for metagenomes is done with MetaGeneMark or FragGeneScan, which perform de novo gene prediction. The FunctionalAnnotate step's database cannot currently be modified.
