Query Annotation based on mini kraken and custom kraken Database

marbl / metAMOS

A metagenomic and isolate assembly and analysis pipeline built with AMOS

http://marbl.github.io/metAMOS


Query Annotation based on mini kraken and custom kraken Database #220

Closed nalandaatmi closed 8 years ago

nalandaatmi commented 9 years ago

Dear Sergey/Treangen,

Query regarding annotation: my metagenomic forward and reverse FASTQ files contain 20 million reads. After removing plant-like reads from the input FASTQ files with the fastq_screen pipeline, 4 million reads remained. I then provided this FASTQ file (4 million reads) as input to the metAMOS pipeline. The FCP option annotated those reads, but neither the custom Kraken databases nor minikraken annotated them as expected. Can you comment on this issue?

I tried four different databases with the metAMOS pipeline.

1) Using the minikraken database (DB size 4.5 GB), I received an output with no hits in the annotation for these 4 million reads.

2) Using a custom Kraken database (bacterial, viral, archaeal, fungal; DB size 105 GB), for the same input FASTQ file as above.
[Image: custom krakendb bacteria archaea viral and fungal]

3) Using a custom Kraken database (nt database from NCBI; DB size 604 GB), for the same input FASTQ file as above.
[Image: custom kraken nt database]

4) Using the FCP database, for the same input FASTQ file as above.
[Image: annotation based on fcp database]

nalandaatmi commented 9 years ago

Dear Sergey/Treangen,

Have you had a chance to look into this issue?

skoren commented 9 years ago

If you're not already, I'd suggest using the -u (annotate unassembled reads) option for runPipeline. This will significantly slow down FCP, but Kraken should be OK. By default, only contigs are annotated, and in your case roughly half of the reads cannot be mapped to the assembled contigs (1.8M raw reads in the plots vs 3.5M input).
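The read accounting behind this suggestion can be checked in a few lines of Python. This is a minimal sketch using the approximate counts quoted above (1.8M reads in the plots, 3.5M input reads); the function name is hypothetical and not part of metAMOS.

```python
def unrepresented_fraction(reads_in_plots: float, input_reads: float) -> float:
    """Fraction of input reads that do not appear in the annotation plots."""
    if input_reads <= 0:
        raise ValueError("input_reads must be positive")
    return 1.0 - reads_in_plots / input_reads


# Approximate counts quoted in the comment above.
fraction = unrepresented_fraction(1.8e6, 3.5e6)
print(f"{fraction:.1%} of input reads are missing from the plots")
```

If this fraction is large, annotating only contigs leaves a substantial share of the data unclassified, which is the case -u is meant to address.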

Kraken has a post-filtering step which only keeps sequences it could assign with sufficient confidence. This step could be filtering out your hits. You can turn off the filtering by editing Utilities/config/kraken.spec and setting the 0.05 filter to 0.00. However, this will increase the false-positive rate of the classifications. FCP does not have a similar filter, I believe, which is why it provides more classifications. If you have more specific questions about the classifiers, I'd suggest contacting the developers for more info.
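To illustrate why a confidence threshold can remove most hits, here is a simplified sketch of threshold-based filtering. It is not Kraken's actual implementation: the `Classification` type, the score values, and the taxon assignments are all made up for illustration.

```python
from typing import List, NamedTuple


class Classification(NamedTuple):
    read_id: str
    taxon: str
    confidence: float  # hypothetical score in [0, 1]


def filter_by_confidence(
    hits: List[Classification], threshold: float
) -> List[Classification]:
    """Keep only classifications whose score meets the threshold."""
    return [hit for hit in hits if hit.confidence >= threshold]


# Made-up example data, for illustration only.
hits = [
    Classification("read1", "Escherichia coli", 0.40),
    Classification("read2", "Bacillus subtilis", 0.03),
    Classification("read3", "Pseudomonas putida", 0.01),
]

kept_default = filter_by_confidence(hits, 0.05)  # threshold named in the thread
kept_all = filter_by_confidence(hits, 0.00)      # filtering effectively disabled
print(len(kept_default), len(kept_all))
```

With a threshold of 0.00 every classification passes, which matches the trade-off described above: more hits, but also more low-confidence (potentially false-positive) assignments.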

kchambers58178 commented 8 years ago

I had results similar to those above. I tried the -u option and my results did not change. I am using metAMOS on a local supercomputer and am unable to change any of the metAMOS files. Are there any other suggestions for this problem?

skoren commented 8 years ago

Unfortunately, no. You have to edit the Utilities/config/kraken.spec file to change the confidence setting for Kraken, or use a custom database.
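For anyone who does have write access, the edit could be scripted along these lines. This is only a sketch: the sample line below is hypothetical, and the real contents of kraken.spec may look different, so inspect the file before changing it.

```python
import re
from typing import Tuple

# Matches 0.05 as a standalone number (not part of 10.05 or 0.051).
THRESHOLD_PATTERN = re.compile(r"(?<![\d.])0\.05(?!\d)")


def relax_threshold(spec_text: str, new_value: str = "0.00") -> Tuple[str, int]:
    """Return the rewritten text and the number of replacements made."""
    return THRESHOLD_PATTERN.subn(new_value, spec_text)


# Hypothetical spec content; the real kraken.spec may look different.
sample = "filter --threshold 0.05 input.kraken\n"
new_text, count = relax_threshold(sample)
print(count)
print(new_text, end="")
```

To apply it, read Utilities/config/kraken.spec into a string, pass it through `relax_threshold`, confirm that exactly the expected number of replacements was made, and write the result back after saving a backup copy of the original file.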

If you have other Kraken-specific questions, I'd suggest checking the Kraken support group.