metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

16S analysis #38

Closed SilasK closed 6 years ago

SilasK commented 6 years ago

Hey @brwnj

I remember in the original figure you had planned to recover 16S sequences and to apply MerCat.

I started applying Atlas on my data, and now I see a discrepancy between 16S rDNA amplicon sequencing and the 16S recovered from the metagenome (decontamination step). I find an OTU that is increased in the amplicon data but shows no increase when I map the metagenome reads to the representative sequences of my amplicon OTUs.

We think it might originate from the deduplication step: Clumpify removes exact duplicates. What do you think? Is it a good idea to do deduplication on the 16S reads in the metagenome? The same question applies to error correction.
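To illustrate why exact-duplicate removal could distort 16S abundances, here is a toy sketch (not Clumpify itself, just the same idea on one-sequence-per-line reads): identical reads collapse to a single copy, so a taxon whose abundant 16S fragments produce many identical reads loses proportionally more reads than one whose reads are all distinct.

```shell
# Toy illustration of exact-duplicate removal (conceptually what
# Clumpify's dedupe=t does): identical sequences collapse to one copy.
printf 'ACGT\nACGT\nACGT\nTTGA\nCCAT\nTTGA\n' > reads.txt
awk '!seen[$0]++' reads.txt > deduped.txt   # keep first occurrence only
wc -l < reads.txt    # 6 reads in
wc -l < deduped.txt  # 3 reads out: the 3x ACGT signal is flattened to 1
```

After deduplication the 3:2:1 ratio between the three sequences becomes 1:1:1, which is exactly the kind of abundance shift that could explain the amplicon-vs-metagenome discrepancy.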

If not it would be an additional argument for #31

brwnj commented 6 years ago

I agree that #31 solves this problem. Just so this is clear, are you proposing decon -> deduplication -> quality_trim -> merge -> error correction?

SilasK commented 6 years ago

It seems that adapter trimming would be necessary beforehand if the user provides adapters. bbduk2 has an option to merge overlapping reads.

The official guide proposes:

> These steps are best done in a specific order, which I have detailed below, along with the suggested tool. Note that many of them (like quality-trimming) are optional; if you do them, do them in this order.

0) Format conversion
1) Adapter trimming
2) Contaminant filtering for synthetic molecules and spike-ins such as PhiX. Always recommended. Tool: BBDuk.
3) Quality-trimming and/or quality-filtering. Optional; only recommended if you have very low-quality data or are doing something very sensitive to low-quality data, like calling very rare variants.
4) Host contaminant + 16S filtering (BBSplit)
6) Deduplication
7) Normalization
8) Error correction
9) Paired-read merging
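For reference, the order above could be sketched with BBTools commands roughly like this. This is a hedged sketch, not what Atlas runs: the file names (`reads_R1.fq.gz`, `adapters.fa`, `host.fa`, `phix.fa`) are placeholders and the parameter values are illustrative defaults, to be checked against the BBTools guides.

```shell
# Sketch of the preprocessing order with BBTools; file names and
# parameter values are illustrative, not Atlas defaults.

# 1) Adapter trimming (BBDuk)
bbduk.sh in1=reads_R1.fq.gz in2=reads_R2.fq.gz \
    out1=trim_R1.fq.gz out2=trim_R2.fq.gz \
    ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo

# 2) Contaminant filtering, e.g. the PhiX spike-in (BBDuk)
bbduk.sh in1=trim_R1.fq.gz in2=trim_R2.fq.gz \
    out1=filt_R1.fq.gz out2=filt_R2.fq.gz \
    ref=phix.fa k=31 hdist=1

# 3) Quality trimming (BBDuk, optional)
bbduk.sh in1=filt_R1.fq.gz in2=filt_R2.fq.gz \
    out1=qc_R1.fq.gz out2=qc_R2.fq.gz qtrim=rl trimq=10

# 4) Host read removal (BBSplit)
bbsplit.sh in1=qc_R1.fq.gz in2=qc_R2.fq.gz ref=host.fa \
    basename=host_%.fq.gz outu1=clean_R1.fq.gz outu2=clean_R2.fq.gz

# 6) Deduplication (Clumpify)
clumpify.sh in1=clean_R1.fq.gz in2=clean_R2.fq.gz \
    out1=dedup_R1.fq.gz out2=dedup_R2.fq.gz dedupe=t

# 7) Normalization (BBNorm)
bbnorm.sh in1=dedup_R1.fq.gz in2=dedup_R2.fq.gz \
    out1=norm_R1.fq.gz out2=norm_R2.fq.gz target=100 min=5

# 8) Error correction (Tadpole)
tadpole.sh in1=norm_R1.fq.gz in2=norm_R2.fq.gz \
    out1=ecc_R1.fq.gz out2=ecc_R2.fq.gz mode=correct

# 9) Paired-read merging (BBMerge)
bbmerge.sh in1=ecc_R1.fq.gz in2=ecc_R2.fq.gz out=merged.fq.gz \
    outu1=unmerged_R1.fq.gz outu2=unmerged_R2.fq.gz
```

These commands are a command-line fragment requiring a BBTools installation, so treat them as documentation of the ordering rather than a turnkey pipeline.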

SilasK commented 6 years ago

I think it makes sense to do the more sophisticated steps, like deduplication and normalization, later.

Steps 1-3 are implemented in bbduk2. We would have quality-controlled reads after step 4.

I'm not sure which reads we should then map to the contigs: the deduplicated ones, without error correction?

brwnj commented 6 years ago

Silas implemented these changes and they are now merged into the master branch.