Closed SilasK closed 6 years ago
I agree that #31 solves this problem. Just so this is clear, are you proposing decon -> deduplication -> quality_trim -> merge -> error correction?
It seems that adapter trimming would need to come first if the user provides adapters. bbduk2 also has the option to merge overlapping reads.
The official guide proposes:
These steps are best done in a specific order, which I have detailed below, along with the suggested tool. Note that many of them (like quality-trimming) are optional, so if you do them, do them in this order:
0) Format conversion
1) Adapter trimming
2) Contaminant filtering for synthetic molecules and spike-ins such as PhiX. Always recommended. Tool: BBDuk.
3) Quality-trimming and/or quality-filtering. Optional; only recommended if you have very low-quality data or are doing something very sensitive to low-quality data, like calling very rare variants.
4) Host contaminant + 16S (bbsplit)
6) Deduplication
7) Normalization
8) Error correction
9) Paired-read merging
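To make the ordering concrete, here is a minimal sketch of how a pipeline could sort whatever optional steps a user enables into the order quoted above. The step labels and the `order_steps` helper are hypothetical names chosen for illustration; they are not part of Atlas or BBTools.

```python
# Order of preprocessing steps as quoted from the guide above.
# The labels are hypothetical identifiers, not real tool or rule names.
RECOMMENDED_ORDER = [
    "format_conversion",
    "adapter_trimming",
    "contaminant_filtering",
    "quality_trimming",
    "host_and_16s_filtering",
    "deduplication",
    "normalization",
    "error_correction",
    "read_merging",
]


def order_steps(selected):
    """Return the selected steps sorted into the recommended order."""
    selected = set(selected)
    unknown = selected - set(RECOMMENDED_ORDER)
    if unknown:
        raise ValueError(f"Unknown steps: {sorted(unknown)}")
    # Keep only the enabled steps, preserving the recommended order.
    return [step for step in RECOMMENDED_ORDER if step in selected]


if __name__ == "__main__":
    # Steps may be enabled in any order; they are always run in guide order.
    print(order_steps(["read_merging", "deduplication", "adapter_trimming"]))
```

Since most steps are optional, treating the order as a fixed list and filtering it keeps the relative order stable no matter which subset is enabled.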
I think it makes sense to do the more sophisticated steps, such as deduplication and normalization, later.
Steps 1 to 3 are implemented in bbduk2. We would have quality-controlled reads after step 4.
I'm not sure which reads we should then map to the contigs. The deduplicated ones, without error correction?
Silas implemented these changes and they are now merged into the master branch.
Hey @brwnj
I remember in the original figure you had planned to recover 16S sequences and to apply MerCat.
I started applying Atlas to my data. Now I have a discrepancy between 16S rDNA amplicon sequencing and the 16S reads recovered from the metagenome (decontamination step). One OTU is increased in the amplicon data, but I don't see that increase when mapping the metagenome reads to the representative sequences of my amplicon OTUs.
We think it might originate from the deduplication step.
Clumpify removes exact duplicates. What do you think? Is it a good idea to de-duplicate the 16S reads in the metagenome? The same question applies to error correction. If not, it would be an additional argument for #31.
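To illustrate the concern, here is a toy sketch of how removing exact duplicates could flatten the apparent abundance of an OTU whose 16S reads are often identical. This is not Clumpify's actual algorithm; the read data, the OTU labels, and both helper functions are made up for illustration.

```python
from collections import Counter


def count_reads_per_taxon(reads):
    """Count reads per taxon label; reads are (taxon, sequence) tuples."""
    return Counter(taxon for taxon, _seq in reads)


def remove_exact_duplicates(reads):
    """Keep only the first read for each distinct sequence (toy dedup)."""
    seen = set()
    kept = []
    for taxon, seq in reads:
        if seq not in seen:
            seen.add(seq)
            kept.append((taxon, seq))
    return kept


# Hypothetical reads: otu_a is abundant and most of its reads are identical.
reads = (
    [("otu_a", "ACGTACGT")] * 6
    + [("otu_a", "ACGTACGA")] * 2
    + [("otu_b", "TTGCATGC"), ("otu_b", "TTGCATGA")]
)

before = count_reads_per_taxon(reads)
after = count_reads_per_taxon(remove_exact_duplicates(reads))

if __name__ == "__main__":
    print("before dedup:", dict(before))
    print("after dedup: ", dict(after))
```

In this toy case otu_a drops from 8 reads to 2 while otu_b stays at 2, so the fourfold difference disappears. Whether this happens on real data depends on how the deduplication tool treats paired reads and on how often identical 16S reads occur by chance.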