Open andand opened 1 year ago
Hi, yes, we implemented this additional chimera removal step via vsearch in this ASV-clustering workflow. See here in the Readme for an overview, and the relevant rules are in workflow/rules/chimeras.smk. Briefly, chimera detection is run in either 'batchwise' or 'samplewise' mode. The former runs the algorithm on the ASVs found in all samples together, while the latter first splits the ASV input into one file per sample (based on ASV presence determined from a counts file) and then runs chimera detection on each of those files.
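To make the difference between the two modes concrete, here is a minimal Python sketch. It is not the code from the workflow: the function names, the `"all"` key and the in-memory `counts` dictionary (standing in for the counts file) are all made up for illustration, and it assumes an ASV counts as present in a sample when its count is above zero.

```python
from collections import defaultdict


def split_asvs_by_sample(counts):
    """Map each sample to the ASVs present in it (read count > 0).

    `counts` is a hypothetical stand-in for the counts file:
    {asv_id: {sample_id: read_count}}.
    """
    per_sample = defaultdict(list)
    for asv_id, sample_counts in counts.items():
        for sample_id, n in sample_counts.items():
            if n > 0:
                per_sample[sample_id].append(asv_id)
    return dict(per_sample)


def chimera_inputs(counts, mode="samplewise"):
    """Return the ASV sets that chimera detection would be run on."""
    if mode == "batchwise":
        # One run over every ASV seen in at least one sample.
        present = [
            asv_id
            for asv_id, sample_counts in counts.items()
            if any(n > 0 for n in sample_counts.values())
        ]
        return {"all": present}
    if mode == "samplewise":
        # One run per sample, each restricted to that sample's ASVs.
        return split_asvs_by_sample(counts)
    raise ValueError(f"unknown mode: {mode}")


# Toy counts table (invented numbers).
counts = {
    "ASV1": {"s1": 120, "s2": 0},
    "ASV2": {"s1": 15, "s2": 40},
    "ASV3": {"s1": 0, "s2": 7},
}

print(chimera_inputs(counts, "batchwise"))
print(chimera_inputs(counts, "samplewise"))
```

In samplewise mode an ASV is only compared against candidate parents that occur in the same sample, which is the point of splitting the input first.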
The output from vsearch chimera detection is then used to filter out chimeric ASVs using different parameters such as
Let me know if you want to discuss this and how to implement it in ampliseq.
Hi! Chimera removal is important, so I think this is indeed interesting.
Running uchime_denovo in vsearch seems to work well in removing remaining chimeric ASVs in our data (this allows for some mismatches between parents and children, which default removeBimeraDenovo does not) without removing "true" ASVs.
Yes, indeed, default removeBimeraDenovo does not allow mismatches between chimera and parent, but one could modify this behavior in ampliseq with a custom config that overwrites that line, e.g. by using -c chimera.config
that contains:
process {
    withName: DADA2_RMCHIMERA {
        ext.args = 'method="consensus", minSampleFraction = 0.9, ignoreNNegatives = 1, minFoldParentOverAbundance = 2, minParentAbundance = 8, allowOneOff = TRUE, minOneOffParentDistance = 4, maxShift = 16'
    }
}
Would you be able to test whether that improves the chimera removal for your case in a similar manner? (I just want to make sure the existing settings are not already covering this.)
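For anyone following along, here is a toy Python sketch of what "allowing mismatches between chimera and parent" means. It is neither the removeBimeraDenovo nor the uchime_denovo algorithm: it assumes equal-length, pre-aligned sequences and ignores abundance, which both real tools take into account. The function names and the `max_mismatches` parameter are hypothetical.

```python
def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def is_bimera(child, parent_a, parent_b, max_mismatches=0):
    """Toy check: is `child` a left part of one parent joined to the
    right part of the other, with at most `max_mismatches` differences?"""
    if not (len(child) == len(parent_a) == len(parent_b)):
        raise ValueError("toy example expects equal-length sequences")
    # A near-copy of a single parent is a variant, not a chimera.
    if min(mismatches(child, parent_a), mismatches(child, parent_b)) <= max_mismatches:
        return False
    for left, right in ((parent_a, parent_b), (parent_b, parent_a)):
        # Try every breakpoint that leaves both parts non-empty.
        for bp in range(1, len(child)):
            diff = mismatches(child[:bp], left[:bp]) + mismatches(child[bp:], right[bp:])
            if diff <= max_mismatches:
                return True
    return False


parent_a = "AAAAAAAAAA"
parent_b = "CCCCCCCCCC"
exact_chimera = "AAAAACCCCC"  # perfect join of the two parents
near_chimera = "AAAAACCCGC"   # same join, plus one mismatch

print(is_bimera(exact_chimera, parent_a, parent_b))                  # True
print(is_bimera(near_chimera, parent_a, parent_b))                   # False
print(is_bimera(near_chimera, parent_a, parent_b, max_mismatches=1)) # True
```

With exact matching the second sequence slips through, which is the kind of leftover chimera described in the issue. Allowing one mismatch catches it.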
Description of feature
We have found in deeply sequenced (1 M reads / sample) COI datasets that a lot of chimeras remain (0 - 15% of the ASVs are chimeras) after running through DADA2, including removeBimeraDenovo() with default settings. Running uchime_denovo in vsearch seems to work well in removing remaining chimeric ASVs in our data (this allows for some mismatches between parents and children, which default removeBimeraDenovo does not) without removing "true" ASVs. Would be nice to have this as an option in nf-core/ampliseq. @johnne knows more about this.