nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

Add vsearch uchime_denovo for chimera removal (or flagging) #631

Open andand opened 1 year ago

andand commented 1 year ago

Description of feature

We have found that in deeply sequenced COI datasets (1 M reads per sample), a lot of chimeras remain (0-15% of the ASVs are chimeric) after running DADA2, including removeBimeraDenovo() with default settings. Running uchime_denovo in vsearch seems to work well for removing the remaining chimeric ASVs in our data without removing "true" ASVs (it allows for some mismatches between parents and children, which removeBimeraDenovo does not do by default). It would be nice to have this as an option in nf-core/ampliseq. @johnne knows more about this.
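To make the "mismatches between parents and children" point concrete, here is a toy sketch of a two-parent chimera (bimera) check with a mismatch tolerance. It is not the algorithm used by DADA2 or vsearch: it assumes equal-length, already aligned sequences, ignores abundances and shifts, and the names `is_bimera`, `max_mismatches` and `min_parent_distance` are made up for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def is_bimera(child, parents, max_mismatches=0, min_parent_distance=1):
    """Toy check: is `child` a left part of one parent joined to a right part of another?

    Assumptions (illustrative only): all sequences have the same length and are
    compared position by position, with no alignment, shifts or abundance weighting.
    """
    n = len(child)
    for left in parents:
        for right in parents:
            if left == right or len(left) != n or len(right) != n:
                continue
            # Skip children that are simply (near-)copies of one parent.
            if hamming(child, left) < min_parent_distance:
                continue
            if hamming(child, right) < min_parent_distance:
                continue
            # Try every breakpoint between the left and the right parent.
            for bp in range(1, n):
                mismatches = hamming(child[:bp], left[:bp]) + hamming(child[bp:], right[bp:])
                if mismatches <= max_mismatches:
                    return True
    return False


parents = ["AAAAAAAAAA", "CCCCCCCCCC"]
exact_chimera = "AAAAACCCCC"   # perfect join of the two parents
noisy_chimera = "AAAATCCCCC"   # same join, with one substitution at the junction

print(is_bimera(exact_chimera, parents))                      # exact model catches it
print(is_bimera(noisy_chimera, parents))                      # exact model misses it
print(is_bimera(noisy_chimera, parents, max_mismatches=1))    # tolerant model catches it
```

In this toy model, a chimera that picked up a single substitution escapes the exact check and is only caught once one mismatch is tolerated, which is the kind of case described above.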

johnne commented 1 year ago

Hi, yes, we implemented this additional chimera removal step via vsearch in this ASV-clustering workflow. See here in the Readme for an overview; the relevant rules are in workflow/rules/chimeras.smk. Briefly, chimera detection runs in either 'batchwise' or 'samplewise' mode. The former runs the algorithm on the ASVs found in all samples together, while the latter first splits the ASV input into one file per sample (based on ASV presence determined from a counts file) and then runs chimera detection on each of those files.
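As a rough illustration of the 'samplewise' split, a minimal sketch could look like the following. This is not the code from chimeras.smk: the function name `split_asvs_by_sample` and the in-memory data layout are hypothetical, and writing the per-sample count as a `;size=N` header annotation (the abundance format vsearch reads) is an assumption about how the per-sample files would be prepared.

```python
from collections import defaultdict


def split_asvs_by_sample(asv_seqs, counts):
    """Build one FASTA string per sample, containing only ASVs present in that sample.

    asv_seqs: dict mapping ASV id -> sequence
    counts:   dict mapping ASV id -> {sample name: read count}
    Returns a dict mapping sample name -> FASTA text, sorted by decreasing abundance.
    """
    per_sample = defaultdict(list)
    for asv_id, sample_counts in counts.items():
        for sample, n in sample_counts.items():
            if n > 0:  # ASV is present in this sample
                per_sample[sample].append((asv_id, n))

    fasta = {}
    for sample, records in per_sample.items():
        # Most abundant first; ties broken by ASV id for reproducible output.
        records.sort(key=lambda r: (-r[1], r[0]))
        fasta[sample] = "".join(
            f">{asv_id};size={n}\n{asv_seqs[asv_id]}\n" for asv_id, n in records
        )
    return fasta


asv_seqs = {"asv1": "ACGT", "asv2": "TTGA"}
counts = {
    "asv1": {"s1": 10, "s2": 0},
    "asv2": {"s1": 3, "s2": 7},
}
for sample, text in sorted(split_asvs_by_sample(asv_seqs, counts).items()):
    print(f"# {sample}")
    print(text, end="")
```

Each per-sample FASTA would then be passed to chimera detection separately, whereas 'batchwise' mode would run once on the full ASV set.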

The output from vsearch chimera detection is then used to filter out chimeric ASVs using different parameters such as

Let me know if you want to discuss this and how to implement it in ampliseq.

d4straub commented 1 year ago

Hi! Chimera removal is important, so I think this is indeed interesting.

> Running uchime_denovo in vsearch seems to work well in removing remaining chimeric ASVs in our data (this allows for some mismatches between parents and children, which default removeBimeraDenovo does not) without removing "true" ASVs.

Yes, indeed, removeBimeraDenovo by default does not allow mismatches between chimera and parent, but one could change this behavior in ampliseq with a custom config that overrides that line, e.g. by passing -c chimera.config, where chimera.config contains:

process {
    withName: DADA2_RMCHIMERA {
        ext.args = 'method="consensus", minSampleFraction = 0.9, ignoreNNegatives = 1, minFoldParentOverAbundance = 2, minParentAbundance = 8, allowOneOff = TRUE, minOneOffParentDistance = 4, maxShift = 16'
    }
}

Would you be able to test whether that improves chimera removal for your case in a similar manner? (I just want to make sure existing settings are not already covering this.)