nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

A feature to remove taxa found in controls #634

Open skose82 opened 1 year ago

skose82 commented 1 year ago

Description of feature

Hi there,

As per discussion with Daniel Straub, I'd like to request a contamination removal feature in which taxa found in controls are removed from the main sample set. It would be great if the feature were optional, as controls (water etc.) sometimes contain cross-contamination from the sample set rather than from the environment itself.

d4straub commented 1 year ago

Thanks! The idea here could be to add a parameter, e.g. --contamination_controls "sample1,sample2", so that all sequences that appear in those control samples are removed from the ASV table (along with the control samples themselves). A more advanced option for such a task (using control samples) might be decontam, which is also available in Bioconda.
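To make the simple method concrete, here is a minimal sketch in plain Python. It is only an illustration, not ampliseq code: the function name, the nested-dict table layout, and the sample names are all hypothetical, and --contamination_controls is just the parameter proposed above, not an existing option.

```python
def remove_control_asvs(asv_table, control_samples):
    """Drop every ASV seen in any control, and drop the control columns.

    asv_table: dict mapping ASV id -> dict mapping sample name -> read count
               (a hypothetical layout chosen for this sketch).
    control_samples: iterable of sample names treated as negative controls.
    """
    controls = set(control_samples)
    filtered = {}
    for asv, counts in asv_table.items():
        # An ASV counts as a contaminant if any control has a non-zero count.
        if any(counts.get(sample, 0) > 0 for sample in controls):
            continue
        # Keep the ASV, but leave out the control samples themselves.
        filtered[asv] = {s: n for s, n in counts.items() if s not in controls}
    return filtered


# Toy data: ASV2 shows up in the water control, so it is removed everywhere.
asv_table = {
    "ASV1": {"sampleA": 120, "sampleB": 80, "water1": 0},
    "ASV2": {"sampleA": 15, "sampleB": 0, "water1": 40},
    "ASV3": {"sampleA": 0, "sampleB": 33, "water1": 0},
}
filtered = remove_control_asvs(asv_table, ["water1"])
print(sorted(filtered))
```

Note that a single read in a control is enough to remove an ASV from all samples here, which is exactly where cross-contamination from real samples into the controls would cause trouble.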

erikrikarddaniel commented 1 year ago

I would absolutely recommend Decontam. We have seen in actual projects that raw removal of ASVs found in negative controls risks both removing true ASVs found in samples and missing contaminants. This is, of course, taking Decontam as the truth, but the results have looked intuitively good.

There are at least two ways of running Decontam, and I think it would be wise to allow both.

d4straub commented 1 year ago

Alright, thanks. Then it is not worth the effort to implement the simple method above; better to implement a proper one such as Decontam right away.

skose82 commented 1 year ago

Hi all,

I wouldn't advise decontam until it is known exactly how it decides to remove an ASV. We still need a clean feature that simply removes anything found in the control samples, as a first pass to compare against a second pass without removal. This is what we did before ampliseq and what most microbiologists do in every project: scan the controls and remove what they regard as legitimate contamination. Doing this removal by hand is time-intensive and tedious, and then you have to replot. It would be truly worthwhile to have this feature as an option; then we can look at the output and decide whether it's worth using decontam instead. It certainly should be an option, as it currently is not an option in decontam!
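As a rough illustration of the comparison between the two passes, the sketch below reports which ASVs the simple removal would flag and what fraction of each real sample's reads would be lost. It is plain Python with a hypothetical table layout (ASV id -> sample -> count) and made-up sample names, not output from ampliseq or decontam.

```python
def removal_report(asv_table, control_samples):
    """Summarise the effect of removing every ASV found in a control.

    Returns (flagged_asvs, fraction_lost), where fraction_lost maps each
    non-control sample to the share of its reads that belong to flagged ASVs.
    """
    controls = set(control_samples)
    # ASVs with at least one read in any control sample.
    flagged = sorted(
        asv for asv, counts in asv_table.items()
        if any(counts.get(s, 0) > 0 for s in controls)
    )
    # All sample names that are not controls.
    samples = sorted({s for counts in asv_table.values() for s in counts} - controls)
    fraction_lost = {}
    for sample in samples:
        total = sum(counts.get(sample, 0) for counts in asv_table.values())
        lost = sum(asv_table[asv].get(sample, 0) for asv in flagged)
        fraction_lost[sample] = lost / total if total else 0.0
    return flagged, fraction_lost


# Toy data: only ASV2 is present in the water control.
asv_table = {
    "ASV1": {"sampleA": 90, "sampleB": 50, "water1": 0},
    "ASV2": {"sampleA": 10, "sampleB": 0, "water1": 40},
    "ASV3": {"sampleA": 0, "sampleB": 50, "water1": 0},
}
flagged, fraction_lost = removal_report(asv_table, ["water1"])
print(flagged)
print(fraction_lost)
```

A report like this would let one see at a glance whether the removal touches a small or a large share of each sample before deciding between the two passes.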

d4straub commented 1 year ago

Hm, that's a rather emotional plea for a simple method. I do think that the decontam documentation is not too ambiguous. Decontam implements a method that uses control samples, see here; I am not sure what your exact criticism is. Manual manipulation is, however, the worst way of processing data in my opinion; it would in any case be better to automate, i.e. standardize and make reproducible. I can live with having optional filters available. If you or someone else wants to implement that simple method because you feel it's a method with a future, I will not stand in your way (I cannot speak for others, though).