nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

Cut ASVs for taxonomy assignment #225

Open · erikrikarddaniel opened this issue 3 years ago

erikrikarddaniel commented 3 years ago

In some cases, a user might have sequenced an amplicon that is longer than the sequences in the database they want to use. To make taxonomy assignment work in that case, ASV sequences could be cut down to the region covered by the database before taxonomy assignment.

d4straub commented 3 years ago

Interesting, do you have an example when that would be the case? To illustrate the problem. With what parameters would you cut the ASV sequences? Degenerated nucleotide sequence? Nucleotide positions?
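To make the two options in the question concrete, here is a minimal sketch of both ways an ASV could be cut: by fixed 1-based nucleotide positions, or by locating a degenerate (IUPAC) forward primer and reverse primer and keeping the sequence between them. The function names are hypothetical, and the example assumes the reverse primer is given 5'→3' on the opposite strand, so it is reverse-complemented before searching; this is an illustration, not how ampliseq implements anything.

```python
import re
from typing import Optional

# IUPAC degenerate nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement table that also handles degenerate codes (R<->Y, K<->M, B<->V, D<->H).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def iupac_to_regex(primer: str) -> str:
    """Translate a degenerate primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer.upper())


def revcomp(seq: str) -> str:
    """Reverse complement, preserving IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def cut_by_positions(seq: str, start: int, end: int) -> str:
    """Keep 1-based, inclusive positions start..end."""
    return seq[start - 1:end]


def cut_by_primers(seq: str, fwd: str, rev: str) -> Optional[str]:
    """Keep the region between the forward primer and the reverse primer
    (both excluded). Returns None if either primer is not found."""
    f = re.search(iupac_to_regex(fwd), seq)
    if f is None:
        return None
    # Search for the reverse primer, as it appears on this strand, after the forward hit.
    r = re.search(iupac_to_regex(revcomp(rev)), seq[f.end():])
    if r is None:
        return None
    return seq[f.end():f.end() + r.start()]


asv = "TTGACGTAAACCCGGGTTTGCATGCAA"
print(cut_by_primers(asv, "ACGTR", "CATGC"))  # AACCCGGGTTT
print(cut_by_positions(asv, 9, 19))           # AACCCGGGTTT
```

Position-based cutting is simpler but assumes all ASVs are aligned to the same coordinates; primer-based cutting tolerates length variation between ASVs but drops sequences where a primer does not match.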

erikrikarddaniel commented 3 years ago

People are apparently sequencing whole rRNA operons, but most databases are limited to a single gene, or the ITS, per sequence. To assign taxonomy, one would hence have to cut the ASV down to what is in a particular database. The alternative would be to trust that the k-mer distribution is the same, but I don't think that would work well.
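A toy illustration of why relying on k-mers alone is doubtful: k-mer based classifiers compare the k-mers of a query with those of reference sequences, and the flanking rRNA gene sequence in an operon-length ASV contributes k-mers that an ITS-only reference can never contain. The sketch below, with k = 8 and made-up sequences chosen purely for illustration, measures the fraction of a query's distinct k-mers that also occur in a reference; it is not a classifier.

```python
def kmers(seq: str, k: int = 8) -> set:
    """Set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(query: str, reference: str, k: int = 8) -> float:
    """Fraction of the query's distinct k-mers that also occur in the reference."""
    q = kmers(query, k)
    if not q:
        return 0.0
    return len(q & kmers(reference, k)) / len(q)


# Made-up sequences: an "ITS" reference and an operon-length ASV that carries
# extra flanking sequence (standing in for SSU/LSU) on both sides.
its_reference = "ACGTTGCAAGCTTACGGATCCAGTTACG"
operon_asv = "TTTTTTTTTTTT" + its_reference + "CCCCCCCCCCCC"

print(shared_fraction(its_reference, its_reference))  # 1.0 after cutting
print(shared_fraction(operon_asv, its_reference))     # clearly below 1.0
```

In this toy case the uncut ASV shares only part of its k-mers with the reference, which is the dilution that cutting the ASV to the database region avoids.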

jtangrot commented 3 years ago

We have sequenced more or less the whole rRNA operon in fungi, but as (most of) UNITE only contains the ITS region, we need to cut the resulting ASVs and use only the ITS (or even the ITS2) region for taxonomy assignment. For this we use ITSx (https://microbiology.se/software/itsx/), which can be used both for fungi and for other phyla. Would it be an option to include this as an optional step, e.g. with a parameter --cut_its?
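A sketch of what the proposed `--cut_its` step might do once region boundaries are known. It assumes the per-ASV coordinates of ITS1 and ITS2 (1-based, inclusive) have already been obtained, for example from an ITS extraction tool such as ITSx, and have been parsed into a plain dictionary; that input format, the function names, and the mode values `full`, `its1` and `its2` are all hypothetical. ASVs in which the requested region was not found are dropped.

```python
from typing import Dict, Optional, Tuple

# Hypothetical input: region name -> (start, end), 1-based and inclusive.
Regions = Dict[str, Tuple[int, int]]


def cut_its(seq: str, regions: Regions, mode: str = "its2") -> Optional[str]:
    """Cut one ASV to ITS1, ITS2, or the full ITS (ITS1 through ITS2, incl. 5.8S).
    Returns None if a region needed for the chosen mode is missing."""
    if mode == "its1":
        needed = ("ITS1",)
    elif mode == "its2":
        needed = ("ITS2",)
    elif mode == "full":
        needed = ("ITS1", "ITS2")
    else:
        raise ValueError(f"unknown mode: {mode}")
    if any(name not in regions for name in needed):
        return None
    start = regions[needed[0]][0]
    end = regions[needed[-1]][1]
    return seq[start - 1:end]


def cut_all(asvs: Dict[str, str], coords: Dict[str, Regions], mode: str) -> Dict[str, str]:
    """Cut every ASV; drop those without the requested region(s)."""
    out = {}
    for asv_id, seq in asvs.items():
        cut = cut_its(seq, coords.get(asv_id, {}), mode)
        if cut is not None:
            out[asv_id] = cut
    return out


asvs = {"ASV_1": "AAAACCCCGGGGTTTT", "ASV_2": "ACGTACGT"}
coords = {"ASV_1": {"ITS1": (5, 8), "ITS2": (13, 16)}}  # no ITS found in ASV_2
print(cut_all(asvs, coords, "full"))  # {'ASV_1': 'CCCCGGGGTTTT'}
```

Keeping the cutting separate from the region detection would let the same step work with any tool that reports coordinates, which touches on the wish for something more general than ITS.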

erikrikarddaniel commented 3 years ago

I suppose we were thinking of something general, and this sounds specific to ITS. OTOH, it's better to have something that works for the only use case I'm aware of than nothing, so, in my opinion, go ahead and add it.

d4straub commented 3 years ago

So is this solved?