erikrikarddaniel opened this issue 3 years ago.
Interesting. Do you have an example of when that would be the case, to illustrate the problem? With what parameters would you cut the ASV sequences: a degenerate nucleotide sequence, or nucleotide positions?
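The two kinds of cut parameters raised in the question can be compared in a minimal Python sketch. It is not part of any existing pipeline, and all function names, example sequences, and primers are hypothetical. `cut_by_positions` keeps a fixed range of 1-based positions. `cut_by_primers` searches for a forward primer that may contain degenerate IUPAC codes, then for the reverse complement of a reverse primer, and keeps the sequence between them. The sketch assumes exact matching only, so mismatches to the primers are not tolerated. It is also not how ITSx works: ITSx finds ITS boundaries with HMM profiles, not primer matches.

```python
import re
from typing import Dict, Optional

# Hypothetical helper table: IUPAC nucleotide codes -> regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence that may contain IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def degenerate_to_regex(primer: str) -> str:
    """Turn a degenerate primer such as 'GTRC' into the regex 'GT[AG]C'."""
    return "".join(IUPAC[base] for base in primer.upper())


def cut_by_positions(seq: str, start: int, end: int) -> str:
    """Keep 1-based, inclusive positions start..end of an ASV."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return seq[start - 1:end]


def cut_by_primers(seq: str, fwd: str, rev: str) -> Optional[str]:
    """Return the region between the forward and the reverse primer.

    `rev` is written 5'->3' on the opposite strand, so its reverse complement
    is searched for downstream of the forward primer. Primers are excluded
    from the result; None is returned if either primer is not found.
    """
    seq = seq.upper()
    fwd_hit = re.search(degenerate_to_regex(fwd), seq)
    if fwd_hit is None:
        return None
    downstream = seq[fwd_hit.end():]
    rev_hit = re.search(degenerate_to_regex(revcomp(rev)), downstream)
    if rev_hit is None:
        return None
    return downstream[:rev_hit.start()]


def cut_asvs(asvs: Dict[str, str], fwd: str, rev: str) -> Dict[str, str]:
    """Cut every ASV; ASVs in which the primers are not found are dropped."""
    cut = {}
    for asv_id, seq in asvs.items():
        region = cut_by_primers(seq, fwd, rev)
        if region is not None:
            cut[asv_id] = region
    return cut


if __name__ == "__main__":
    # Toy ASVs and primers, invented for illustration only.
    asvs = {"asv1": "TTGTACAAACCCGGGTTTAGCTT", "asv2": "AAAA"}
    print(cut_asvs(asvs, fwd="GTRC", rev="GCTRAA"))  # {'asv1': 'AAACCCGGG'}
```

Position-based cutting only makes sense if all ASVs share a common coordinate system, for example after alignment to a reference. Primer-based cutting tolerates length variation between ASVs, which is typical for the ITS region.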
People are apparently sequencing whole rRNA operons, but most databases contain only a single gene, or only the ITS, per sequence. To assign taxonomy, one would therefore have to cut each ASV down to the region covered by a particular database. The alternative would be to trust that the k-mer distribution is the same, but I don't think that would work well.
We have sequenced more or less the whole rRNA operon in fungi, but since (most of) UNITE contains only the ITS region, we need to cut the resulting ASVs and use only the ITS (or even just the ITS2) region for taxonomy assignment. For this we use ITSx (https://microbiology.se/software/itsx/), which works for fungi as well as other phyla. Would it be an option to include this as an optional step, e.g. with a parameter --cut_its?
I suppose we were thinking of something more general, and this sounds specific to ITS. On the other hand, it's better to have something that works for the only use case I'm aware of than nothing, so in my opinion, go ahead and add it.
So is this solved?
In some cases, a user may have sequenced an amplicon that is longer than the sequences in the database they want to use. To make taxonomy assignment work in that case, ASV sequences could be cut down to the region covered by the database before assignment.