ashleyp1 closed this issue 3 weeks ago.
Hi, thank you for reporting this issue!
This does not look right.
Although taxonomic ranks with low confidence (e.g., values below 0.8) should not be trusted, the classifications should not jump between different clades of the tree as you go down to the species level.
I'll look deeper into the issue as soon as possible.
Could you please send me the exact command you ran?
Would it be possible to send me (a subset of) the queries and the database you used, or are they confidential?
Here is the command I used. I sent you an invite to a Dropbox folder with my database and the sample where I first found the issue. Thanks for looking into this!
vsearch --sintax \
1-filt-trimmed-HL068_FW.fastq.gz \
--db sintax_db.fasta \
--tabbedout 1-68_sintax.tsv \
--sintax_cutoff 0.7 --strand both --notrunclabels
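When checking results like these, it can help to test each predicted lineage against the parent/child relations in the database itself. The sketch below assumes the `--tabbedout` file has the query label in column 1 and the full prediction as comma-separated `rank:name(confidence)` entries in column 2, and that database headers carry a `;tax=rank:name,...;` annotation as the last field. The function names are hypothetical helpers, not part of vsearch.

```python
import re

# One "rank:name(confidence)" entry, e.g. "g:Exiguobacterium(0.42)".
ENTRY = re.compile(r"^([a-z]):(.+)\((\d*\.?\d+)\)$")


def parse_sintax_line(line):
    """Return (query, [(rank, name, confidence), ...]) for one --tabbedout line."""
    fields = line.rstrip("\n").split("\t")
    query = fields[0]
    prediction = fields[1] if len(fields) > 1 else ""
    if not prediction:  # no prediction for this query
        return query, []
    ranks = []
    for entry in prediction.split(","):
        m = ENTRY.match(entry)
        if m is None:
            raise ValueError(f"unexpected entry: {entry!r}")
        rank, name, conf = m.groups()
        ranks.append((rank, name, float(conf)))
    return query, ranks


def parent_map(headers):
    """Map each (rank, name) to its (rank, name) parent, from ';tax=...;' headers.

    If the database gives one taxon several parents, the last one seen wins.
    """
    parents = {}
    for header in headers:
        tax = header.split("tax=", 1)[1].rstrip().rstrip(";")
        entries = [tuple(e.split(":", 1)) for e in tax.split(",")]
        for parent, child in zip(entries, entries[1:]):
            parents[child] = parent
    return parents


def inconsistent_ranks(ranks, parents):
    """List predicted (rank, name) pairs whose predicted parent differs from the database."""
    bad = []
    for (prank, pname, _), (crank, cname, _) in zip(ranks, ranks[1:]):
        expected = parents.get((crank, cname))
        if expected is not None and expected != (prank, pname):
            bad.append((crank, cname))
    return bad
```

A prediction such as `g:Exiguobacterium(0.45),s:Salmonella_enterica(0.20)` would be flagged at the species rank, because the database places `Salmonella_enterica` under `g:Salmonella`.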
Got the data, thank you. I'll look into it.
There was a logical bug in the selection of the best lineages. It should now be fixed in commit aa94d1c. I think the bug only shows up when the confidence is below 0.5, so it shouldn't matter much in most cases, although the output was confusing.
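The thread does not show the actual code of the bug or its fix. The following is only an illustration, using toy data, of one way a lineage can "jump" between clades: picking the most frequent name at each rank independently, versus picking it hierarchically among the bootstrap hits that agree with the ranks already chosen. Confidences are the fraction of all bootstrap hits supporting the chosen name. The function names are hypothetical.

```python
from collections import Counter


def per_rank_majority(hits):
    """Most frequent name at each rank, chosen independently (may be inconsistent)."""
    result = []
    for level in range(len(hits[0])):
        name, count = Counter(h[level] for h in hits).most_common(1)[0]
        result.append((name, count / len(hits)))
    return result


def hierarchical_majority(hits):
    """Most frequent name at each rank among hits agreeing with the ranks above."""
    n = len(hits)
    result = []
    remaining = hits
    for level in range(len(hits[0])):
        name, count = Counter(h[level] for h in remaining).most_common(1)[0]
        result.append((name, count / n))
        # Keep only hits consistent with the name just chosen.
        remaining = [h for h in remaining if h[level] == name]
    return result


# 100 toy bootstrap hits, each a (genus, species) lineage.
hits = ([("Listeria", "Listeria_monocytogenes")] * 30
        + [("Exiguobacterium", "Exiguobacterium_sp1")] * 26
        + [("Exiguobacterium", "Exiguobacterium_sp2")] * 24
        + [("Salmonella", "Salmonella_enterica")] * 20)

print(per_rank_majority(hits))      # genus Exiguobacterium, species Listeria_monocytogenes
print(hierarchical_majority(hits))  # genus Exiguobacterium, species Exiguobacterium_sp1
```

This kind of mismatch can only occur when the lower-rank confidence is at most 0.5: if a species name had more than half of the votes, its genus would have more than half too and would win the genus rank outright.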
I will make a new release soon with this fix.
Sorry for the bug and thank you very much for reporting this issue!
By the way, I recommend using the --sintax_random option to avoid length bias in the taxonomic classification.
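To see why tie-breaking matters, consider several reference sequences that share the same, highest number of k-mers with a bootstrap sample. A fixed rule sends every such tie to the same sequence, while a random choice spreads the votes across all tied sequences. The thread does not describe the default tie-break; this sketch assumes, purely for illustration, a deterministic "shortest sequence wins" rule. `pick_best` and the candidate tuples are hypothetical.

```python
import random


def pick_best(candidates, rng=None):
    """Return the candidate sharing the most k-mers with the sample.

    candidates: list of (name, length, shared_kmers) tuples.
    With rng=None, ties go to the shortest sequence (an assumed rule for
    illustration); with an rng, ties are broken uniformly at random.
    """
    best = max(c[2] for c in candidates)
    tied = [c for c in candidates if c[2] == best]
    if rng is None:
        return min(tied, key=lambda c: c[1])
    return rng.choice(tied)


cands = [("A", 1500, 20), ("B", 1400, 20), ("C", 1450, 18)]
print(pick_best(cands)[0])  # always "B", the shorter of the two tied sequences
rng = random.Random(1)
print(sorted({pick_best(cands, rng)[0] for _ in range(1000)}))  # both tied sequences
```

Under a fixed rule, a systematically preferred sequence collects every tied vote, which inflates its share of the bootstrap hits; random selection removes that systematic preference.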
The fix is now available in release 2.29.0.
I encountered some confusing results while testing sintax on my data. I'm running vsearch v2.28.1 on near-full-length 16S amplicons against a custom database. For some of my samples (mostly ones without high confidence values), I get mixed taxonomies that seem to jump around, like the ones below.
The first two show the lineage I would expect for Exiguobacterium, but how did the next two go from Listeria to Exiguobacterium, and from Exiguobacterium to Salmonella?
At first I thought it was an error in my database, but I checked and confirmed that the lineages are all correct and properly formatted. At this point, I assume this is most likely a gap in my understanding of how sintax works. I know the bootstrap values for those two are low enough that I probably won't use them, but I'd still like to understand how this happens.
Thanks!
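For context on where the bootstrap values come from, here is a toy sketch of SINTAX-style bootstrapping: in each iteration a random subset of the query's k-mers is drawn, the reference sharing the most sampled k-mers is taken as the hit, and a rank's confidence is the fraction of iterations whose hit carries the winning name at that rank. The parameters (k=8, 32 sampled k-mers, 100 iterations), sampling with replacement, and "first reference wins ties" are assumptions for illustration, not a description of vsearch's implementation.

```python
import random
from collections import Counter


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bootstrap_hits(query, db, k=8, sample_size=32, iterations=100, seed=0):
    """Lineage of the best-matching reference in each bootstrap iteration.

    db: list of (sequence, lineage_tuple).
    """
    rng = random.Random(seed)
    query_kmers = sorted(kmers(query, k))  # sorted so sampling is reproducible
    refs = [(kmers(seq, k), lineage) for seq, lineage in db]
    hits = []
    for _ in range(iterations):
        # Draw k-mers with replacement (an assumption of this sketch).
        sample = [rng.choice(query_kmers) for _ in range(sample_size)]
        # Score each reference by sampled k-mers it contains; max() keeps the first on ties.
        best = max(refs, key=lambda ref: sum(km in ref[0] for km in sample))
        hits.append(best[1])
    return hits


def rank_confidence(hits, level):
    """Most frequent name at one rank and the fraction of iterations supporting it."""
    name, count = Counter(h[level] for h in hits).most_common(1)[0]
    return name, count / len(hits)


db = [("ACGTACGTTGCA", ("Firmicutes", "Exiguobacterium")),
      ("TTTTGGGGCCCC", ("Proteobacteria", "Salmonella"))]
hits = bootstrap_hits("ACGTACGTTGCA", db, k=4)
print(rank_confidence(hits, 1))  # ('Exiguobacterium', 1.0)
```

With a real query that sits between several reference clades, different iterations land on different references, so the vote fractions at each rank drop; low bootstrap values are exactly the cases where votes are split across clades.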