Hi @aderzelle,
The Krona BLAST classification algorithm is based on LCA (Lowest Common Ancestor), so if your reads have good hits to a diversity of taxa, that would explain the lack of resolution in the output. You can control what constitutes a "good" hit with `-t`, with `-t 0` meaning that only exact bit score ties will defer to LCA, which may be appropriate given that 16S hits will be relatively short and similar.
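To make the threshold idea concrete, here is a minimal Python sketch of LCA classification over "best" hits. It is not Krona's actual code: the lineages and taxon names are hypothetical, and the threshold is assumed to be a percentage of the top bit score, which is how I understand `-t` to behave.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical root-to-leaf lineages; real ones would come from a taxonomy database.
LINEAGES: Dict[str, List[str]] = {
    "taxA": ["root", "Bacteria", "PhylumX", "GenusA"],
    "taxB": ["root", "Bacteria", "PhylumX", "GenusB"],
    "taxC": ["root", "Bacteria", "PhylumY", "GenusC"],
}


def best_hits(hits: List[Tuple[str, float]], threshold_pct: float) -> List[str]:
    """Keep taxa whose bit score is within threshold_pct percent of the top score."""
    top = max(score for _, score in hits)
    margin = top * threshold_pct / 100.0
    return [tax for tax, score in hits if top - score <= margin]


def lowest_common_ancestor(lineages: List[List[str]]) -> Optional[str]:
    """Return the deepest taxon shared by every lineage."""
    lca = None
    for names in zip(*lineages):
        if len(set(names)) > 1:
            break  # lineages diverge at this depth
        lca = names[0]
    return lca


def classify(hits: List[Tuple[str, float]], threshold_pct: float) -> Optional[str]:
    """Assign a read to the LCA of its best hits."""
    taxa = best_hits(hits, threshold_pct)
    return lowest_common_ancestor([LINEAGES[t] for t in taxa])


# Hypothetical (taxon, bit score) pairs for a single read.
HITS = [("taxA", 1000.0), ("taxB", 990.0), ("taxC", 950.0)]

for pct in (0, 3, 10):
    print(pct, classify(HITS, pct))
```

With these made-up scores, a threshold of 0 keeps only the top hit and resolves to `GenusA`, 3 pulls in the second hit and falls back to `PhylumX`, and 10 includes all three and falls back to `Bacteria`. A wider threshold therefore trades resolution for caution.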
Thanks a lot. Setting `-t 0` did not help, but the hits in the BLAST output all have e-values of 0.0 and at least 99% identity, if that helps.
It's actually the bit scores that matter rather than e-values, at least by default in recent versions. I'm guessing there are exact ties. You can see what is tying by adding `-r` to distribute randomly among the ties, but these results shouldn't be considered real classifications.
Sorry, I should have mentioned that I did try the `-r` option, without any change. The bit scores are not all strictly identical but are highly similar, and yes, there are exact ties. I guess, then, that Krona doesn't give an answer because there is no clear answer to be given from the BLAST results.
Thanks a lot for your prompt answer and helpful comments.
That makes sense, although it's still odd that `-r` made no difference... It may be worth checking some of the tied tax IDs to see if the lowest common ancestor is really Bacteria, or else there could be a database issue. If you'd like to post a few lines of the BLAST output here, I can look into it.
Sure, I could even send the whole file, it's no big deal. I randomly picked some of the tax IDs and they are all bacteria indeed, ranging from "uncultured bacteria"
`LYSAT_2_16S KX509289.1 99.857 1399 1 1 1 1398 1447 49 0.0 2571
LYSAT_2_16S JQ977227.1 99.571 1398 6 0 1 1398 1398 1 0.0 2549
LYSAT_2_16S KR233780.1 99.500 1401 4 3 1 1398 1437 37 0.0 2545
LYSAT_2_16S KX036607.1 99.499 1398 7 0 1 1398 1405 8 0.0 2543
LYSAT_2_16S KT949415.1 99.500 1399 6 1 1 1398 1432 34 0.0 2543
LYSAT_2_16S CP014517.1 99.500 1399 6 1 1 1398 422991 424389 0.0 2543
`
Yes, and in this case, at least, the "uncultured bacteria" hit happens to be the best one, which is why changing the threshold didn't help. I'm not sure what to do about this, other than BLASTing against a cleaner DB, e.g. RefSeq Genomic maybe. In the future, the classifier could try to ignore these taxonomic oddities. You could also try feeding your BLAST results into a more comprehensive classifier, like MEGAN, and then using `ktImportTaxonomy` to visualize its output.
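For anyone wanting to check which hits would count as "best" for a given read, a small Python sketch like the one below can help. It uses the six lines posted above and assumes the default 12-column tabular BLAST layout with the bit score in the last column; the function name is made up for illustration, and the percentage threshold mirrors my understanding of `-t` rather than Krona's exact code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# The hits posted above, whitespace-separated, bit score in column 12.
POSTED_HITS = """\
LYSAT_2_16S KX509289.1 99.857 1399 1 1 1 1398 1447 49 0.0 2571
LYSAT_2_16S JQ977227.1 99.571 1398 6 0 1 1398 1398 1 0.0 2549
LYSAT_2_16S KR233780.1 99.500 1401 4 3 1 1398 1437 37 0.0 2545
LYSAT_2_16S KX036607.1 99.499 1398 7 0 1 1398 1405 8 0.0 2543
LYSAT_2_16S KT949415.1 99.500 1399 6 1 1 1398 1432 34 0.0 2543
LYSAT_2_16S CP014517.1 99.500 1399 6 1 1 1398 422991 424389 0.0 2543
"""


def hits_within_threshold(
    text: str, threshold_pct: float
) -> Dict[str, List[Tuple[str, float]]]:
    """Per query, list (subject, bit score) pairs within threshold_pct of the top score."""
    by_query: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        by_query[fields[0]].append((fields[1], float(fields[11])))

    kept: Dict[str, List[Tuple[str, float]]] = {}
    for query, hits in by_query.items():
        top = max(score for _, score in hits)
        margin = top * threshold_pct / 100.0
        kept[query] = [(subj, score) for subj, score in hits if top - score <= margin]
    return kept


for pct in (0, 1, 3):
    survivors = hits_within_threshold(POSTED_HITS, pct)["LYSAT_2_16S"]
    print(pct, [subj for subj, _ in survivors])
```

For this read the top bit score (2571) is unique, so a threshold of 0 leaves only KX509289.1, while 3 keeps all six hits. Looking up the taxonomy of whatever survives shows quickly whether a single poorly classified entry is driving the result.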
Hello all,
I have a blastn file containing 16S blastn results, and almost half of them are classified by Krona as "unassigned" while it's clear they should be assigned. Now, that file contains several different samples, say samples from "site1", "site2", etc., and I wanted to focus on each specific sample. So I extracted the BLAST output for only "site1", for example, and now Krona only shows a full circle, "100% bacteria", with no other taxonomic assignment. However, the BLAST results have well-defined taxonomic assignments. I did the 2 updates of Accession and Taxonomy.
I am not sure I understand Krona's behaviour. What am I missing? Thanks a lot.