marbl / Krona

Interactively explore metagenomes and more from a web browser.
https://github.com/marbl/Krona/wiki

Krona chart only shows 100% bacteria #52

Open ghost opened 7 years ago

ghost commented 7 years ago

Hello all,

I have a blastn file containing 16S blastn results, and almost half of them are classified by Krona as "unassigned" although it is clear they should be assigned. That file contains several different samples, say from "site1", "site2", etc., and I wanted to focus on each sample separately. So I extracted the blast output for only "site1", for example, and now Krona shows just a full circle, "100% bacteria", with no other taxonomic assignment. However, the blast results have well-defined taxonomic assignments. I did the 2 updates of Accession and Taxonomy.

I am not sure I understand Krona's behaviour. What am I missing? Thanks a lot.

ondovb commented 7 years ago

Hi @aderzelle, the Krona BLAST classification algorithm is based on LCA (Lowest Common Ancestor), so if your reads have good hits to a diversity of taxa, that would explain the lack of resolution in the output. You can control what constitutes a "good" hit with -t. With -t 0, only exact bit score ties will defer to LCA, which may be appropriate given that 16S hits will be relatively short and similar.
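To make the LCA behaviour concrete, here is a minimal Python sketch. It is not Krona's actual code: the toy taxonomy, the function names, and the rule "keep hits whose bit score is within `threshold` of the best" are assumptions based only on the description of -t above.

```python
# Toy parent map with illustrative taxon names (not real NCBI tax IDs).
PARENT = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "Escherichia": "Proteobacteria",
    "Salmonella": "Proteobacteria",
    "Bacillus": "Firmicutes",
}


def lineage(taxon):
    """Return the path from the root down to the given taxon."""
    path = [taxon]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path[::-1]


def lowest_common_ancestor(taxa):
    """Return the deepest node shared by the lineages of all taxa."""
    lca = "root"
    for nodes in zip(*(lineage(t) for t in taxa)):
        if len(set(nodes)) != 1:
            break
        lca = nodes[0]
    return lca


def classify(hits, threshold):
    """hits: list of (taxon, bit_score) pairs for one query.

    Assumed rule: keep every hit whose bit score is within `threshold`
    of the best score, then report the LCA of the kept taxa.
    """
    best = max(score for _, score in hits)
    kept = [taxon for taxon, score in hits if best - score <= threshold]
    return lowest_common_ancestor(kept)


# Hypothetical hits for a single read.
hits = [("Escherichia", 2571), ("Salmonella", 2570), ("Bacillus", 2560)]
for t in (0, 1, 20):
    print(t, classify(hits, t))
```

In this sketch a larger threshold pulls in more distant hits and pushes the assignment up the tree, which is the loss of resolution described above.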

ghost commented 7 years ago

Thanks a lot. Setting -t 0 did not help, but the hits in the blast output all have e-values of 0.0 and at least 99% identity, if that helps.

ondovb commented 7 years ago

It's actually the bit scores that matter rather than e-values, at least by default in recent versions. I'm guessing there are exact ties. You can see what is tying by adding -r to distribute randomly among the ties, but these results shouldn't be considered as real classifications.

ghost commented 7 years ago

Sorry, I should have mentioned that I did try the -r option, with no change. The bit scores are not all strictly identical but they are highly similar, and yes, there are exact ties. I guess, then, that Krona doesn't give an answer because there is no clear answer to be given from the blast results.

Thanks a lot for your prompt answer and helpful comments.

ondovb commented 7 years ago

That makes sense, although it's still odd that -r made no difference...it may be worth it to check out some of the tax IDs that are tied to see if the lowest common ancestor is really Bacteria, or else there could be a database issue. If you'd like to post a few lines of the Blast output here I can look into it.

ghost commented 7 years ago

Sure, I could even send the whole file, it's no big deal. I have picked some of the tax IDs at random and they are indeed all bacteria, ranging from "uncultured bacteria"

```
LYSAT_2_16S KX509289.1 99.857 1399 1 1 1 1398 1447 49 0.0 2571
LYSAT_2_16S JQ977227.1 99.571 1398 6 0 1 1398 1398 1 0.0 2549
LYSAT_2_16S KR233780.1 99.500 1401 4 3 1 1398 1437 37 0.0 2545
LYSAT_2_16S KX036607.1 99.499 1398 7 0 1 1398 1405 8 0.0 2543
LYSAT_2_16S KT949415.1 99.500 1399 6 1 1 1398 1432 34 0.0 2543
LYSAT_2_16S CP014517.1 99.500 1399 6 1 1 1398 422991 424389 0.0 2543
```

ondovb commented 7 years ago

Yes, and in this case, at least, the "uncultured bacteria" hit happens to be the best one, which is why changing the threshold didn't help. I'm not sure what to do about this, other than blasting against a cleaner DB, e.g. RefSeq Genomic maybe. In the future, the classifier could try to ignore these taxonomic oddities. You could also try feeding your BLAST results into a more comprehensive classifier, like MEGAN, and then using ktImportTaxonomy to visualize its output.
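To see which hits would take part in the classification at a given threshold, the six rows posted above can be filtered by bit score. The Python sketch below assumes the rows are tabular BLAST output with the subject accession in the second column and the bit score in the last one. The helper name `hits_within` and the filtering rule are illustrative, not Krona's implementation.

```python
# The six rows posted in this thread, copied verbatim.
BLAST_ROWS = """\
LYSAT_2_16S KX509289.1 99.857 1399 1 1 1 1398 1447 49 0.0 2571
LYSAT_2_16S JQ977227.1 99.571 1398 6 0 1 1398 1398 1 0.0 2549
LYSAT_2_16S KR233780.1 99.500 1401 4 3 1 1398 1437 37 0.0 2545
LYSAT_2_16S KX036607.1 99.499 1398 7 0 1 1398 1405 8 0.0 2543
LYSAT_2_16S KT949415.1 99.500 1399 6 1 1 1398 1432 34 0.0 2543
LYSAT_2_16S CP014517.1 99.500 1399 6 1 1 1398 422991 424389 0.0 2543
"""


def hits_within(rows, threshold):
    """Return subject accessions whose bit score is within `threshold`
    of the best bit score (assumed columns: accession 2nd, bit score last)."""
    hits = []
    for line in rows.strip().splitlines():
        fields = line.split()
        hits.append((fields[1], float(fields[-1])))
    best = max(score for _, score in hits)
    return [acc for acc, score in hits if best - score <= threshold]


print(hits_within(BLAST_ROWS, 0))   # hits tied with the best bit score
print(hits_within(BLAST_ROWS, 25))  # hits within 25 bits of the best
```

With a threshold of 0 only the single top-scoring row (bit score 2571) survives, so the read inherits whatever taxon that one accession maps to. This is consistent with the threshold having no effect when the best hit is itself poorly resolved.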