merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

[BUG] Support values can be misplaced in the interactive interface #2043

Closed tdelmont closed 7 months ago

tdelmont commented 1 year ago

Short description of the problem

Support values in the interactive interface can be wrong, due to a well-known problem.

anvi'o version

anvi'o v7.1

System info

Mac

Detailed description of the issue

We realised that bootstrap values can be misplaced in the interactive interface. This is especially true when playing around with re-rooting. This is a well-known phenomenon, described in this paper: https://academic.oup.com/mbe/article/34/6/1535/3077051

Files to reproduce

Here are files that can be used for testing: Archive.zip

meren commented 1 year ago

You are the best, Tom. Thank you for reproducible examples.

meren commented 1 year ago

I hope you would be available for questions later, too.

metehaansever commented 1 year ago

I can't see everything in the Branch Support section. I tried with this command:

anvi-interactive --manual -p PROFILE.db -t DNApolB_NCLDV_marine_db_v1.Darius_EukaOnly_exceptAlpha.wMedusa.noLB5.Duplodna.Baculo.nr95.corrected.removedSeqTreemmed_Tom.noLoneSeq2.fftnsi.trim50.fa.treefile -d metadata_duplodna.txt

Screenshot from 2023-02-09 11-50-49

tdelmont commented 1 year ago

You should be able to see the support values using the mouse (right side of the interface) by pointing it at the branch splits within the tree.

The problem is that the scores provided by anvi'o do not necessarily match those in the tree itself (named DNApolB_NCLDV_marine_db_v1.Darius_EukaOnly_exceptAlpha.wMedusa.noLB5.Duplodna.Baculo.nr95.corrected.removedSeqTreemmed_Tom.noLoneSeq2.fftnsi.trim50.fa.treefile)

tdelmont commented 1 year ago

I just understood. In our case, we have 2 support values and because of that none are displayed, which is for another request. But again, what we have done is explore the values using the mouse in the tree, interactively.

Sorry, since all our work is done using 2 support values, I do not have a better example for you.
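
For illustration, here is a minimal sketch of how a node label carrying two support values could be split before display. The thread does not say how the two values are encoded, so the slash-separated form (for example `95.2/100`) and the helper name `parse_support_label` are assumptions, not anvi'o code.

```python
def parse_support_label(label, separator="/"):
    """Split a node label such as '95.2/100' into a tuple of floats.

    Hypothetical helper: the separator is an assumption about how the two
    support values are written. Returns an empty tuple when the label is
    empty or when any part is not numeric.
    """
    if not label:
        return ()
    try:
        return tuple(float(part) for part in label.split(separator))
    except ValueError:
        # The label is something other than numbers, e.g. a clade name.
        return ()


two_values = parse_support_label("95.2/100")
one_value = parse_support_label("87")
not_numeric = parse_support_label("cladeA")
```

A parser that expects exactly one number per label would fail on `95.2/100` (in Python, `float("95.2/100")` raises `ValueError`), so handling the multi-value case explicitly seems safer than a silent fallback.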

metehaansever commented 1 year ago

I guess there is something else going on, because all the support values look like 0.

Screenshot from 2023-02-09 13-56-25

Maybe we can turn this bug into a feature: we could show 2 support values on the Mouse Layer. What do you think about that, @meren?

meren commented 1 year ago

I see. There are two problems here:

I think we need a very simple phylogenetic tree with made-up values and branch names that would help us solve the second first, and then we can implement something to show multiple branch support values properly :)

tdelmont commented 1 year ago

Correct, this bug report is for the second issue, which is more problematic and very likely also occurs for people having one single support value. Again, I do not have trees with single values as of now....

Tom

FlorianTrigodet commented 1 year ago

Very interesting paper! And there are probably so many mislabelled trees out there. The main problem is that the Newick tree format doesn't have a way to store branch support values. You can only annotate nodes, and it is implicitly assumed that the value is for the branch above, toward the root. But when you reroot, some branches are now upside down, and the node annotation ends up describing the wrong branch.
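
To make this concrete, here is a small sketch of re-keying node labels by the branch they describe, so that a value no longer depends on which end of the branch is "up". The internal node names `n1`, `n2`, `r` and the function `supports_by_branch` are made up for illustration, and the tree is a toy example, not the one from the paper.

```python
def supports_by_branch(parent, node_labels):
    """Re-key node labels by the branch they describe.

    parent:      child -> parent, as rooted in the original Newick file.
    node_labels: node -> support value read from the Newick node label.

    Each label is attached to the branch between the node and its parent,
    stored as an unordered pair so that rerooting cannot flip it.
    """
    return {
        frozenset((node, parent[node])): value
        for node, value in node_labels.items()
        if node in parent  # the root has no branch above it
    }


# Toy tree (((A,B)90,C)60,D,X); with hypothetical internal names n1, n2, r.
parent = {"A": "n1", "B": "n1", "n1": "n2", "C": "n2",
          "n2": "r", "D": "r", "X": "r"}
node_labels = {"n1": 90, "n2": 60}

branch_support = supports_by_branch(parent, node_labels)
```

Here `branch_support` holds 90 for the branch between `n1` and `n2`, and 60 for the branch between `n2` and `r`, whatever node is later chosen as the root.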

I made a small test case based on that paper (it is a newick tree, but I called it .txt because github was not happy). tree.txt

anvi-interactive --manual -t tree.txt -p PROFILE.db

Here is what we see without rooting. Everything is fine: Screenshot 2023-02-09 at 4 15 39 PM

Now reroot from X, and it looks like this. Panel A shows the anvi'o output, and panel B shows what it should be. A funny detail about the "0": it corresponds to the initial "root" from the figure above.

Screenshot 2023-02-09 at 4 23 18 PM

An internal node (not root, not tip) always has three branches, right? To know which branch your support value is for, you should look for the branch leading to the original root of your Newick file. Without re-rooting, that implicitly means the branch above.
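
That rule can be sketched in a few lines, under simplifying assumptions: the tree is stored as a child -> parent map, every label is a branch support, and "rerooting" just means making an existing node the new root (a real viewer would usually place the root on a branch instead). All names here are hypothetical.

```python
def reroot_with_supports(parent, labels, new_root):
    """Reroot at `new_root` and move labels so each still describes its branch.

    parent: child -> parent in the original rooting.
    labels: node -> support of the branch between that node and its parent.
    Only branches on the path from `new_root` to the old root change
    direction, so only labels on that path have to move.
    """
    # Walk from the new root up to the original root.
    path = [new_root]
    while path[-1] in parent:
        path.append(parent[path[-1]])

    new_parent = dict(parent)
    new_labels = dict(labels)
    new_parent.pop(new_root, None)  # the new root has no parent
    new_labels.pop(new_root, None)  # and no branch above it

    # Flip every branch on the path; its support moves to the node that
    # is now on the lower (child) end of that branch.
    for child, par in zip(path, path[1:]):
        new_parent[par] = child
        if child in labels:
            new_labels[par] = labels[child]
        else:
            new_labels.pop(par, None)
    return new_parent, new_labels


# Toy tree (((A,B)90,C)60,D,X); with hypothetical internal names n1, n2, r.
parent = {"A": "n1", "B": "n1", "n1": "n2", "C": "n2",
          "n2": "r", "D": "r", "X": "r"}
labels = {"n1": 90, "n2": 60}

new_parent, new_labels = reroot_with_supports(parent, labels, "n1")
print(new_labels)  # {'n2': 90, 'r': 60}
```

Leaving the labels where they were would put 60 on `n2`, which after rerooting sits below the branch separating A and B from everything else, a branch whose support in the original file is 90.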

FlorianTrigodet commented 1 year ago

Another issue arises. What if your node's comments are not branch values, but mean something else that actually applies to the node?

Then we need a check box in the additional settings asking the user whether the node's annotation should be treated as such, or as a branch value (and be visually moved when re-rooting).

From the author of that paper:

we suggest that all tree viewers and toolkits shall offer an explicit option to choose between the two possible interpretations of node labels. Ideally, users should be forced to define the semantics of their node labels before the tree is displayed or rerooted by the respective tool. This way, accidentally wrong interpretations are avoided and unaware users will become aware of the semantics of inner node labels.
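
A rough sketch of what such an explicit choice could look like in code. The enum `NodeLabelSemantics` and the function `labels_after_reroot` are hypothetical names for illustration, not part of anvi'o, and the caller is assumed to already know the path from the new root to the original root.

```python
from enum import Enum


class NodeLabelSemantics(Enum):
    BRANCH = "branch"  # label describes the branch toward the original root
    NODE = "node"      # label describes the node itself


def labels_after_reroot(labels, path_to_old_root, semantics):
    """Return node labels after rerooting, given an explicit interpretation.

    labels:           node -> label read from the Newick file.
    path_to_old_root: nodes from the new root up to the original root.
    semantics:        a NodeLabelSemantics member; there is deliberately no
                      default, so the caller has to choose.
    """
    if not isinstance(semantics, NodeLabelSemantics):
        raise TypeError("choose NodeLabelSemantics.BRANCH or NodeLabelSemantics.NODE")
    if semantics is NodeLabelSemantics.NODE:
        return dict(labels)  # node annotations stay where they are

    # Branch values: labels off the path are unaffected, labels on the path
    # move one step toward the original root.
    moved = {n: v for n, v in labels.items() if n not in path_to_old_root}
    for child, par in zip(path_to_old_root, path_to_old_root[1:]):
        if child in labels:
            moved[par] = labels[child]
    return moved


labels = {"n1": 90, "n2": 60, "n3": 70}
path = ["n1", "n2", "r"]  # rerooting at n1; r was the original root

as_branch = labels_after_reroot(labels, path, NodeLabelSemantics.BRANCH)
as_node = labels_after_reroot(labels, path, NodeLabelSemantics.NODE)
```

With `BRANCH`, 90 moves to `n2` and 60 moves to `r`, while `n3` keeps its 70 because it is off the rerooting path. With `NODE`, nothing moves.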

meren commented 1 year ago

You are the best, Florian! Thank you for providing an example here and helping Mete with this :)