merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

Newick rooting with branch support value #2224

Closed: FlorianTrigodet closed this issue 6 months ago.

FlorianTrigodet commented 7 months ago

Hello everyone, and welcome to our new and final episode of the Newick tree branch support value problem!

Background

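The problem: when rooting a tree, some branch support values have to be moved around. A support value describes a branch (the split it defines), not a node, so when the root changes, the branches on the path between the old and the new root flip direction and their values have to move to the node at the other end. See issue #2043 for more information and a link to a paper explaining the situation.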

The origin of the problem is that the support values were stored in node.name rather than node.support in reroot_tree in bottleroutes.py. ete3 is perfectly capable of repositioning branch support values when rooting (and that's good news!).

The solution

You could be tempted to just copy node.name (which contains the real support value) into node.support, root the tree, and then rename node.name with node.support once ete3 has moved the values appropriately. And it would work, as long as your support values are floats.

But the world is complex, and Tom's support values look like this: '100/99.6'. Basically a string. That's OK: in the code, you will see a hack™ that pairs a unique integer with each node.name, uses that integer as the support value, tracks where each integer moves during rooting, and renames node.name accordingly. Now the support value format is no longer a problem: Tom's tree can be properly rooted and rerooted, and the support values end up in the right place.
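To make the hack concrete without depending on ete3, here is a minimal, self-contained Python sketch. It is not the anvi'o code (which, per this issue, relies on ete3 to move node.support); `Node`, `reroot_above`, `reroot_with_string_labels`, and `string_test_tree` are hypothetical names. The reroot routine is a simplified stand-in for a library reroot and assumes, like the test tree from the paper, that the old root is multifurcating, so it never has to be collapsed. Each support value is stored on the child end of its branch; rooting flips every branch on the path to the old root, so each value moves one node along that path, and string labels ride along as integer tokens.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass(eq=False)  # identity equality, so list.remove() never compares subtrees
class Node:
    name: str = ""
    support: Optional[float] = None  # value of the branch between this node and its parent
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def reroot_above(target: Node) -> Node:
    """Root on the branch above `target`, keeping support values on their branches.

    Simplified sketch: assumes the old root is multifurcating, so it stays a
    valid internal node after the flip.
    """
    parent = target.parent
    if parent is None:
        raise ValueError("target is already the root")
    parent.children.remove(target)
    new_root = Node()
    new_root.add(target)
    carried = target.support  # both halves of the split branch share its value
    node, new_parent = parent, new_root
    while node is not None:  # walk up to the old root, flipping each branch
        old_parent = node.parent
        if old_parent is not None:
            old_parent.children.remove(node)
        # This node takes the value carried from below; its own value moves up.
        carried, node.support = node.support, carried
        new_parent.add(node)
        node, new_parent = old_parent, node
    return new_root


def internal_nodes(root: Node) -> Iterator[Node]:
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            yield node
            stack.extend(node.children)


def reroot_with_string_labels(target: Node) -> Node:
    """Hypothetical wrapper: labels -> unique integers -> reroot -> labels."""
    root = target
    while root.parent is not None:
        root = root.parent
    labels: Dict[int, str] = {}
    for token, node in enumerate(internal_nodes(root), start=1):
        labels[token] = node.name  # e.g. '100/99.6', which float() rejects
        node.name, node.support = "", token
    new_root = reroot_above(target)
    for node in internal_nodes(new_root):
        # No token: the new root, or the half of a split unlabeled leaf branch.
        # The old root's own label ('R') has no branch, so it is dropped.
        node.name = labels.get(node.support, "")
        node.support = None
    return new_root


def to_newick(node: Node) -> str:
    if not node.children:
        return node.name
    return "(" + ",".join(to_newick(c) for c in node.children) + ")" + node.name


def string_test_tree() -> Node:
    """((C,D)1/98.5,(A,(B,X)3/777)2/5546,E)R; built by hand; returns leaf X."""
    r = Node("R")
    n1 = r.add(Node("1/98.5"))
    n1.add(Node("C"))
    n1.add(Node("D"))
    n2 = r.add(Node("2/5546"))
    n2.add(Node("A"))
    n3 = n2.add(Node("3/777"))
    n3.add(Node("B"))
    x = n3.add(Node("X"))
    r.add(Node("E"))
    return x


print(to_newick(reroot_with_string_labels(string_test_tree())) + ";")
# (X,(B,(A,((C,D)1/98.5,E)2/5546)3/777));
```

In the output, the clade {C,D,E} carries the value starting with "2" and {A,C,D,E} carries the one starting with "3", which are the same splits those values described before rooting.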

Test case

You don't need to trust me; just test it with the following example, which is copied directly from the paper mentioned earlier. The first digit of each string in the string test case matches the 1, 2, and 3 support values of the original test case:

```shell
# support value as number
echo "((C,D)1,(A,(B,X)3)2,E)R;" > tree_support_as_number.newick
anvi-interactive --manual -t tree_support_as_number.newick -p profile_support_as_number.db

# or support value as string
echo "((C,D)1/98.5,(A,(B,X)3/777)2/5546,E)R;" > tree_support_as_string.newick
anvi-interactive --manual -t tree_support_as_string.newick -p profile_support_as_string.db
```

Original anvi'o (WRONG) behavior after rooting using X:

[Screenshot: tree rooted using X, with the original (wrong) support value placement]

Now:

[Screenshot: tree rooted using X, with the corrected support value placement]

It also works if the support value is a string. We cannot display such values yet (more on that later), but the Data panel shows that there is no problem: the string starting with "2" is now on the lower branch, unlike in the wrong behavior:

[Screenshot: Data panel showing the string support values after rooting]

Improvement needed

The zeros

We can see some 0s where there is no support value. This is a default behavior in anvi'o and has nothing to do with the rooting. To be fair, if one has support values for their tree, they have values for every branch, so it shouldn't be a problem. Still, nothing should be displayed there.

What if node.name is REALLY the node's name

Let's imagine the user has a tree with no support values, but whose non-leaf nodes have names. Then you don't want to move the node names after rooting the way we move the support values!!!

I propose the following:

[Screenshot: proposed choice between treating node labels as support values or as node names]

If you select the first option, the string associated with each node is treated as a support value and is reassigned after rooting. The second option leaves the node names where they are.

What if I have BOTH a node name and a support value?

Then I don't like you and it is personal at this point. But seriously, it is not possible. This info is stored in the same location in the Newick tree structure. You treat it either as a node name OR as a support value. So we are good.

I want to display my support values when they look like Tom's (100/99.6)

Me too. I checked the code a little, and noticed that there is a JS function that checks whether the branch support values have a min and a max value. If not, you get this warning when you try to display the support values:

[Screenshot: warning shown when the support values have no min and max value]

The min and max checks serve two purposes: defining the range of values to display, and drawing a symbol (a circle) whose size is proportional to the support value:

[Screenshot: numeric support values displayed with a value range and proportional circles]

Now, I would like the front end to check whether a branch support value is a number or a string. If it is a number, show it as above. If it contains non-digit characters, display the text only (no range filtering, no symbol):

[Screenshots: proposed text-only display of non-numeric support values]
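The check proposed above would live in the JavaScript front end; as a language-neutral illustration, here is a minimal Python sketch of the same decision. `numeric_support` and `display_mode` are hypothetical names, and mapping an empty label to "draw nothing" (instead of a 0) follows the wish expressed in the zeros section, not current anvi'o behavior.

```python
import math
from typing import Optional


def numeric_support(label: Optional[str]) -> Optional[float]:
    """Return the label as a finite number, or None if it is not one."""
    if label is None:
        return None
    try:
        value = float(label)  # float() rejects '100/99.6'
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def display_mode(label: Optional[str]) -> str:
    """Decide how a branch label is drawn (hypothetical front-end rule)."""
    if label is None or not label.strip():
        return "none"    # no support value: draw nothing instead of a 0
    if numeric_support(label) is not None:
        return "symbol"  # number: range filtering + proportional circle
    return "text"        # anything else, e.g. '100/99.6': plain text only


for label in ["", "88", "99.6", "100/99.6"]:
    print(repr(label), display_mode(label))
```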

Summary TODO

ivagljiva commented 7 months ago

Then I don't like you and it is personal at this point.

😂 ok then

meren commented 7 months ago

FANTASTIC. Thank you both, @FlorianTrigodet and @metehaansever, and thank you again, @FlorianTrigodet, for the extensive reporting.

First,

This info is stored in the same location in the Newick tree structure. You treat it either as a node name OR as a support value. So we are good.

Thank you, STUPID DUMB FILE FORMATS OF BIOINFORMATICS. It is great.

Second, the right way to show multiple support values is the following, IMO.

If you don't find integer/float in there,

This way people with multiple support values can have their cake and eat it too.

metehaansever commented 6 months ago

[image] We can now display two branch support values instead of a single label. This change directly affects how the display value range is calculated, and because of it we can't calculate the percentile symbols. I also noticed that the display value range part of the code doesn't work as it should.

For a better calculation, we decided to take the min and max values across all lists of branch support values. For example:

| A tool result | B tool result |
|---|---|
| 100 | 99 |
| 89 | 97 |
| 88 | 93 |

The min value will be 88 and the max value will be 100.
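A minimal Python sketch of the shared range described above, assuming each tool's support values are available as a list of numbers. `global_range`, `symbol_radius`, and the default radius bounds of 2 and 10 are hypothetical, not anvi'o's code. Using one range for all tools keeps the circles on the same scale, so a 93 from tool B is drawn at the same size as a 93 from tool A.

```python
from typing import Dict, List, Tuple


def global_range(columns: Dict[str, List[float]]) -> Tuple[float, float]:
    """Min and max over every support value from every tool."""
    values = [v for column in columns.values() for v in column]
    if not values:
        raise ValueError("no support values")
    return min(values), max(values)


def symbol_radius(value: float, lo: float, hi: float,
                  r_min: float = 2.0, r_max: float = 10.0) -> float:
    """Circle radius growing linearly with the value inside [lo, hi]."""
    if hi == lo:
        return r_max  # all values identical: draw them all at full size
    frac = (value - lo) / (hi - lo)
    frac = min(max(frac, 0.0), 1.0)  # clamp values outside the range
    return r_min + frac * (r_max - r_min)


supports = {"A": [100, 89, 88], "B": [99, 97, 93]}
lo, hi = global_range(supports)
print(lo, hi)  # 88 100
for tool, column in supports.items():
    print(tool, [symbol_radius(v, lo, hi) for v in column])
```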

metehaansever commented 6 months ago

[image] I have finally managed to display two different support values, with proportionally calculated sizes. However, both are drawn at the same X and Y coordinates. This solution works fine for the Phylogram view, but for the Circlephylogram we need to calculate the midpoint between the left and right branches and update the X and Y coordinates accordingly. For this specific task, I am eager to tap into @ahenoch's knowledge of the Circlephylogram.


Inverted version: [image]


Phylogram view: [image]

ahenoch commented 6 months ago

Just a small idea for that problem: is it possible to get the x and y coordinates from the non-circular version and then apply a circle transformation? That way, the blue and red dots would be placed on a single vector starting from the center of the circle. [two images] Alternatively, you could use vector math on the circle phylogram if you have the red dots' coordinates.
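A minimal Python sketch of both ideas above, assuming leaf slots are spread evenly over a full circle and that a branch's phylogram x coordinate becomes the radius. `to_circle`, `support_dots`, and `shift_along_ray` are hypothetical helpers; anvi'o's actual Circlephylogram layout (in JavaScript) may compute angles differently. Keeping both dots on one ray from the center is what makes them read as belonging to the same branch.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def to_circle(radius: float, position: float, n_positions: int,
              center: Point = (0.0, 0.0)) -> Point:
    """Map phylogram coordinates onto the circle.

    `radius` is the distance from the root (the phylogram's x) and
    `position` the slot along the leaf axis (the phylogram's y); slots are
    assumed to be spread evenly over a full turn.
    """
    theta = 2 * math.pi * position / n_positions
    cx, cy = center
    return cx + radius * math.cos(theta), cy + radius * math.sin(theta)


def support_dots(branch_start: float, branch_end: float, position: float,
                 n_positions: int, gap: float,
                 center: Point = (0.0, 0.0)) -> Tuple[Point, Point]:
    """Two dots on the same ray from the center, around the branch midpoint."""
    mid = (branch_start + branch_end) / 2
    return (to_circle(mid - gap / 2, position, n_positions, center),
            to_circle(mid + gap / 2, position, n_positions, center))


def shift_along_ray(point: Point, center: Point, offset: float) -> Point:
    """Vector-math variant: move an existing dot `offset` units away from the center."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("point is at the center; direction is undefined")
    return point[0] + offset * dx / length, point[1] + offset * dy / length


red, blue = support_dots(2.0, 4.0, position=1, n_positions=4, gap=1.0)
print(red, blue)  # both near x = 0, at y = 2.5 and y = 3.5
print(shift_along_ray((3.0, 4.0), (0.0, 0.0), 5.0))  # (6.0, 8.0)
```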

metehaansever commented 6 months ago

@ahenoch Thanks for the insights! I think I got your point!

metehaansever commented 6 months ago

[two images]