lutteropp / NetRAX

Phylogenetic Network Inference without ILS
GNU General Public License v3.0
17 stars 1 forks source link

Unrooted softwired distance #45

Open lutteropp opened 3 years ago

lutteropp commented 3 years ago

@celinescornavacca I've read the part in your book about hardwired/softwired clusters in rooted phylogenetic networks. Now I understand what these are:

You mentioned that for softwired network distance, we can modify its definition by treating the displayed trees as unrooted ones. My question is here: How is a cluster defined in an unrooted tree, where the edges have no direction? Which side of the bipartition induced by the edge is the cluster then?

celinescornavacca commented 3 years ago

We need to change things a bit if we want to use unrooted trees. You need to use bipartitions (and not clusters) in the unrooted setting for softwired distances (as I wrote in the slack): For a given network N, T(N) are the displayed trees of N and B(N) are the set of all bipartitions of the trees in T(N). Then, the softwired unrotoed distance between two networks N1 and N2 would be (|B(N1)\B(N2)| + |B(N2)\B(N1)|)/(|B(N2)+B(N1)|)

lutteropp commented 3 years ago

Oh cool, so it really is just the bipartitions then. Nice, that's simple to implement! :) (I've already worked with bipartitions in trees for transfer bootstrap scores, likely can reuse much of the code from there)

Thanks for the lightspeed answer. :-)

lutteropp commented 3 years ago

@celinescornavacca I have implemented this: Screenshot from 2021-02-15 23-34-36

Is this (especially the denominator) how you meant it?

lutteropp commented 3 years ago

Now that I think about it... Isn't our "unrooted softwired distance" the same as the normal rooted softwired distance definition, only that we are scaling the distance to be in the range [0.0, 1.0]?

Because when computing a bipartition induced by an edge, I simply collect the set of taxa which are descendants of the edge's target node. All these taxa are on the left side of the bipartition. The remaining taxa are on the right side.

And when I compare two bipartitions, I say bipartitions A and B are equal, if (A.left == B.left) and (A.right == B.right), or (A.left == B.right) and (A.right == B.left)

... To me it looks like we can easily scale all reported topological network distances to be in range [0.0,1.0]...

lutteropp commented 3 years ago

Do we need to ignore the trivial bipartitions (with only a single taxon on one side)? Or are they counted as well?

lutteropp commented 3 years ago

Now that I think about it... Isn't our "unrooted softwired distance" the same as the normal rooted softwired distance definition, only that we are scaling the distance to be in the range [0.0, 1.0]?

Ok nevermind regarding this, I now understand the difference between rooted and unrooted softwired distance. Let S be the set of all taxa. Let the cluster A be a subset of S.

lutteropp commented 3 years ago

Is this (especially the denominator) how you meant it?

Nevermind, I just noticed my mistake: The denominator has to be the cardinality of the set union, not the sum of the set cardinalities...

stamatak commented 3 years ago

we shouldn't count them as they do not provide any information about the tree/network structure

On 16.02.21 01:13, Sarah Lutteropp wrote:

Do we need to ignore the trivial bipartitions (with only a single taxon on one side)? Or are they counted as well?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/lutteropp/NetRAX/issues/45#issuecomment-779487984, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGXB6VLNXFUMTO72NLNDETS7GTCNANCNFSM4XUZONAQ.

-- Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org

celinescornavacca commented 3 years ago

Hi

I cannot access github from my phone so i cannot see what I wrote on the denominator, (maybe I wrote this too fast, in this case, sorry for the mess) but the denominator is there to ensure that the distance.max = 1 if no bipartition is in common between the trees.

On February 16, 2021 7:10:27 AM GMT+01:00, Alexis Stamatakis notifications@github.com wrote:

we shouldn't count them as they do not provide any information about the > tree/network structure>

On 16.02.21 01:13, Sarah Lutteropp wrote:> Do we need to ignore the trivial bipartitions (with only a single taxon > on one side)? Or are they counted as well?>

—> You are receiving this because you are subscribed to this thread.> Reply to this email directly, view it on GitHub >

https://github.com/lutteropp/NetRAX/issues/45#issuecomment-779487984,

or unsubscribe >

https://github.com/notifications/unsubscribe-auth/AAGXB6VLNXFUMTO72NLNDETS7GTCNANCNFSM4XUZONAQ.>

-- > Alexandros (Alexis) Stamatakis>

Research Group Leader, Heidelberg Institute for Theoretical Studies> Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology>

www.exelixis-lab.org>

-- > You are receiving this because you were mentioned.> Reply to this email directly or view it on GitHub:> https://github.com/lutteropp/NetRAX/issues/45#issuecomment-779607689

-- Sent from my Android device with K-9 Mail. Please excuse my brevity.