lutteropp / NetRAX

Phylogenetic Network Inference without ILS
GNU General Public License v3.0
17 stars 1 forks source link

[for later, just curious] Question about Ancestral State Reconstruction #41

Open lutteropp opened 3 years ago

lutteropp commented 3 years ago

How is the ancestral state defined in networks?

If we have the ancestral states for all displayed trees, how do we get the ancestral states for the network? Is it just a weighted average of the per-displayed-tree ancestral states?

stamatak commented 3 years ago

good question, one to maybe keep, but I guess here we would rather be interested in the anc. states of the displayed trees to better determine where we might want put a reticulation, but maybe this is the wrong way of going about it in my head.

On 04.02.21 14:51, Sarah Lutteropp wrote:

How is the ancestral state defined in networks?

If we would have the ancestral states for all displayed trees, how would we get the ancestral states for the network? Is it just a weighted average of the per-displayed-tree ancestral states?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/lutteropp/NetRAX/issues/41, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGXB6W4VKDJTHSAZX2SJITS5KJ43ANCNFSM4XCYTV4A.

-- Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org

lutteropp commented 3 years ago

That sounds even better. I've taken a look at the ancestral state computation for pll_utree_t, adapting this to the network displayed trees looks pretty straight-forward.

If we find a way how to choose an arc insertion move based on per-displayed-tree ancestral states, I expect the coding+testing work to be manageable (about 2 weeks?).

lutteropp commented 3 years ago

I don't see yet how the ancestral states for the displayed trees of a r-reticulation network help in finding the arc insertion move to take in order to get to (r+1) reticulations. But anyway, it is a question to explore for later.

lutteropp commented 3 years ago

Found the old notes: https://github.com/lutteropp/NetRAX/issues/30#issuecomment-741810684

The idea is to find two nodes whose ancestral states have the smallest distance, but are located in different places in the network. Makes sense...

stamatak commented 3 years ago

yes, good that you digged out the old notes, regarding single leaf descendants: you would go past the intermediate node to the next real parent node, but you have to be careful with the branch length handling there

On 04.02.21 16:15, Sarah Lutteropp wrote:

Found the old notes: https://github.com/lutteropp/NetRAX/issues#issuecomment-741810684

The idea is to find two nodes whose ancestral states have the smallest distance, but are located in different places in the network. Makes sense...

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lutteropp/NetRAX/issues/41#issuecomment-773334451, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGXB6UBTFN3LFEGTA2S5R3S5KTZZANCNFSM4XCYTV4A.

-- Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org

lutteropp commented 3 years ago

Sounds like we can simply reuse the fake-node-approach here. (adding the fake node as the second child with brlen zero, the fake node having all-equal ancestral state probabilities)

The fake-node-approach works well for likelihood computation already, and saved lots of ugly special case handling there.

lutteropp commented 3 years ago

This should work:

What we still need to figure out: Given two ancestral states s1 and s2 at nodes u and v, we need to define dist((u, s1), (v, s2)) as some function of

We want dist((u,s1),(v,s2)) to be large if dist(s1, s2) is small and dist(u,v) is large.

lutteropp commented 3 years ago

A possible definition for dist((u,s1),(v,s2)) would be:

(1.0 - dist(s1,s2)) * dist(u,v).

Here, I assume that dist(s1, s2) is in range [0,1]. If it is not, we can normalize it by using the minimum and maximum value encountered in the network.

lutteropp commented 3 years ago

It is still unclear to me whether dist(u,v) should be the number of edges on a edge-minimal path between u and v in the network, or whether the sum of edge weights fits better.

I tend towards using the number of edges though, because we will do branch length optimization after inserting an arc anyway...

stamatak commented 3 years ago

On 16.02.21 02:33, Sarah Lutteropp wrote:

It is still unclear to me whether dist(u,v) should be the /number of edges/ on a edge-minimal path between u and v in the network, or whether the /sum of edge weights/ fits better.

I tend towards using the number of edges though, because we will do branch length optimization after inserting an arc anyway...

I think you will need to experiment to see what works better, regarding the direct distance between the ancestral sequences you could also just optimize it via ML

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lutteropp/NetRAX/issues/41#issuecomment-779510538, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGXB6RECP2YCJDJUBKX6W3S7G4OFANCNFSM4XCYTV4A.

-- Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org