lutteropp / NetRAX

Phylogenetic Network Inference without ILS
GNU General Public License v3.0

Observation: The number of reticulations depends on the chosen brlen linkage mode (and maybe also on the likelihood model) #33

Open lutteropp opened 3 years ago

lutteropp commented 3 years ago

This example network shows that the displayed trees of a network can have identical topology: [image]

Here, only a single branch length can differ between the two displayed trees.

In unlinked brlens mode, each partition has its own branch lengths, so a reticulation here makes no sense in that mode. The reticulation does make sense in linked branch lengths mode (but there mostly for LikelihoodModel.BEST, as the advantage of the variable branch length is partly offset by negative effects in LikelihoodModel.AVERAGE). For scaled branch lengths mode, it is still a bit unclear to me what should ideally happen in the inference.

lutteropp commented 3 years ago

In LikelihoodModel.AVERAGE, the negative influence comes from the reticulation probabilities always being linked among all partitions. We would not have a negative influence here if each partition had its own reticulation probabilities.
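To make the effect of linked reticulation probabilities concrete, here is a minimal Python sketch. It assumes that AVERAGE scores a partition as the mixture of its displayed-tree likelihoods weighted by the reticulation probability; the likelihood values, the grid search, and all names are invented for illustration and are not taken from NetRAX.

```python
import math

# Hypothetical per-partition likelihoods of the two displayed trees (T1, T2).
# These numbers are made up for illustration only.
TREE_LHS = [
    (1e-3, 1e-6),  # partition 1 fits T1 much better
    (1e-6, 1e-3),  # partition 2 fits T2 much better
]

def average_loglh(tree_lhs, probs):
    """Sum over partitions of log(p * L(T1) + (1 - p) * L(T2)).

    probs[i] is the probability p of taking T1 in partition i.
    """
    return sum(
        math.log(p * lh1 + (1.0 - p) * lh2)
        for (lh1, lh2), p in zip(tree_lhs, probs)
    )

# Coarse grid of candidate probabilities (assumption: a grid search is enough here).
GRID = [i / 100 for i in range(101)]

# Linked: one reticulation probability shared by all partitions.
best_linked = max(average_loglh(TREE_LHS, [p] * len(TREE_LHS)) for p in GRID)

# Per-partition: every partition optimizes its own probability.
best_per_partition = sum(
    max(average_loglh([lhs], [p]) for p in GRID) for lhs in TREE_LHS
)

print(f"linked: {best_linked:.3f}  per-partition: {best_per_partition:.3f}")
```

With a shared probability, the two partitions pull in opposite directions and the optimum is a compromise; with one probability per partition, each partition can put all its weight on the tree it fits best.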

celinescornavacca commented 3 years ago

Today I noticed that I understand the difference between LikelihoodModel.BEST and LikelihoodModel.AVERAGE well, but I am not sure I understand UNLINKED, LINKED and SCALED.

When we have a tree and we have LINKED, any gene of a partition has the same tree and the same branch lengths. When we have a tree and we have SCALED, any gene of a partition has the same tree and the same branch lengths up to a multiplication factor (the mutation rates vary among genes). When we have a tree and we have UNLINKED, any gene of a partition has the same tree but the branch lengths can vary (mutation rates vary among genes and branches).
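The three flavors described above can be sketched in Python as follows. This is only an illustration of the definitions in this comment: `BrlenLinkage` and `brlens_per_partition` are hypothetical names, not part of the NetRAX code base.

```python
from enum import Enum

class BrlenLinkage(Enum):
    LINKED = "linked"
    SCALED = "scaled"
    UNLINKED = "unlinked"

def brlens_per_partition(mode, n_partitions, shared=None, scalers=None, own=None):
    """Return one list of branch lengths per partition (hypothetical helper).

    shared  -- branch lengths common to all partitions (LINKED, SCALED)
    scalers -- one multiplication factor per partition (SCALED)
    own     -- one independent branch-length list per partition (UNLINKED)
    """
    if mode is BrlenLinkage.LINKED:
        # Same tree, same branch lengths everywhere.
        return [list(shared) for _ in range(n_partitions)]
    if mode is BrlenLinkage.SCALED:
        # Same branch lengths up to a per-partition factor.
        return [[s * b for b in shared] for s in scalers]
    if mode is BrlenLinkage.UNLINKED:
        # Every partition has its own branch lengths.
        return [list(b) for b in own]
    raise ValueError(f"unknown linkage mode: {mode}")

linked = brlens_per_partition(BrlenLinkage.LINKED, 2, shared=[0.1, 0.2])
scaled = brlens_per_partition(
    BrlenLinkage.SCALED, 2, shared=[0.1, 0.2], scalers=[1.0, 2.0]
)
unlinked = brlens_per_partition(
    BrlenLinkage.UNLINKED, 2, own=[[0.1, 0.2], [0.3, 0.05]]
)
```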

@stamatak, am I right? Sorry to ask stupid things, but better safe than sorry.

In my simulation we are in the LINKED case for now (we could easily have SCALED for the next article), so I do not see why we should have UNLINKED in the reconstruction.

Also, it would be good to add a step to the simulations, if you both agree: since branch lengths and inheritance probabilities are bothering us, how about running NetRAX on the true topology to see how good we are at estimating branch lengths and inheritance probabilities under the 6 combinations of BEST and AVERAGE x UNLINKED, LINKED and SCALED?
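For reference, the six combinations can be enumerated with a few lines of Python. The strings are just the labels used in this thread; how each combination is passed to NetRAX is deliberately left out, since no command-line options are given here.

```python
from itertools import product

likelihood_models = ("BEST", "AVERAGE")
brlen_linkages = ("UNLINKED", "LINKED", "SCALED")

# Cartesian product: every likelihood model with every linkage mode.
configurations = list(product(likelihood_models, brlen_linkages))

for model, linkage in configurations:
    print(f"{model} x {linkage}")
```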

stamatak commented 3 years ago

That's all correct regarding the branch length flavors.

My rationale is that the br-lens have a substantial impact on the likelihood score and that different displayed trees as defined by a reticulation will also have pretty different optimal branch lengths, hence my preference for unlinked.

Regarding the additional experiments, that's a very good idea, I totally agree.

Alexis

-- Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org

celinescornavacca commented 3 years ago

Here UNLINKED is applied to networks, not displayed trees, right? If so, my rationale is that having the same branch lengths for the same network topology for all partitions will give us more power to infer reticulations.

stamatak commented 3 years ago

I am not so sure, I believe that as we are doing computations via the displayed trees, and different tree topologies need different sets of branch lengths to have an optimal likelihood, branch lengths might need to be estimated separately.

However, this really depends on how the networks are simulated, that is, for which parts of the displayed trees we simulate along branches of the same length and for which we do not.

So I'd say the key task here is to make sure that we simulate and infer under the same model for the branch lengths.

lutteropp commented 3 years ago

Cool! Then we can do all experiments in LINKED mode! :-) This one also performed best in the anecdotal experience I had so far. I believe (from reading the simulator code) the network simulations are using LINKED branch lengths as well.

celinescornavacca commented 3 years ago

The simulator simulates an ultrametric network and extracts displayed trees from the simulated network, one per partition. It then simulates sequences on the extracted displayed trees.
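The extraction step can be illustrated with a small Python sketch: each displayed tree is obtained by keeping exactly one parent for every reticulation node. The network encoding (a dict from node to list of parents) and the function name are assumptions made for this example, not the simulator's actual data structures, and the sketch does not suppress the resulting degree-2 nodes or handle branch lengths.

```python
from itertools import product

def displayed_trees(parents):
    """Yield one child -> parent map per displayed tree.

    parents maps each node to the list of its parents; the root has an
    empty list and a reticulation node has two parents.
    """
    reticulations = [n for n, ps in parents.items() if len(ps) == 2]
    # One choice of parent per reticulation gives one displayed tree.
    for choice in product(*(parents[r] for r in reticulations)):
        chosen = dict(zip(reticulations, choice))
        yield {n: chosen.get(n, ps[0]) for n, ps in parents.items() if ps}

# Toy network with a single reticulation node "h" (parents "u" and "v").
network = {
    "root": [],
    "u": ["root"],
    "v": ["root"],
    "a": ["u"],
    "h": ["u", "v"],
    "b": ["h"],
    "c": ["v"],
}

trees = list(displayed_trees(network))
for tree in trees:
    print(tree)
```

A network with k reticulations yields 2^k such parent maps, some of which may share the same topology once degree-2 nodes are suppressed.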

To be sure to get it right, let's look at this picture: [image: photo_2021-01-19_10-00-19]

Suppose that we simulate 2 partitions, one along T1 and one along T2. The LINKED mode says that the branch lengths (brl) of T1 and those of T2 have to be the same in the first and the second partition (so brl T1 in P1 = brl T1 in P2 AND brl T2 in P1 = brl T2 in P2), right? [Note that it is totally OK for T1 and T2 to have the same topology, but they have to be treated as different trees.]