lutteropp / NetRAX

Phylogenetic Network Inference without ILS
GNU General Public License v3.0

Can the current horizontal moves fix a badly placed reticulation? #34

Open lutteropp opened 3 years ago

lutteropp commented 3 years ago

With the current horizontal rNNI and rSPR moves, can we place a reticulation somewhere else in a single step?

I think not... It looks to me like we can only change one of the parents of a reticulation, but not both at the same time. This is likely the cause of the current search in waves sometimes returning bad results, depending on where the new reticulation was initially placed.

We either need an implementation that directly places a new reticulation to a reasonable location (using the ancestral states idea from https://github.com/lutteropp/NetRAX/issues/30#issuecomment-741810684), or a new horizontal move capable of moving a reticulation entirely (this can be done by introducing a move that does an ArcRemoval followed by an ArcInsertion).
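The composite move could be sketched as follows. This is a purely hypothetical illustration, not the NetRAX API: the arc-list representation and the function names (`remove_arc`, `insert_arc`, `move_reticulation`) are made up for the example, assuming arcs are stored as `(tail, head)` pairs:

```python
def remove_arc(arcs, arc):
    """ArcRemoval: drop one incoming arc of a reticulation node."""
    return [a for a in arcs if a != arc]

def insert_arc(arcs, source, target):
    """ArcInsertion: add a new arc from `source` to `target`."""
    return arcs + [(source, target)]

def move_reticulation(arcs, old_arc, new_source, new_target):
    """Composite horizontal move: an ArcRemoval followed by an
    ArcInsertion, relocating a reticulation arc in a single step."""
    return insert_arc(remove_arc(arcs, old_arc), new_source, new_target)

# Toy network with a reticulation at node "h" (two incoming arcs):
arcs = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"), ("h", "x")]
moved = move_reticulation(arcs, ("b", "h"), "r", "h")
```

The point of bundling the two operations into one move is that the search can relocate a badly placed reticulation in a single step, instead of needing an intermediate network that happens to score well enough to survive.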

celinescornavacca commented 3 years ago

Sure, you can have an ArcRemoval followed by an ArcInsertion to move the reticulation far away.

I am just afraid that the space of solutions of both moves combined will be too big to search. I think that option 2 (choosing a better place to fix the reticulation in the first place) would be a better idea... Is it currently placed randomly?

lutteropp commented 3 years ago

Yes, it is now randomly placed. There are not many arc removal moves to consider. After all, we can remove at most O(number of reticulations) arcs, but the number of possible arc insertion moves is indeed super large (O(n^3)) and would be problematic. We could use the more restrictive subset of DeltaPlus moves instead.

I also prefer directly finding the best place where to put a reticulation to. But at least with the method discussed in the whiteboard meeting, this means implementing ancestral state reconstruction for networks first.
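The asymmetry between the two neighbourhood sizes can be made concrete with a small sketch (again a toy arc-list representation, not NetRAX code). The exact size of the insertion neighbourhood depends on the precise move definition, which is debated in this thread, so the insertion count below is only a lower bound:

```python
from itertools import permutations

def removal_candidates(arcs, reticulations):
    """Arc removal candidates: incoming arcs of reticulation nodes,
    so at most O(#reticulations) of them."""
    return [(tail, head) for (tail, head) in arcs if head in reticulations]

def insertion_candidate_count(arcs):
    """Lower bound on arc insertion candidates: ordered pairs of
    distinct arcs to connect by a new arc -- already quadratic in the
    number of arcs, before any further degrees of freedom."""
    return len(list(permutations(arcs, 2)))

# Toy network with a single reticulation at node "h":
arcs = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"), ("h", "x")]
```

Here `removal_candidates(arcs, {"h"})` yields only the 2 incoming arcs of "h", while `insertion_candidate_count(arcs)` already gives 5 × 4 = 20 ordered arc pairs, and the gap widens quickly with network size.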

celinescornavacca commented 3 years ago

> Yes, it is now randomly placed. There are not many arc removal moves to consider. After all, we can remove at most O(number of reticulations) arcs, but the number of possible arc insertion moves is indeed super large (O(n^3)) and would be problematic. We could use the more restrictive subset of DeltaPlus moves instead.

I am a bit lost here. Why should it be faster to add a randomly placed reticulation, delete it, try another randomly placed reticulation, etc., instead of using arc insertion (why O(n^3)? Shouldn't it be O(n^2)?) and head/tail moves (neighbourhood of size O(n)) to move the reticulation, plus arc removal to remove it if it is really not needed?

celinescornavacca commented 3 years ago

Also, I would like to add some other possibilities to the ancestral state reconstruction idea, to avoid having random reticulations (this is for later, but I do not want to forget, so I write it here):

1) We could get a best ML tree per partition.
2) We could take two of these ML trees and use algorithms such as hybridisation network construction on the two ML trees to get a network to use as a starting network.
3) We can do this again for another pair of ML trees, and so on, to obtain different starting networks.

This approach would give us starting reticulations that explain the differences between two ML trees. We would still need to think of a way to combine reticulations coming from different pairs of ML trees, or we could choose the pair of trees with the highest distance, since our pair set should contain the two trees whose switching choices are opposite (i.e., one tree takes all the left edges and the other all the right ones).

These methods are already implemented in Dendroscope, ultra-net, and other tools; I guess it would be easier to call them from our tool instead of coding the method again.
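The pipeline in points 1–3 above could look roughly like this. Everything here is a hypothetical sketch: `infer_ml_tree` and `build_hybridization_network` are placeholders standing in for external tools (e.g. an ML tree inference program and a Dendroscope-style hybridisation-network algorithm), passed in as callables:

```python
from itertools import combinations

def starting_networks(partitions, infer_ml_tree, build_hybridization_network):
    """1) Infer one best ML tree per partition.
    2) For each pair of ML trees, build a hybridisation network.
    3) Each resulting network is a candidate starting network."""
    ml_trees = [infer_ml_tree(p) for p in partitions]
    return [build_hybridization_network(t1, t2)
            for t1, t2 in combinations(ml_trees, 2)]
```

With n partitions this yields n·(n−1)/2 candidate starting networks, one per pair of ML trees, whose reticulations by construction explain the topological differences between the two trees of the pair.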

lutteropp commented 3 years ago

@celinescornavacca This sounds like a great approach! :) But why stop at a single pair of ML trees then? Can't we build the entire network this way, and then have some kind of "maximum parsimony" starting network?

With Dendroscope, there is a little problem: calling Dendroscope requires xvfb-run, and I am struggling to get this to work on the cluster. I would prefer to use some open-source implementation instead...

stamatak commented 3 years ago

On 30.01.21 16:02, Sarah Lutteropp wrote:

> @celinescornavacca This sounds like a great approach! :)

Yes, indeed.

> But why stop at a single pair of ml trees then? Can't we do the entire network this way? And then have this like some kind of "maximum parsimony" starting network?

Well yes, in the end the question is which heuristic works better; in standard phylogenetics a combination of both approaches might work best.

> With using Dendroscope, there is a little problem: Calling Dendroscope requires xvfb-run, and I am struggling to get this to work on the cluster. I would prefer to use some open source implementation instead...


-- Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org

lutteropp commented 3 years ago

@celinescornavacca While I do like the approach you proposed, I do not see how it helps with the question "which arc insertion should we take when moving from k reticulations to k+1?" in the wave-search network search algorithm (which is https://github.com/lutteropp/NetRAX/issues/30#issuecomment-747951635).

Combining partition-wise best ML trees sounds great for coming up with a set of start networks.

But since the "best k-reticulation network" found during the wave-search algorithm might be arbitrary, I do not see how we can use per-partition ML trees to decide where to place the next reticulation. Trying to combine the current "best k-reticulation network" with such a per-partition ML tree could require more than a single extra reticulation, as their topologies might diverge more.

The ancestral states approach, on the other hand (https://github.com/lutteropp/NetRAX/issues/30#issuecomment-741810684), leads to a straightforward way of deciding where to place the next reticulation.

Most likely, NetRAX should support both approaches: the ML-tree-per-partition one to come up with reasonable start networks, plus the ancestral states one to reduce the search space when moving from k to k+1 reticulations.

... Or we have to change the network search algorithm.

celinescornavacca commented 3 years ago

Hi,

It was indeed an idea to start the search from networks with reticulations that make sense, so that moving a reticulation would work more easily. I was not considering the "from k to k+1" step. Maybe we should move this discussion to another issue.

About the starting networks, I have a lot of "parsimony" ideas for the next article (combining trees, combining clusters, maximum parsimony, ...), so I agree with you, Sarah :)