lutteropp / NetRAX

Phylogenetic Network Inference without ILS
GNU General Public License v3.0
17 stars 1 forks source link

BUG: NetRAX treats partitioned MSA as non-partitioned (wrong command line call?) #23

Closed lutteropp closed 3 years ago

lutteropp commented 3 years ago

I forgot to pass the partitions file to NetRAX in the experiments pipeline, thus it parsed the MSA as if it only had a single partition.

lutteropp commented 3 years ago

Even worse, turns out NetRAX doesn't even have a command line parameter for passing the partitions assignment yet. Adding it now.

celinescornavacca commented 3 years ago

It is a good idea to do so.

lutteropp commented 3 years ago

Needs a bit longer, turns out the entire logic for parsing a partitioned MSA in NetRAX was missing. But I am positive that I'll get this done and debugged within the next hours.

lutteropp commented 3 years ago

(currently getting a segfault in pllmod_treeinfo_scale_branches_all)

lutteropp commented 3 years ago

The problem goes even deeper, turns out NetRAX was not using and thus not allocating this subnodes array... I need to implement another custom networks solution for the branch length scaling (which is not that hard, fortunately - just posting here for myself to remember what to do)

stamatak commented 3 years ago

best to ask Alexey ;-) and yes, we do need this functionality

On 04.12.20 12:16, Sarah Lutteropp wrote:

(currently getting a segfault in pllmod_treeinfo_scale_branches_all)

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/lutteropp/NetRAX/issues/23#issuecomment-738699005, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGXB6VEPZYZTYDMBHQ5NTTSTCZHBANCNFSM4UMBCJ2A.

-- Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org

lutteropp commented 3 years ago

I already figured it out, I just need to inject my own scale_branches_all function call, where I'll be directly scaling the network branches. That's the easiest solution, similar to how I did it with replacing other, tree-specific things

stamatak commented 3 years ago

Okay, but make sure to chat with Alexey, branch lengths scaling can be tricky ...

On 04.12.20 12:48, Sarah Lutteropp wrote:

I already figured it out, I just need to inject my own scale_branches_all function call, where I'll be directly scaling the network branches. That's the easiest solution, similar to how I did it with replacing other, tree-specific things

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lutteropp/NetRAX/issues/23#issuecomment-738714041, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGXB6UQOIBQDIVFTRNGKA3STC5ALANCNFSM4UMBCJ2A.

-- Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org

lutteropp commented 3 years ago

I won't need to change the branch length scaling algorithm. I just have to change the parts of the code that use the subnodes array to directly apply the branch length scalers to the network branches instead :)

lutteropp commented 3 years ago

(because I cannot directly use that subnodes array... the network has another data structure than the pll_utree)

stamatak commented 3 years ago

even better

On 04.12.20 13:00, Sarah Lutteropp wrote:

I won't need to change the branch length scaling algorithm. I just have to change the parts of the code that use the subnodes array to directly apply the branch length scalers to the network branches instead :)

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lutteropp/NetRAX/issues/23#issuecomment-738719034, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGXB6VJE76OU4F6XAYCNATSTC6LXANCNFSM4UMBCJ2A.

-- Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org

lutteropp commented 3 years ago

brrr, that rabbit hole goes deeper than I first saw - will likely take me the entire day, this subnodes array is also an integral part in the model optimization (I wonder why this never segfaulted before while there was only 1 partition)

lutteropp commented 3 years ago

Maybe it's faster to build a fake_subnodes_array and then re-synchronize it with the network data structure from outside EDIT: Nope, that would take even longer and likely introduce more bugs.

lutteropp commented 3 years ago

I digged a bit deeper, okay it's not too bad... I found out that the subnodes array is used in:

Functions I don't care about:

I only have to care about making changes in here:

I don't get what was the design decision in pll-modules to have so much redundancy in pllmod_treeinfo_t... there is already that branch_lengths array, why also these subnodes? Maybe it has to do with some of the functions irrelevant to NetRAX...

stamatak commented 3 years ago

I guess that is all required for traversing and handling unrooted trees.

Alexis

On 04.12.20 13:35, Sarah Lutteropp wrote:

I digged a bit deeper, okay it's not too bad... I found that subnodes array is used in:

  • treeinfo_init_tree: not relevant for us, as I already replaced it by another function
  • pllmod_treeinfo_create: not relevant for us, as I already replaced it by another function
  • pllmod_treeinfo_set_topology: not relevant, NetRAX does not use this function, that functon is just for some SPR round on a tree
  • pllmod_treeinfo_get_topology: not relevant, NetRAX does not use this function, that functon is just for some SPR round on a tree
  • pllmod_treeinfo_compute_ancestral: not relevant, NetRAX does not provide ancrestral state computation yet

And I only have to care about making changes in here:

  • pllmod_treeinfo_scale_branches_all: This would work if I set treeinfo->subnode_count to zero
  • pllmod_algo_opt_rates_weights_treeinfo: This would work if I set treeinfo->subnode_count to zero. But then I need to make sure that old brlens are still restored another way!
  • pllmod_algo_opt_brlen_scalers_treeinfo: This would work if I set treeinfo->subnode_count to zero. But then I need to make sure that old brlens are still restored another way!
  • fix_brlen_minmax: This function needs to be totally replaced by one that does the same for the branches directly

I don't get what was the design decision back then to have so much redundancy in pllmod_treeinfo_t... there is already that branch_lengths array, why also these subnodes?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lutteropp/NetRAX/issues/23#issuecomment-738735781, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGXB6TBHUVZU7X2TFLSRXLSTDCQFANCNFSM4UMBCJ2A.

-- Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org

lutteropp commented 3 years ago

Phew... that took long to figure out. But I fixed it. It caused my pll-modules fork to diverge even more from the vanilla pll-modules though. The way I fixed it was:

I consulted with Alexey, this should work for the networks.

Supporting multiple partitions in NetRAX led to another bug though, which I am dealing with now: https://github.com/lutteropp/NetRAX/issues/25