New experimental results

lutteropp commented 3 years ago

The completed results CSV files from the cluster experiments are finally there.

small_network: 4-10 taxa, 1-2 reticulations, msa-size either 100 * n_trees or 200 * n_trees, perfect sampling, linked branches, both likelihood variants, both start from raxml-best-tree and start from 5 random + 5 parsimony trees
medium_network_norandom: 11-30 taxa, 1-3 reticulations, msa-size either 100 * n_trees or 200 * n_trees, perfect sampling, linked branches, both likelihood variants, start from raxml-best-tree only

I uploaded the result CSVs in this folder: https://drive.google.com/drive/folders/1u2YDjea8sHcRuTanFdDyBGghlfJXmldH?usp=sharing The actual datasets, files, and logs are currently stored on the haswell cluster.

I am now working on an IPython notebook to do fancy statistics and plots for the results. It will take me a while because data analysis with Python/pandas is entirely new to me.

lutteropp commented 3 years ago

Still working on improving the evaluation, but that's what I have so far:

NetRAX Experiment Evaluation_medium_network_norandom.pdf
NetRAX Experiment Evaluation_small_network.pdf

Initial observations:

starting from multiple trees is better than starting from a single tree, even with just 4-10 taxa
often, BIC refuses all reticulations and favors a tree (-> we either need less extreme reticulation probabilities in the simulated networks (maybe even 0.5 everywhere?), or huuuge MSA sizes)
I did not encounter any weird networks in these datasets (weird network meaning that the displayed trees differ in only one branch length, but otherwise have identical topology)

Based on these observations, I am adding a network simulation mode that assigns probability 0.5 to all reticulations it introduces. This will give us way better topological distance scores, since these scores ignore reticulation probabilities.

lutteropp commented 3 years ago

Oh, yet another observation: LikelihoodModel.AVERAGE and LikelihoodModel.BEST seem to perform equally well here. This is due to us having one MSA partition per displayed tree though.

lutteropp commented 3 years ago

I'm now resubmitting the small_network and medium_network_norandom experiments with all reticulation probabilities set to 0.5. I am calling them small_network_uniform and medium_network_norandom_uniform. In general, I have made the experiments more flexible by replacing prob = random.random() with prob = random.uniform(settings.min_reticulation_prob, settings.max_reticulation_prob) in Celine's network simulator code.
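
A minimal sketch of the change described in this comment, not the simulator's actual code. The `Settings` class and the helper function are hypothetical stand-ins; only the field names `min_reticulation_prob` and `max_reticulation_prob` and the `random.uniform` call are taken from the comment.

```python
import random
from dataclasses import dataclass


@dataclass
class Settings:
    # Hypothetical container; only these two field names appear in the thread.
    min_reticulation_prob: float = 0.0
    max_reticulation_prob: float = 1.0


def draw_reticulation_prob(settings: Settings, rng: random.Random) -> float:
    """Draw one reticulation probability from the configured interval."""
    return rng.uniform(settings.min_reticulation_prob, settings.max_reticulation_prob)


# Setting both bounds to 0.5 pins every reticulation probability to 0.5,
# which is what the *_uniform experiments ask for.
uniform_settings = Settings(min_reticulation_prob=0.5, max_reticulation_prob=0.5)
rng = random.Random(42)
probs = [draw_reticulation_prob(uniform_settings, rng) for _ in range(3)]
```

With the default bounds of 0.0 and 1.0 the draw covers the same range as the old `random.random()` call, so the earlier experiments stay reproducible as a special case.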

stamatak commented 3 years ago

Hi Sarah,

Thank you for these. The next simulation step sounds good. May I ask what "true network weirdness" means?

Also, how do I interpret those distance measures? What do the values on the x-axis mean: are they absolute or relative?

Thanks,

Alexis

On 08.02.21 14:21, Sarah Lutteropp wrote:

Still working on improving the evaluation, but that's what I have so far:

NetRAX Experiment Evaluation_medium_network_norandom.pdf https://github.com/lutteropp/NetRAX/files/5943612/NetRAX.Experiment.Evaluation_medium_network_norandom.pdf
NetRAX Experiment Evaluation_small_network.pdf https://github.com/lutteropp/NetRAX/files/5943613/NetRAX.Experiment.Evaluation_small_network.pdf

-- Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org

lutteropp commented 3 years ago

"true network weirdness":

Following the Slack discussion, I defined a simulated network as "weird", if its displayed trees all induce the same bipartitions. I then defined the weirdness factor of a network as being (number of displayed tree pairs that have different bipartitions) divided by (total number of displayed tree pairs). This includes simulated networks where displayed trees have exactly the same topology and only differ in a single branch length.
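
The weirdness factor defined above can be sketched as follows. This is an illustration under assumptions, not the evaluation code used for the plots: each displayed tree is represented here simply as the set of bipartitions it induces, and the helper names are hypothetical. Note that under this definition a "weird" network (all displayed trees inducing the same bipartitions) gets a factor of 0.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List

# A bipartition is stored as an unordered pair of taxon sets.
Bipartition = FrozenSet[FrozenSet[str]]
# A displayed tree is reduced to the set of bipartitions it induces.
TreeSplits = FrozenSet[Bipartition]


def split(left: Iterable[str], right: Iterable[str]) -> Bipartition:
    """Build one bipartition from the taxa on either side of a branch."""
    return frozenset({frozenset(left), frozenset(right)})


def weirdness_factor(displayed_trees: List[TreeSplits]) -> float:
    """Fraction of displayed-tree pairs whose bipartition sets differ."""
    pairs = list(combinations(displayed_trees, 2))
    if not pairs:
        # Assumption: with fewer than two displayed trees there is nothing to compare.
        return 0.0
    differing = sum(1 for a, b in pairs if a != b)
    return differing / len(pairs)


# Toy example with four taxa A, B, C, D (made-up trees, one inner branch each).
t1 = frozenset({split("AB", "CD")})
t2 = frozenset({split("AC", "BD")})
same_topology = weirdness_factor([t1, t1])
different_topology = weirdness_factor([t1, t2])
```

Because branch lengths are not part of the bipartition sets, two displayed trees that differ only in a branch length compare as equal here, which matches the last sentence of the definition.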

Topological distance measures:

The distance measures are defined here: https://github.com/lutteropp/NetRAX/issues/13 . They are absolute values and depend on the placement of the network root. We used Dendroscope for computing these distances. I still need to familiarize myself more with them.

lutteropp commented 3 years ago

The completed results CSV files from the additional cluster experiments are there.

small_network_uniform: 4-10 taxa, 1-2 reticulations, msa-size either 100 * n_trees or 200 * n_trees, perfect sampling, linked branches, both likelihood variants, both start from raxml-best-tree and start from 5 random + 5 parsimony trees, all simulated reticulations have prob 0.5
medium_network_norandom_uniform: 11-30 taxa, 1-3 reticulations, msa-size either 100 * n_trees or 200 * n_trees, perfect sampling, linked branches, both likelihood variants, start from raxml-best-tree only, all simulated reticulations have prob 0.5

I uploaded the result CSVs in this folder: https://drive.google.com/drive/folders/1MZZd-9zd0wNqD5-ooF9UAvS34l2pKJbT?usp=sharing The actual datasets, files, and logs are currently stored on the haswell cluster.

Here are the plots for the new experiments: NetRAX Experiment Evaluation_medium_network_norandom_uniform.pdf NetRAX Experiment Evaluation_small_network_uniform.pdf

Initial observations:

Bad news. Even with all reticulation probabilities set to 0.5, BIC still very often (about 80% of the time?) favors a tree. Looks like we need to tweak our sequence simulation settings. Either that, or the network likelihood definition is not that good.
In up to 10% of datasets, NetRAX infers a network with a worse BIC score than the true network has. This clearly indicates that the network search algorithm needs to be improved (for example by using more start networks, better choosing the arc insertions, doing randomized-order greedy multiple times instead of deterministic greedy once, sometimes accepting worse networks during the search, ...).
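
To make the "BIC favors a tree" effect concrete, here is a small sketch using the textbook BIC definition, k * ln(n) - 2 * ln(L), where lower is better. How NetRAX counts free parameters and what it uses as the sample size is not stated in this thread, so `n_free_params` and `sample_size` are placeholders, and all numbers below are made up purely for illustration.

```python
import math


def bic(loglh: float, n_free_params: int, sample_size: int) -> float:
    """Textbook Bayesian information criterion; lower values are better."""
    return n_free_params * math.log(sample_size) - 2.0 * loglh


# Made-up numbers: the network fits slightly better (higher log-likelihood)
# but pays for its extra parameters.
tree_bic = bic(loglh=-1000.0, n_free_params=10, sample_size=400)
network_bic = bic(loglh=-998.0, n_free_params=13, sample_size=400)

# The model with the smaller BIC wins the comparison.
preferred = "tree" if tree_bic < network_bic else "network"
```

In this toy case the likelihood gain of the network is too small to offset the penalty for its additional parameters, so the tree wins. A stronger signal in the alignment would be needed for the reticulation to pay off.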

lutteropp commented 3 years ago

The experimental results are discouraging... I believe the main problem we have here is the sequence simulation. Is there a way to tell seq-gen to not generate any useless garbage columns (e.g., columns that consist of the same nucleotide for nearly every taxon)? So far, we generate sequences using these seq-gen parameters: -mHKY -t3.0 -f0.3,0.2,0.2,0.3
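
Whether seq-gen can suppress such columns is left open here, but the share of "garbage" columns in a simulated alignment can at least be measured. The sketch below is a hypothetical helper, not part of the experiment scripts: it treats the MSA as a list of equal-length strings and flags a column when one character covers at least `threshold` of the taxa. The default threshold of 0.9 is an arbitrary assumption.

```python
from collections import Counter
from typing import List


def near_constant_fraction(msa: List[str], threshold: float = 0.9) -> float:
    """Fraction of alignment columns dominated by a single character."""
    if not msa:
        return 0.0
    n_taxa = len(msa)
    n_sites = len(msa[0])
    if any(len(row) != n_sites for row in msa):
        raise ValueError("all sequences must have the same length")
    if n_sites == 0:
        return 0.0
    flagged = 0
    for column in zip(*msa):  # iterate over columns instead of rows
        most_common_count = Counter(column).most_common(1)[0][1]
        if most_common_count / n_taxa >= threshold:
            flagged += 1
    return flagged / n_sites


# Toy alignment with four taxa and four sites (made-up data).
toy_msa = ["ACGT", "ACGA", "ACTA", "AGTA"]
strict = near_constant_fraction(toy_msa)        # only fully constant columns pass 0.9
loose = near_constant_fraction(toy_msa, 0.75)   # also counts 3-out-of-4 columns
```

Comparing this fraction across different seq-gen settings would show whether a parameter change actually adds signal.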

lutteropp commented 3 years ago

A pretty counter-intuitive observation: Not sure if it's statistically significant, but it looks like having an MSA size of (number of displayed trees) * 100 leads to slightly worse results than having an MSA size of (number of displayed trees) * 50... Could be just by chance though.

stamatak commented 3 years ago

On 09.02.21 13:01, Sarah Lutteropp wrote:

  "true network weirdness":

thanks, I guessed that it was something like this.

  Topological distance measures:

thanks

stamatak commented 3 years ago

Initial observations:

Bad news. Even with all reticulation probabilities set to 0.5, BIC still often favors a tree. Looks like we need to tweak our sequence simulation settings.

what happens if we make the MSAs longer?

In up to 10% of datasets, NetRAX infers a network with a worse BIC score than the true network has.

How much worse is it? Is the difference slight or large?

lutteropp commented 3 years ago

How much worse is it? Is the difference slight or large?

The relative difference is tiny. I have added relative BIC diff and relative loglh diff plots to all experiments we had so far: NetRAX Experiment Evaluation_small_network.pdf NetRAX Experiment Evaluation_medium_network_norandom.pdf NetRAX Experiment Evaluation_small_network_uniform.pdf NetRAX Experiment Evaluation_medium_network_norandom_uniform.pdf

what happens if we make the MSAs longer?

With an MSA size of (number of displayed trees) * 100, we have slightly worse results than with an MSA size of (number of displayed trees) * 50... Could be just by chance though, as these stats were not taken on a per-dataset basis.

I am now trying the small_network_uniform experiment with an MSA size of (number of displayed trees) * 50 vs. an MSA size of (number of displayed trees) * 400. The main problem here is finding the right parameters for seq-gen, though. So far, we are using -mHKY -t3.0 -f0.3,0.2,0.2,0.3. I don't know how to choose good parameters for seq-gen.

lutteropp commented 3 years ago

I also really need to add evaluation of higher-reticulation networks encountered by NetRAX during the search, but discarded due to losing the BIC comparison. I just looked at some of these by hand, and they look pretty good (-> number of reticulations and reticulation probs seem reasonable) when comparing them to the "true" simulated network.

stamatak commented 3 years ago

that's good news :-)

stamatak commented 3 years ago

On 10.02.21 12:17, Sarah Lutteropp wrote:

How much worse is it? Is the difference slight or large?
The relative difference is tiny.

Well, in that case one could argue that there is not enough data to distinguish between the two; the differences might just be round-off errors.

I have added relative BIC diff and relative loglh diff plots to all experiments we had so far:
NetRAX Experiment Evaluation_small_network.pdf https://github.com/lutteropp/NetRAX/files/5957362/NetRAX.Experiment.Evaluation_small_network.pdf
NetRAX Experiment Evaluation_medium_network_norandom.pdf https://github.com/lutteropp/NetRAX/files/5957364/NetRAX.Experiment.Evaluation_medium_network_norandom.pdf
NetRAX Experiment Evaluation_small_network_uniform.pdf https://github.com/lutteropp/NetRAX/files/5957365/NetRAX.Experiment.Evaluation_small_network_uniform.pdf
NetRAX Experiment Evaluation_medium_network_norandom_uniform.pdf https://github.com/lutteropp/NetRAX/files/5957367/NetRAX.Experiment.Evaluation_medium_network_norandom_uniform.pdf

what happens if we make the MSAs longer?
With MSA size of (number of displayed trees) * 200, we have slightly worse results than with MSA size of (number of displayed trees) * 100... Could be just by chance though.

I am trying with MSA size of (number of displayed trees) * 100 vs MSA size of (number of displayed trees) * 400 now.

thanks ...

The main problem here is finding the right parameters for seq-gen though. So far, we are using -mHKY -t3.0 -f0.3,0.2,0.2,0.3. I don't know how to choose good parameters for seq-gen.

what exactly do you want to achieve? also probably a good idea to ask your lab mates on slack :-)

alexis

lutteropp commented 3 years ago

what exactly do you want to achieve?

More signal in the MSA, without increasing NetRAX runtime by a lot...

also probably a good idea to ask your lab mates on slack :-)

I did :D. We decided to increase the branch length scaling factor (-s parameter in seq-gen) from 1.0 to 4.0
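
For reference, a sketch of how the seq-gen argument list could be assembled with the new scaling factor. The model, transition/transversion and frequency flags and the `-s` parameter are the ones quoted in this thread; the `-l` flag for sequence length is an assumption taken from the Seq-Gen manual rather than from the thread and should be checked against it. The function name is hypothetical, and the input tree file and output handling are deliberately left out.

```python
from typing import List


def seqgen_args(n_sites: int, scale: float = 4.0) -> List[str]:
    """Assemble seq-gen arguments; the command is only built here, not run."""
    return [
        "seq-gen",
        "-mHKY",               # substitution model used so far
        "-t3.0",               # transition/transversion ratio used so far
        "-f0.3,0.2,0.2,0.3",   # base frequencies used so far
        f"-s{scale}",          # branch length scaling factor (was 1.0, now 4.0)
        f"-l{n_sites}",        # assumed flag for the number of sites per dataset
    ]


args = seqgen_args(n_sites=200)
command_line = " ".join(args)
```

Keeping the scaling factor and the sequence length as explicit function parameters makes it easy to sweep both in later experiments without touching the fixed model settings.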

stamatak commented 3 years ago

yes, I saw that after posting here. It is also good to increase the MSA length; I believe you could explicitly set it in seq-gen

lutteropp commented 3 years ago

I need to redo some evaluation parts - the computation of true network weirdness currently always reports 0.0 (see https://github.com/lutteropp/NetRAX/issues/44)

lutteropp commented 3 years ago

A pretty counter-intuitive observation: Not sure if it's statistically significant, but it looks like having a MSA size of (number of displayed trees) * 100 leads to slightly worse results than having a MSA size of (number of displayed trees) * 50... Could be just by chance though.

This observation is not true. I mixed up total MSA size and number of sites per displayed tree in my evaluations. What I observed there was that NetRAX with 50 sites per displayed tree performed worse if the true network had 2 reticulations than if the true network had 1 reticulation. Nothing surprising here.
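
The mix-up is easy to reproduce on paper. The sketch below assumes that a network with r reticulations has 2^r displayed trees, which is an upper bound (the number of distinct displayed trees can be smaller), and it is not known from this thread how the experiment scripts count them. The function name is hypothetical.

```python
def total_msa_size(sites_per_displayed_tree: int, n_reticulations: int) -> int:
    """Total number of MSA sites when every displayed tree gets its own block."""
    # Assumption: each reticulation doubles the number of displayed trees.
    n_displayed_trees = 2 ** n_reticulations
    return sites_per_displayed_tree * n_displayed_trees


# Two different settings end up with the same total MSA size:
size_a = total_msa_size(sites_per_displayed_tree=50, n_reticulations=2)
size_b = total_msa_size(sites_per_displayed_tree=100, n_reticulations=1)
```

Grouping results by total MSA size therefore mixes datasets with different reticulation counts, so the evaluation should group by sites per displayed tree and by number of reticulations separately.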

... Looks like I need to refactor the IPython notebook for experiment evaluation a lot, and then re-evaluate all experiments we had so far.

stamatak commented 3 years ago

ah okay, that makes more sense now

lutteropp / NetRAX

New experimental results #42