lutteropp / NetRAX

Phylogenetic Network Inference without ILS
GNU General Public License v3.0

Experiment Plot planning for Tree Simulations #15

Open lutteropp opened 3 years ago

lutteropp commented 3 years ago

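Now that the columns in the results CSV are fixed (see https://github.com/lutteropp/NetRAX/issues/14), let's list which plots we need. Here are some that I believe make sense to have:

  • Use the dataset IDs as x-values. For each combination of MSA-size, simulator-type, sampling-type, likelihood-type, do the following plots:

Plot 1: BIC score of

  • true simulated network
  • raxml-ng best tree
  • inferred network with NetRAX starting from raxml-ng best tree
  • inferred network with NetRAX starting from 10 random + 10 parsimony trees

Plot 2: Network loglikelihood score of

  • true simulated network
  • raxml-ng best tree
  • inferred network with NetRAX starting from raxml-ng best tree
  • inferred network with NetRAX starting from 10 random + 10 parsimony trees

Plot 3: Normalized RF distance with true simulated tree and

  • raxml-ng best tree
  • inferred network with NetRAX starting from raxml-ng best tree
  • inferred network with NetRAX starting from 10 random + 10 parsimony trees

Plot 4:

  • number of near-zero branches in best raxml-ng tree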
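A minimal sketch of how the data for these plots could be prepared: group the CSV rows by (MSA size, simulator type, sampling type, likelihood type) and use the dataset IDs as x-values within each group. All column names here (`dataset_id`, `msa_size`, `bic_true`, ...) and the example rows are hypothetical; the real column names are the ones fixed in issue #14. Only the Python standard library is used, so the resulting series can be handed to whatever plotting library is preferred.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; the real ones are defined in issue #14.
GROUP_KEYS = ("msa_size", "simulator_type", "sampling_type", "likelihood_type")


def group_by_combination(rows):
    """Map each combination of GROUP_KEYS values to its rows, sorted by dataset ID."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[key] for key in GROUP_KEYS)].append(row)
    for combo_rows in groups.values():
        combo_rows.sort(key=lambda r: int(r["dataset_id"]))
    return dict(groups)


def series_for_plot(combo_rows, columns):
    """Return the x-values (dataset IDs) and one y-series per requested column."""
    xs = [int(r["dataset_id"]) for r in combo_rows]
    ys = {col: [float(r[col]) for r in combo_rows] for col in columns}
    return xs, ys


# Tiny made-up CSV, only to show the data flow (values are not real results).
EXAMPLE_CSV = """dataset_id,msa_size,simulator_type,sampling_type,likelihood_type,bic_true,bic_raxml
1,1000,CELINE,PERFECT_SAMPLING,BEST,2050.0,2061.5
0,1000,CELINE,PERFECT_SAMPLING,BEST,2100.0,2100.0
0,1000,CELINE,PERFECT_SAMPLING,AVERAGE,2110.0,2125.0
"""

groups = group_by_combination(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
xs, ys = series_for_plot(
    groups[("1000", "CELINE", "PERFECT_SAMPLING", "BEST")],
    ["bic_true", "bic_raxml"],
)
```

Each entry of `groups` then corresponds to one figure, and each key of `ys` to one line in that figure.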

stamatak commented 3 years ago

sounds good

On 30.11.20 15:20, Sarah Lutteropp wrote:

Now that the columns in the results CSV are fixed (see #14 https://github.com/lutteropp/NetRAX/issues/14), let's list which plots we need. Here are some that I believe make sense to have:

  • Use the dataset IDs as x-values. For each combination of MSA-size, simulator-type, sampling-type, likelihood-type, do the following plots:

Plot 1: BIC score of

  • true simulated network
  • raxml-ng best tree
  • inferred network with NetRAX starting from raxml-ng best tree
  • inferred network with NetRAX starting from 10 random + 10 parsimony trees

Plot 2: Network loglikelihood score of

  • true simulated network
  • raxml-ng best tree
  • inferred network with NetRAX starting from raxml-ng best tree
  • inferred network with NetRAX starting from 10 random + 10 parsimony trees

Plot 3: Normalized RF distance with true simulated tree and

  • raxml-ng best tree
  • inferred network with NetRAX starting from raxml-ng best tree
  • inferred network with NetRAX starting from 10 random + 10 parsimony trees

Plot 4:

  • number of near-zero branches in best raxml-ng tree
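For Plot 4, counting near-zero branches could look roughly like the sketch below. It assumes the raxml-ng best tree is available as a plain Newick string and that "near-zero" means a branch length at or below some cutoff; the cutoff value `1e-6` and the function names are assumptions for illustration, not taken from this thread. The regular expression only handles plain Newick (no quoted labels containing colons, no extended Newick fields).

```python
import re

# Matches a branch length following a colon, e.g. ":0.05" or ":1e-06".
BRANCH_LENGTH_RE = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def branch_lengths(newick):
    """Extract all branch lengths from a plain Newick string."""
    return [float(match) for match in BRANCH_LENGTH_RE.findall(newick)]


def count_near_zero_branches(newick, threshold=1e-6):
    """Count branches whose length is at or below the (assumed) threshold."""
    return sum(1 for length in branch_lengths(newick) if length <= threshold)


# Made-up example tree with two near-zero branches.
example_tree = "((A:0.1,B:0.000001):0.000001,(C:0.2,D:0.3):0.05,E:0.4);"
near_zero = count_near_zero_branches(example_tree)
```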

-- Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org

lutteropp commented 3 years ago

Current state of my plotting attempts: it turns out the BIC scores lie too close together. Instead of plotting the raw BIC scores, it likely makes more sense to plot the absolute difference to the BIC score of the "true" network...

Screenshot from 2020-11-30 22-36-14

lutteropp commented 3 years ago

Screenshot from 2020-11-30 22-50-25

Yes... this looks slightly better, but still not useful... Switching to the relative BIC difference instead of the absolute difference here. Also, a histogram likely works better for this kind of data.

lutteropp commented 3 years ago

Here it is with relative BIC differences; values smaller than zero mean a BIC improvement.

Screenshot from 2020-11-30 22-57-53

Slightly more useful, but still... a histogram is likely better here.
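To make the two quantities concrete, here is a small sketch of the plain and the relative BIC difference with respect to the "true" network. The normalization by the absolute value of the true network's BIC is an assumption (the thread does not spell out the formula), and the function names are hypothetical. Since a lower BIC is better, a negative value means the inferred result has a better BIC than the true network.

```python
def bic_difference(bic_inferred, bic_true):
    """Signed difference to the BIC of the true network (negative = improvement)."""
    return bic_inferred - bic_true


def relative_bic_difference(bic_inferred, bic_true):
    """Signed difference divided by |BIC of the true network| (assumed normalization)."""
    if bic_true == 0:
        raise ValueError("BIC of the true network must be non-zero")
    return (bic_inferred - bic_true) / abs(bic_true)


# Made-up numbers: the inferred network scores 10 BIC units below the true one.
example_abs = bic_difference(2090.0, 2100.0)
example_rel = relative_bic_difference(2090.0, 2100.0)
```

The relative form puts datasets with very different BIC magnitudes on a common scale, which is what makes them comparable in a single plot or histogram.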

lutteropp commented 3 years ago

Maybe for the BIC score, what we really are interested in are the counts of these situations happening:

lutteropp commented 3 years ago

I got the BIC score plots to look like this now:

SimulationType CELINE_SamplingType PERFECT_SAMPLING_1000_msasize_LikelihoodType BEST_bic_stats

SimulationType CELINE_SamplingType PERFECT_SAMPLING_1000_msasize_LikelihoodType BEST_bic_plot

lutteropp commented 3 years ago

For the relative RF distance, I currently have these plots:

SimulationType CELINE_SamplingType PERFECT_SAMPLING_1000_msasize_LikelihoodType BEST_rfdist_stats

SimulationType CELINE_SamplingType PERFECT_SAMPLING_1000_msasize_LikelihoodType BEST_rfdist_plot

A set of histograms would definitely fit better here.
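A rough sketch of the binning behind such histograms, one per inference variant, assuming the normalized RF distances lie in [0, 1]. The variant names and the distance values below are made up for illustration, and the number of bins is an arbitrary choice.

```python
def histogram_counts(values, num_bins=10, lo=0.0, hi=1.0):
    """Count values per equal-width bin over [lo, hi]; hi itself goes into the last bin."""
    if num_bins <= 0 or hi <= lo:
        raise ValueError("need num_bins > 0 and hi > lo")
    counts = [0] * num_bins
    for value in values:
        if not lo <= value <= hi:
            raise ValueError(f"value {value} outside [{lo}, {hi}]")
        index = min(int((value - lo) * num_bins / (hi - lo)), num_bins - 1)
        counts[index] += 1
    return counts


# Made-up normalized RF distances per inference variant (hypothetical names).
rf_distances = {
    "raxml_best_tree": [0.0, 0.05, 0.12, 0.55, 1.0],
    "netrax_from_raxml": [0.0, 0.0, 0.3],
}

histograms = {
    variant: histogram_counts(values, num_bins=5)
    for variant, values in rf_distances.items()
}
```

The per-variant count lists can then be drawn as side-by-side bar charts, which keeps many identical values (for example many zeros) readable where a line plot over dataset IDs would not.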

stamatak commented 3 years ago

I guess so

On 30.11.20 23:37, Sarah Lutteropp wrote:

Current state of my plotting attempts: it turns out the BIC scores lie too close together. Instead of plotting the raw BIC scores, it likely makes more sense to plot the absolute difference to the BIC score of the "true" network...

Screenshot from 2020-11-30 22-36-14 https://user-images.githubusercontent.com/1059869/100668712-a39e4b00-335c-11eb-87a3-0b23f4bc7178.png

stamatak commented 3 years ago

This one looks pretty good

On 30.11.20 23:59, Sarah Lutteropp wrote:

Here it is with relative BIC differences; values smaller than zero mean a BIC improvement.

Screenshot from 2020-11-30 22-57-53 https://user-images.githubusercontent.com/1059869/100670749-9cc50780-335f-11eb-8965-08adcf5d3be1.png

Slightly more useful, but still... a histogram is likely better here.

stamatak commented 3 years ago

This looks really good now. It will only be a bit difficult to present in the paper, as the various combinations might cause confusion, so presenting just 3-4 such configurations and moving the rest into the supplement might be a good idea.

On 01.12.20 00:41, Sarah Lutteropp wrote:

I got the BIC score plots to look like this now:

SimulationType CELINE_SamplingType PERFECT_SAMPLING_1000_msasize_LikelihoodType BEST_bic_stats https://user-images.githubusercontent.com/1059869/100674523-90dc4400-3365-11eb-980c-6fa487be6e99.png

SimulationType CELINE_SamplingType PERFECT_SAMPLING_1000_msasize_LikelihoodType BEST_bic_plot https://user-images.githubusercontent.com/1059869/100674525-9174da80-3365-11eb-9d1a-184e22bad109.png

stamatak commented 3 years ago

does zero refer to near-zero branch lengths?

alexis

On 01.12.20 01:19, Sarah Lutteropp wrote:

For the relative RF distance, I currently have these plots:

SimulationType CELINE_SamplingType PERFECT_SAMPLING_1000_msasize_LikelihoodType BEST_rfdist_stats https://user-images.githubusercontent.com/1059869/100677201-d4857c80-336a-11eb-9b1a-1e397432a35f.png

SimulationType CELINE_SamplingType PERFECT_SAMPLING_1000_msasize_LikelihoodType BEST_rfdist_plot https://user-images.githubusercontent.com/1059869/100677203-d5b6a980-336a-11eb-95bb-306259984ae4.png

A set of histograms would definitely fit better here.
