richelbilderbeek / pirouette_article

Article about pirouette, by Bilderbeek, Laudanno and Etienne
GNU General Public License v3.0
0 stars 0 forks source link

Interpret and improve subsection 'The effect of differently common diversity-dependent trees' #68

Closed richelbilderbeek closed 4 years ago

richelbilderbeek commented 4 years ago

Context, copied from response letter:

    1. My final point that would require some more work is in fact noted by the authors themselves, namely that “one tree is not enough to determine the impact of a tree prior on Bayesian inference”. Indeed, I would like to see not only the error distribution better characterized in terms of their relative magnitudes (as mentioned in my second point above), but also in terms of distributions of simulated trees. It is fine to have simple worked examples around a single tree - as they provide a clear axis along which to explain the workings of a program - but in a real-world scenario, one would probably have a distribution of trees in hand. What if the worked example in this paper revolves around a tree that is at the “tail” of the tree distribution produced by the new tree model? Perhaps if another tree was picked, then the twin and true error distributions could look more alike? Or be further apart?

We agree on this point and we have added a worked example in the manuscript. In response to other comments we have used a diversity-dependent process to generate the true tree. Because the likelihood of these trees are known, we show the pirouette results for diversity-dependent trees with low, medium and high likelihood (section 'The effect of differently common diversity-dependent trees'). [RSE: Provide the frequency distribution of the likelihoods of all the trees that you have generated from which you picked the three trees. You can indicate these trees in the plot.]

Giappo commented 4 years ago

What do you mean with "differently common diversity-dependent trees"?

richelbilderbeek commented 4 years ago

Of a different likelihood.

richelbilderbeek commented 4 years ago

Please rephrase my clumsy wording :+1:

Giappo commented 4 years ago

What do you mean with trees with different likelihoods? The likelihood, by definition, is a function that assigns a probability to the chosen parameter values given the data. How do you use it here to tell apart trees?

richelbilderbeek commented 4 years ago

I predict taking a look at the code will clarify things. Please improve :+1:

Giappo commented 4 years ago

What code?

richelbilderbeek commented 4 years ago

The section refers to a URL. The code is in example_x.R.

Giappo commented 4 years ago

The relative example is number 23. https://github.com/richelbilderbeek/pirouette_example_23

Giappo commented 4 years ago

I am running the code and I obtain this error:

alignment_params <- create_alignment_params(
    sim_tral_fun = get_sim_tral_with_std_nsm_fun(
      mutation_rate = 0.1,
      site_model = beautier::create_jc69_site_model()
    ),
    root_sequence = create_blocked_dna(length = 1000),
    rng_seed = rng_seed,
    fasta_filename = "true_alignment.fas"
  )

Error in create_alignment_params(sim_tral_fun = get_sim_tral_with_std_nsm_fun(mutation_rate = 0.1,  : 
  unused argument (sim_tral_fun = get_sim_tral_with_std_nsm_fun(mutation_rate = 0.1, site_model = beautier::create_jc69_site_model()))

I have no idea of what is "sim_tral_fun = get_sim_tral_with_std_nsm_fun"

Giappo commented 4 years ago

Additionally experiment <- create_gen_experiment() Errore: 'create_nested_sampling_mcmc' is not an exported object from 'namespace:beautier'

richelbilderbeek commented 4 years ago

I have no idea of what is "sim_tral_fun = get_sim_tral_with_std_nsm_fun"

Well, it runs on Travis, so somehow it must work. Probably need to update packages :+1:

richelbilderbeek commented 4 years ago

create_nested_sampling_mcmc

Is now beautier::create_ns_mcmc

richelbilderbeek commented 4 years ago

@Giappo: of the pirouette repo, you did not merge develop to your branch :+1:

richelbilderbeek commented 4 years ago

Context, copied from response letter:

    1. My final point that would require some more work is in fact noted by the authors themselves, namely that “one tree is not enough to determine the impact of a tree prior on Bayesian inference”. Indeed, I would like to see not only the error distribution better characterized in terms of their relative magnitudes (as mentioned in my second point above), but also in terms of distributions of simulated trees. It is fine to have simple worked examples around a single tree - as they provide a clear axis along which to explain the workings of a program - but in a real-world scenario, one would probably have a distribution of trees in hand. What if the worked example in this paper revolves around a tree that is at the “tail” of the tree distribution produced by the new tree model? Perhaps if another tree was picked, then the twin and true error distributions could look more alike? Or be further apart?

We agree on this point and we have added a worked example in the manuscript. In response to other comments we have used a diversity-dependent process to generate the true tree. Because the likelihood of these trees are known, we show the pirouette results for diversity-dependent trees with low, medium and high likelihood (section 'The effect of differently common diversity-dependent trees'). [RSE: Provide the frequency distribution of the likelihoods of all the trees that you have generated from which you picked the three trees. You can indicate these trees in the plot.]

Giappo commented 4 years ago

I suggest for this example to use a different statistics. Our goal is to show the difference between DD and BD. We know that DD naturally produces imbalanced trees as speciations are likely to slow down as the carrying capacity is approached. Therefore our goal is to set the parameters in order to highlight this. I suggest the following (we might decide to change after a test, if we're not satisfied): lambda = 1.6, mu = 0.2, K = 20. Pirouette is already provided with the gamma statistics, which is made to highlight exactly this. I suggest to use it for this example.

richelbilderbeek commented 4 years ago

Solid reasoning! I agree :+1:!

richelbilderbeek commented 4 years ago

From @rsetienne:

My response how a diversity-dependent example would resolve this: I find the current choice (in the response letter) of three trees at different points in the distribution a bit awkward. I feel that a real example will make the tail-argument irrelevant, as we will be looking at the whole distribution by looking at several trees for a few different parameter settings. I suggest looking into the scenarios of the ProcB paper (short, intermediate and long crown age, small to large extinction rate) or a selection thereof (only mu = 0 and mu = 0.4 for example) and then simulating 10-100 trees depending on how much time it takes. You could even use the trees that were simulated in this paper (which we have stored somewhere...)

richelbilderbeek commented 4 years ago

Thanks @Giappo, will take over from here :+1:

richelbilderbeek commented 4 years ago

Another suggestion from @rsetienne:

I suggest to do simulations with DDD_sim with crown age 5 and crown age 15, and extinction rates 0 and 0.4. Speciation rate is 0.8 and K is 40. And then 10-100 trees for each parameter setting.

richelbilderbeek commented 4 years ago

OK, will follow the last suggestion of @rsetienne...

richelbilderbeek commented 4 years ago

Re: pirouette things

3 February 2020 13:27 13 KB From: Richel Bilderbeek To: Giovanni Laudanno Cc: Rampal S. Etienne

I suggest to do simulations with DDD_sim with crown age 5 and crown age 15, and extinction rates 0 and 0.4. Speciation rate is 0.8 and K is 40. And then 10-100 trees for each parameter setting.

At the moment the setting is lamda = 0.8, mu = 0.1, K = 40 and we let the crown age vary with 5, 10 and 15. I thought of doing so because we are more interested in showing the inference errors vs the strength of the DD signal, which decreases with crown age (as shown is ProcB 2012). @Rampal: If you think what you just proposed fits better we can change. Just let us know.

I just rewrote the code regarding this Issue ( https://github.com/richelbilderbeek/pirouette_article/issues/68 (private, so login to GitHub first)) to follow Rampal's last suggestion to be sure Rampal agrees. I think Gio's work was great, but I want to end the confusion and will just follow the higher ranked person's suggestion.

@Giappo: I predict you agree to this decision making strategy. If not, let me know.

richelbilderbeek commented 4 years ago

Uhm, to quote the reviewer:

What if the worked example in this paper revolves around a tree that is at the “tail” of the tree distribution produced by the new tree model?

I'll stick to one crown age. I'll pick 10, as it is the median value of the sets suggested by both @Giappo and @rsetienne.

richelbilderbeek commented 4 years ago

I'll stick to one extinction rate. I'll pick 0.4, as it is the non-zero value suggested by @rsetienne.

rsetienne commented 4 years ago

OK, I am fine with crown age = 10, and mu = 0.1, and then, say, 100 trees (but I am open to other numbers if computational demands are too great). We don't need to do a thorough analysis of the diversity-dependent case, but only show how such a single parameter setting should be used. In the discussion we can then say that we advocate that the user does a more elaborate analysis of the parameter domain.

richelbilderbeek commented 4 years ago

Crown age of 10 and an extinction rate of 0.1 it is :+1:

richelbilderbeek commented 4 years ago

I quote the reviewer:

Perhaps if another tree was picked, then the twin and true error distributions could look more alike?

The Giappo code had no ordering in it whatsoever, it just simulated a bunch of trees. I will use his DDD setup and suggestion to use the gamma statistic.

To quote Rampal:

[RSE: Provide the frequency distribution of the likelihoods of all the trees that you have generated from which you picked the three trees. You can indicate these trees in the plot.]

I need to show this: which_ones

Then per setting, the errors.

richelbilderbeek commented 4 years ago

Here is the distribution:

likelihoods

richelbilderbeek commented 4 years ago

Awesomer:

likelihoods

richelbilderbeek commented 4 years ago

We switch back to using the nLTT statistic, as twinning works better with that statistic.

richelbilderbeek commented 4 years ago

Started run:

p230198@peregrine:pirouette_example_23 sbatch scripts/rerun.sh 
Submitted batch job 9508018

Start: 9:41, ETA: ?

richelbilderbeek commented 4 years ago

Hmm, I assigned this Issue to myself 11 days ago. Regardless, @Giappo worked on this Issue yesterday, judging by the Travis build logs. That's just fine, I'll re-assign him this Issue...

richelbilderbeek commented 4 years ago

I'll take this one :+1:

richelbilderbeek commented 4 years ago

I close this Issue. I use paper now instead :+1: