Closed dpenas-u closed 1 year ago
Yes, this sounds like a good approach. Bear in mind that these are Gamma distributions, and therefore the mean is shape*scale and the variance is shape*scale^2. Unless you have a good reason to suspect that is not the case, I would keep the sampling time distribution the same as the generation time distribution, which is what happens by default if you do not specify the parameters of the sampling time distribution.
The mean of the gamma distribution presented in your paper (TB experiment) comes to 0.39 (1.3 * 0.3). Does this equate to saying that the average generation time of TB is 0.39 years?
The paper states that we "used a Gamma distribution with shape parameter 1.3 and rate parameter 0.3". In other words, this has shape 1.3 and scale 1/0.3 = 3.333. The mean is shape*scale = 1.3*3.333 = 4.33 years. The 95% range is from 0.23 to 14.31 years.
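For anyone who wants to check the rate-versus-scale arithmetic, here is a minimal sketch in plain Python. The helper name `gamma_summary_from_rate` is made up for illustration and is not part of TransPhylo; it only applies the identities scale = 1/rate, mean = shape*scale and variance = shape*scale^2.

```python
# Hypothetical helper for illustration only; not a TransPhylo function.
def gamma_summary_from_rate(shape, rate):
    """Return (scale, mean, variance) of a Gamma given shape and rate."""
    scale = 1.0 / rate             # scale is the reciprocal of rate
    mean = shape * scale           # mean = shape * scale
    variance = shape * scale ** 2  # variance = shape * scale^2
    return scale, mean, variance

# Values quoted from the paper: shape 1.3, rate 0.3.
tb_scale, tb_mean, tb_variance = gamma_summary_from_rate(shape=1.3, rate=0.3)

# Treating 0.3 as a scale instead of a rate gives the much smaller figure.
misread_mean = 1.3 * 0.3

print(f"scale = {tb_scale:.3f}")
print(f"mean = {tb_mean:.2f} years (misread as scale: {misread_mean:.2f} years)")
print(f"variance = {tb_variance:.2f}")
```

The two means differ by more than a factor of ten, so it is worth confirming whether a paper or a function argument is giving a rate or a scale.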
If I am working with a pathogen with a shorter generation time (say 2 months), should I select the parameters of the gamma distribution such that the mean is ~0.17 years (2/12)?
Yes that's right, and if you also consider the variance you want around this mean then you can deduce both scale and shape parameters as explained previously in this thread.
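As a rough sketch of that deduction: inverting mean = shape*scale and variance = shape*scale^2 gives shape = mean^2/variance and scale = variance/mean. The function name `gamma_shape_scale` and the standard deviation of 0.1 years below are assumptions chosen purely for illustration; only the 2-month mean comes from this thread.

```python
# Hypothetical helper for illustration only; not a TransPhylo function.
def gamma_shape_scale(mean, sd):
    """Moment matching: return (shape, scale) of a Gamma with this mean and sd."""
    variance = sd ** 2
    shape = mean ** 2 / variance  # from mean = shape*scale, var = shape*scale^2
    scale = variance / mean
    return shape, scale

target_mean = 2 / 12  # 2 months expressed in years
assumed_sd = 0.1      # ASSUMPTION: illustrative spread, not taken from the thread

gt_shape, gt_scale = gamma_shape_scale(target_mean, assumed_sd)
print(f"shape = {gt_shape:.3f}, scale = {gt_scale:.3f}")

# Round-trip check: the recovered parameters reproduce the requested moments.
print(f"mean = {gt_shape * gt_scale:.4f}, variance = {gt_shape * gt_scale ** 2:.4f}")
```

The resulting shape and scale would then be supplied as the generation time parameters, with the standard deviation replaced by whatever spread is appropriate for the pathogen in question.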
Understood. I appreciate your help!
Hi Didelot,
I am using the infer_multittree function in the TransPhylo package in R to infer transmission in a group of Mycobacterium tuberculosis clusters from a specific region. However, I am not sure which parameters to use for the generation and sampling time distributions across the clusters. To address this, I plan to run the inference individually for each cluster, varying the prior distributions for the generation and sampling times. Based on the literature, the shape parameter used for the generation and sampling time distributions ranges from 1 to 2.5. Therefore, I will try different shape values within that range, together with distributions that have different tails (scale parameter). I will then examine how well the posterior distributions agree with the prior distributions and use this information to choose the best-fitting values of the generation and sampling time parameters for all the clusters.
My question is whether this approach is reasonable for estimating the generation and sampling time parameters. Once I have obtained the best values for all clusters, I plan to perform multi-inference with those values.
Thank you very much in advance for your help, and I apologize for any inconvenience.