xavierdidelot / TransPhylo

Reconstruction of transmission trees using genomic data
http://xavierdidelot.github.io/TransPhylo/
GNU General Public License v2.0

Approach to select appropriate priors for generation and sampling time distributions #30

Closed dpenas-u closed 1 year ago

dpenas-u commented 1 year ago

Hi Didelot,

I am using the infer_multittree function in the TransPhylo R package to infer transmission within a group of Mycobacterium tuberculosis clusters from a specific region. However, I am not sure which parameters to use for the generation and sampling time distributions across all the clusters. To address this, I plan to run the inference individually for each cluster, varying the priors for the generation and sampling time distributions. In the literature, I have seen shape values for these distributions ranging from 1 to 2.5, so I will try different shape values within that range, combined with different scale values to vary the tails. I will then examine how well the posterior distributions agree with the prior distributions and use this to choose the best-fitting generation and sampling time parameters for all the clusters.

My question is whether this approach is reasonable for estimating the generation and sampling time parameters. Once I have obtained the best values for all clusters, I plan to perform multi-inference with those values.

Thank you very much in advance for your help, and I apologize for any inconvenience.

xavierdidelot commented 1 year ago

Yes, this sounds like a good approach. Bear in mind that these are Gamma distributions, and therefore the mean is shape*scale and the variance is shape*scale^2. Unless you have a good reason to think the two differ, I would keep the sampling time distribution the same as the generation time distribution, which is what happens by default if you do not specify the parameters of the sampling time distribution.
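
As a minimal sketch of the two Gamma formulas above, the following Python snippet tabulates the mean and variance for a few candidate shape values in the 1 to 2.5 range. It is written in Python purely for illustration; the function name `gamma_mean_variance` is hypothetical and not part of TransPhylo, and the scale of 1.0 is an arbitrary placeholder.

```python
def gamma_mean_variance(shape, scale):
    """Return (mean, variance) of a Gamma distribution with the given shape and scale."""
    mean = shape * scale
    variance = shape * scale ** 2
    return mean, variance


# Candidate shape values in the 1 to 2.5 range; scale=1.0 is an assumed placeholder.
for shape in (1.0, 1.5, 2.0, 2.5):
    mean, variance = gamma_mean_variance(shape, scale=1.0)
    print(f"shape={shape}: mean={mean:.2f}, variance={variance:.2f}")
```

Tabulating candidates this way makes it easier to see which shape and scale pairs imply a plausible mean before running the inference.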

sriram98v commented 5 months ago

The mean of the gamma distribution presented in your paper (TB experiment) comes to 0.39 (1.3 * 0.3). Does this equate to saying that the average generation time of TB is 0.39 years?

xavierdidelot commented 5 months ago

The paper states that we "used a Gamma distribution with shape parameter 1.3 and rate parameter 0.3". In other words, this has shape 1.3 and scale 1/0.3=3.333. The mean is shape*scale=1.3*3.333=4.33 years. The 95% range is from 0.23 to 14.31 years.
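
The rate-to-scale conversion in this reply can be checked with a few lines of arithmetic. This is only an illustrative Python sketch; the helper name `rate_to_scale` is hypothetical, and the shape and rate values are the ones quoted from the paper.

```python
def rate_to_scale(rate):
    """Convert a Gamma rate parameter to the equivalent scale parameter."""
    return 1.0 / rate


shape, rate = 1.3, 0.3        # values quoted from the paper
scale = rate_to_scale(rate)   # 1 / 0.3
mean = shape * scale          # mean of a Gamma distribution is shape * scale

print(f"scale={scale:.3f}, mean={mean:.2f} years")
# prints: scale=3.333, mean=4.33 years
```

Multiplying the shape by the rate instead of the scale gives the 0.39 figure from the question, which is why the two parameterizations need to be kept apart.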

sriram98v commented 5 months ago

If I am working with a pathogen with a shorter generation time (say 2 months), should I select the parameters of the gamma distribution such that the mean is ~0.17 years (2/12)?

xavierdidelot commented 5 months ago

Yes, that's right, and if you also consider the variance you want around this mean, then you can deduce both the scale and shape parameters as explained previously in this thread.
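
Deducing both parameters from a target mean and variance amounts to inverting the two formulas given earlier in the thread (mean = shape*scale, variance = shape*scale^2). The Python sketch below is a hedged illustration: the function name `gamma_from_mean_sd` is hypothetical, and the standard deviation of one month is an assumed value chosen only for the example.

```python
def gamma_from_mean_sd(mean, sd):
    """Solve mean = shape*scale and variance = shape*scale^2 for (shape, scale)."""
    variance = sd ** 2
    shape = mean ** 2 / variance
    scale = variance / mean
    return shape, scale


mean = 2 / 12   # target mean generation time of 2 months, in years
sd = 1 / 12     # assumed standard deviation of 1 month, for illustration only

shape, scale = gamma_from_mean_sd(mean, sd)
print(f"shape={shape:.3f}, scale={scale:.4f}")
```

The resulting pair could then be supplied as the shape and scale of the generation time distribution; a larger assumed standard deviation gives a smaller shape and a heavier tail.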

sriram98v commented 5 months ago

Understood. I appreciate your help!