xavierdidelot / TransPhylo

Reconstruction of transmission trees using genomic data
http://xavierdidelot.github.io/TransPhylo/
GNU General Public License v2.0

Choosing a plausible Gamma prior #8

Closed by phiweger 5 years ago

phiweger commented 5 years ago

Hi,

We have run TransPhylo with a few different Gamma parameter settings and looked at convergence, the predicted transmission events and their dates, and the sampling density. Of course, choosing different priors yields very different results.

I wonder whether you have any advice on how to perform some sort of sensitivity analysis, or how to choose appropriate Gamma parameters (shape, scale)?

For example, we have epidemiological data for a hospital outbreak showing which patient was on which ward, with dates. This guides us somewhat on parameter choice, because we can easily spot when the model generates predictions that deviate a lot from this data (like patients being infected before being admitted to the hospital).

Also, we tried fitting a Gamma distribution via maximum likelihood to the minimum pairwise distance between tips of the dated tree. But this does not seem to be the right thing either.

Any advice would be greatly appreciated.

xavierdidelot commented 5 years ago

Dear Adrian,

It may be a good idea to start with a shape parameter equal to one, in which case the Gamma distribution simplifies into an Exponential distribution. Then you only need to specify the mean of this Exponential distribution using the scale parameter.
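To illustrate the simplification, here is a minimal sketch in plain Python. It is not TransPhylo code, and the function names are made up for illustration. It checks numerically that a Gamma density with shape one matches an Exponential density whose mean equals the scale, and that the Gamma mean is shape times scale. The values 0.5 and 2.0 are arbitrary.

```python
import math

def gamma_pdf(x, shape, scale):
    """Density of a Gamma(shape, scale) distribution at x > 0."""
    return (x ** (shape - 1) * math.exp(-x / scale)
            / (math.gamma(shape) * scale ** shape))

def exponential_pdf(x, mean):
    """Density of an Exponential distribution with the given mean at x >= 0."""
    return math.exp(-x / mean) / mean

def gamma_mean(shape, scale):
    """Mean of a Gamma(shape, scale) distribution."""
    return shape * scale

# With shape = 1 the two densities coincide, and the mean is just the scale.
x, scale = 0.5, 2.0  # arbitrary illustrative values
print(math.isclose(gamma_pdf(x, 1.0, scale), exponential_pdf(x, scale)))
print(gamma_mean(1.0, scale))
```

So with shape fixed at one, the scale is the only number left to choose, and it is directly the prior mean generation time, in whatever time unit your dated tree uses.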

As you noted, the results will differ depending on the value you use for the scale parameter: more intermediate cases are inferred when the scale parameter is smaller, to explain the gaps between sampled cases in the transmission chains. In some situations you may have a good prior idea of the mean generation time. In others you may have a good idea of how many or how few unsampled cases to expect, and can therefore try to find the mean generation time that gives you the expected value for the parameter pi. In your case, you have information about patient entrance to and exit from the wards, which informs your expectation for the transmission links, so it is acceptable to try to find the scale parameter that best captures this expectation.
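The comparison across candidate scale values could be organised along the following lines. This is only a sketch in plain Python, assuming you have already run TransPhylo once per candidate scale and extracted, by whatever means, an estimate of pi and the inferred infection dates for each run. The function names and every number below are hypothetical placeholders, not results.

```python
def closest_to_target(pi_by_scale, target_pi):
    """Return the candidate scale whose estimated pi is nearest the expected pi."""
    return min(pi_by_scale, key=lambda s: abs(pi_by_scale[s] - target_pi))

def count_violations(infection_dates, admission_dates):
    """Count patients inferred to be infected before their admission date.

    Only meaningful if infections are assumed to be acquired in hospital.
    """
    return sum(1 for patient, t in infection_dates.items()
               if t < admission_dates[patient])

# Placeholder summaries, one entry per candidate scale (dates in decimal years).
pi_by_scale = {0.05: 0.30, 0.1: 0.55, 0.5: 0.90}
admissions = {"A": 2015.10, "B": 2015.20}
infections_by_scale = {
    0.05: {"A": 2015.12, "B": 2015.25},
    0.5: {"A": 2014.90, "B": 2015.22},
}

# Scale whose pi estimate is closest to an expected sampling proportion of 0.6.
print(closest_to_target(pi_by_scale, 0.6))

# Number of implausible infection dates for each candidate scale.
violations = {scale: count_violations(dates, admissions)
              for scale, dates in infections_by_scale.items()}
print(violations)
```

Either summary can serve as the criterion, depending on whether your prior knowledge is about the sampling proportion or about the ward movements.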

The approach you described in your last paragraph does not seem correct though, unless you are convinced that your sampling density is very high, in which case it would be better to use the value of pi as a guide as described above.

Best wishes, Xavier