update parameters for SARS-CoV-2 generation interval

papsti commented 1 year ago

parameters we grabbed from EpiNow2 yield a near-exponential generation interval pretty regularly, which isn't realistic

i sent a message to the RISK knowledge synthesis folks about (Omicron) generation interval parameters, but i don't think they have aynthing

i also sent a message to Daniel Park, and he had a rather helpful response:

I feel like there’s a fair amount of uncertainty with Omicron generation intervals because people don’t really account for different factors that can affect GIs. There’s also an issue with subvariants….

There’s a dynamical bias, meaning that we’re more likely to observe shorter intervals during the growth phase. So it messes up the relationship between serial and generation interval depending on how you cohort things. We (me, Jonathan, and many others) tried to explain here and showed that Delta and Omicron might have similar GIs in contrast to other people suggesting that Delta has longer GIs: https://www.medrxiv.org/content/10.1101/2022.07.02.22277186v1.full.pdf. Mean GI 3 days.

But our estimates are realized GI. Instead this paper tries to estimate intrinsic GI from realized GI: https://www.sciencedirect.com/science/article/pii/S2666776222001405. Mean realized 3.6 days and mean intrinsic 6.9 days. I haven’t looked into the methods into detail but at least they have a ton of data. But note that we estimated shorter realized GI (3 day) than SI (3.5 day). Whereas they estimated a much longer realized GI (3.6 days) than SI (2.4) days. So there’s a clear methodological gap between the two.

There are two more papers but they’re relying on some sort of population-level data rather than actual transmission interval data so much less reliable.

look through this stuff and use it to come up with better generation interval estimates.

papsti commented 1 year ago

dC's reply on teams:

Indeed, Daniel's message is helpful. Also is one of the leading experts on this stuff. The Manica et al study that he refers to seems a good reference. Data is Italian households, so may not be that different from Canadian ones... So, unless you have a better source, could you please set ern's generation interval distribution to the one they have in the Manica paper? Figure 2A and uncertainty bounds driven by Table 2. Be careful to use their data estimating the INTRINSIC generation interval, not the "realized" one.

papsti commented 1 year ago

the Manica et al paper gives a distribution mean, shape mean, and scale mean + 95% credible interval bounds on each mean for both the incubation period and the intrinsic generation interval distributions, under the assumption that these times are both gamma distributed.

i can certainly work out the corresponding standard deviation + 95% CrI bounds for a gamma from the shape and scale parameters. what i'm having trouble with is figuring out how to sample a distribution's parameter from just a mean and 95% CrI bounds for that parameter in order to specify a distribution to use within each replication of the Rt calculation.

the authors specify that they used a "non-parametric" approach in estimating these parameter means and credible intervals, but i can't find any other detail. dC has said that since they're using a Bayesian framework, there should be posteriors available for all of these parameters, and then we could simply sample from these, but i can't find them...

i've emailed ben bolker for a quick chat about this and we should be meeting later today.

papsti commented 1 year ago

ben suggested fitting distributions for the mean and the scale of parameter independently using the credible interval bounds as 2.5% and 97.5% quantiles, which is basically what @davidchampredon suggested. however, ben pointed out that for a gamma, the shape and scale are likely more strongly correlated than the mean and shape, and so it would be safer to perform two independent 1D optimizations with the latter two parameters. otherwise, one should instead perform a 2D optimization over shape and scale simultaneously.

there are four parameters for which we are trying to find a (gamma) distribution to draw from:

incubation period mean
incubation period shape
generation interval mean
generation interval shape

for each of these parameters, we have the distribution mean and 95% credible interval. fix the mean as if it's the truth, and optimize over the unknown shape parameter. (once we have that in hand we can recast things in terms of a mean and standard deviation, to fit in our current framework for specifying distributions.)

some sample code to accomplish this task is as follows:

fn <- function(s, m = 3.49, ci_target = c(3.17, 3.77)) {
shape <- s
scale <- m/s
ci <- qgamma(c(0.025, 0.975), shape = shape, scale = scale)
return(sum((ci-ci_target)^2))
}

optim(par = 1, fn = fn, lower = 0.0001, upper = 1000,
method = "Brent")

alternatively, one could assume each parameter is log-normally distributed and optimize over the sd.

i can code up methods for both and see which one returns a closer fit to the stated credible intervals.

this computation wouldn't need to be run each time the pipeline is run, just once for a given mean and credible interval bounds pairing to get the built-in distribution parameters we will ship with the package. however, i want to make sure we document this, so i will write it up as an .Rmd and we can keep it in a notes subdirectory (that we don't push to CRAN if we go that route).

phac-nml-phrsd / ern

update parameters for SARS-CoV-2 generation interval #39