giulianonetto opened this issue 4 years ago (status: open)
Hi @giulianonetto. Have you managed to sort this out? I'm facing the same situation.
Hi @jlopezper! No, unfortunately I have not. What I did was assume a Gamma distribution anyway, as it is quite common even in COVID-19 studies.
The one thing about that paper, though, is that their reported mean and sd for the serial interval don't carry the same meaning as those reported by other papers that do not assume a Gaussian distribution. For instance, with mean = 3 and sd = 3, a Gamma has about 80% of its mass between 0 and 5, while a Gaussian has only about 60%. The discrepancy can be seen in the plot below (blue is Gamma, black is Gaussian).
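The two probabilities for mean = 3, sd = 3 can be checked with the Python standard library alone. This is a minimal sketch, assuming a moment-matched Gamma (shape = mean^2/sd^2, scale = sd^2/mean), which is the usual way to turn a mean/sd pair into a Gamma; the series for the incomplete gamma function is a generic textbook method, not EpiEstim code.

```python
import math

def gamma_cdf(x, mean, sd):
    """CDF of a Gamma distribution given its mean and sd (moment-matched shape/scale)."""
    if x <= 0:
        return 0.0
    shape = (mean / sd) ** 2   # k = mean^2 / sd^2
    scale = sd ** 2 / mean     # theta = sd^2 / mean
    z = x / scale
    # Series expansion of the regularised lower incomplete gamma function P(k, z)
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))

def normal_cdf(x, mean, sd):
    """CDF of a Gaussian via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

mean, sd = 3.0, 3.0
p_gamma = gamma_cdf(5, mean, sd) - gamma_cdf(0, mean, sd)
p_normal = normal_cdf(5, mean, sd) - normal_cdf(0, mean, sd)
print(f"Gamma:    P(0 < SI < 5) = {p_gamma:.2f}")
print(f"Gaussian: P(0 < SI < 5) = {p_normal:.2f}")
```

With mean = sd = 3 the moment-matched Gamma has shape 1 (an exponential), so the first number is just 1 - exp(-5/3), about 0.81; the Gaussian gives about 0.59.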
For now, I am assuming SI ~ Gamma with mean = 4.89 and sd = 1.48, which seems to span most previous estimates (listed in the paper that assumed the Gaussian). The figure below shows the parameters, quantiles, and histogram of the distribution I am currently assuming (the plot title should refer to the Serial Interval distribution).
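`parametric_si` is specified through the SI mean and sd, while many references parameterise the Gamma by shape k and scale θ. A minimal sketch, using only the Python standard library rather than EpiEstim itself, of the moment-matching conversion (k = mean^2/sd^2, θ = sd^2/mean) for the mean = 4.89, sd = 1.48 choice above, plus Monte Carlo quantiles that can be compared with the figure.

```python
import random
import statistics

def gamma_shape_scale(mean, sd):
    """Moment matching: shape k = mean^2 / sd^2, scale theta = sd^2 / mean."""
    return (mean / sd) ** 2, sd ** 2 / mean

shape, scale = gamma_shape_scale(4.89, 1.48)
print(f"shape k = {shape:.2f}, scale theta = {scale:.3f}")

random.seed(1)
# random.gammavariate(alpha, beta) takes shape and *scale*; its mean is alpha * beta
draws = [random.gammavariate(shape, scale) for _ in range(100_000)]
q = statistics.quantiles(draws, n=40)  # 39 cut points: q[0] is 2.5%, q[-1] is 97.5%
print(f"mean {statistics.fmean(draws):.2f}, sd {statistics.stdev(draws):.2f}")
print(f"central 95% range: {q[0]:.2f} to {q[-1]:.2f} days")
```

Moment matching gives k ≈ 10.9 and θ ≈ 0.45. Note that `random.gammavariate` expects the scale, not the rate, as its second argument, which is an easy place to slip.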
It is not perfect, but it has produced seemingly good results with Brazil's data. Also, using `method = "uncertain_si"` didn't make much difference in my case.
Hope it helps! Any better solutions or alternative ideas would be very appreciated as well!
Thank you @giulianonetto for your detailed response!
I'm going with a Gamma distribution as well but with mean = 7.5 and sd = 3.4 (based on this and this). Comparing it with the Normal distribution proposed in that paper, both distributions overlap by about 50%.
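If "overlap" is read as the overlap coefficient, i.e. the integral of min(f, g) over the real line (an assumption; the comment doesn't say how it was computed), it can be approximated numerically. The preprint's Normal parameters aren't restated in this thread, so the sketch below compares the two Gamma choices discussed here (mean 7.5, sd 3.4 versus mean 4.89, sd 1.48); swapping one side for `normal_pdf` with the preprint's values would give the Gamma-versus-Normal comparison.

```python
import math

def gamma_pdf(x, mean, sd):
    """Density of a moment-matched Gamma (shape = mean^2/sd^2, scale = sd^2/mean)."""
    if x <= 0:
        return 0.0
    k = (mean / sd) ** 2
    theta = sd ** 2 / mean
    return math.exp((k - 1) * math.log(x) - x / theta - math.lgamma(k) - k * math.log(theta))

def normal_pdf(x, mean, sd):
    """Gaussian density (for swapping in a Normal SI)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def overlap(f, g, lo, hi, n=20_000):
    """Overlap coefficient: integral of min(f, g), approximated with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(min(f(lo + (i + 0.5) * h), g(lo + (i + 0.5) * h)) for i in range(n))

ov = overlap(lambda x: gamma_pdf(x, 7.5, 3.4),
             lambda x: gamma_pdf(x, 4.89, 1.48), lo=-20.0, hi=40.0)
print(f"overlap between the two Gamma SI choices: {ov:.2f}")
```

The integration range starts below zero so that a Gaussian, which puts mass on negative intervals, is integrated over its whole support too.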
Since I'm a complete novice in this matter, I'm not sure that this assumption is correct, although the results I'm obtaining with data from Spain seem reasonable.
Thank you again.
That's reassuring. We do believe our SI distribution might be a bit closer to zero than it should eventually be, but we wanted to partially follow the other studies cited here.
My only concern is that both studies you pointed out seem to be based on this one, which estimated the SI from only 6 infector-infectee pairs. Note that its confidence interval for the mean SI ranges from 5.3 to 19. I am not entirely sure whether that's just how it is, but it was one of the reasons we decided to be a bit skeptical about that particular study. In our case, using it with Brazil's data led to a most recent Rt of over 3, while the resulting R0 would be over 5, which seems a little above average (see Table 1). Of course, that may be my personal bias again; these are just the decisions we made in such an uncertain scenario.
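Why a longer SI pushes R up can be seen from a standard relation that is not what EpiEstim computes: Wallinga & Lipsitch (2007) link the epidemic growth rate r to R through the generation interval, and for a Gamma interval with shape k and scale θ this gives R = (1 + rθ)^k. The sketch below treats the SI as a stand-in for the generation interval and uses a hypothetical growth rate of r = 0.2 per day to compare the two Gamma choices in this thread.

```python
def r_from_growth_rate(r, mean, sd):
    """R = (1 + r * theta) ** k for a moment-matched Gamma interval (Wallinga & Lipsitch)."""
    k = (mean / sd) ** 2
    theta = sd ** 2 / mean
    return (1.0 + r * theta) ** k

r = 0.2  # hypothetical daily growth rate, for illustration only
results = {}
for mean, sd in [(4.89, 1.48), (7.5, 3.4)]:
    results[(mean, sd)] = r_from_growth_rate(r, mean, sd)
    print(f"SI mean {mean}, sd {sd}: R = {results[(mean, sd)]:.2f}")
```

For the same growth rate, the 7.5/3.4 interval gives a noticeably larger R than the 4.89/1.48 one (roughly 3.7 versus 2.6 here), which matches the direction of the higher Rt described above.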
Thank you very much for your response!
It is important to know that we are not alone in all this, and that our doubts are shared.
Hi both,
Thanks very much for using EpiEstim :-)
Happy to think more about this, but my immediate thought is that implementing negative serial intervals in this method is likely to be challenging, since it would involve summing over numbers of cases both in the past and in the future? As a result, real-time estimation would be hard (given unknown numbers of future cases)?
Another issue will be that the method assumes that individuals cannot generate new cases on the same day (i.e. the serial interval cannot be zero). That allows the renewal equation model to predict the number of cases today based on cases on all previous days - thereby allowing estimation of the reproduction number.
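To make the same-day and future-case points concrete: in the renewal equation, expected incidence on day t is

$$E[I_t] = R_t \Lambda_t, \qquad \Lambda_t = \sum_{s \ge 1} I_{t-s} w_s,$$

where w_s is the discretised SI. Below is a minimal Python sketch of the Gamma-posterior estimate from Cori et al. (2013) over a sliding window; the function names are hypothetical, and the Gamma(shape 1, scale 5) prior mirrors EpiEstim's default prior mean and sd of 5 as I understand them. Allowing w_s for s ≤ 0 would pull today's or future incidence into Λ_t, which is exactly what real-time estimation lacks.

```python
def total_infectiousness(incidence, w, t):
    """Lambda_t = sum over s >= 1 of I[t-s] * w[s]; w[0] is never used (no same-day transmission)."""
    return sum(incidence[t - s] * w[s] for s in range(1, min(t, len(w) - 1) + 1))

def posterior_mean_r(incidence, w, t_end, window=7, prior_shape=1.0, prior_scale=5.0):
    """Posterior mean of R over days t_end - window + 1 .. t_end under a Gamma prior."""
    days = range(t_end - window + 1, t_end + 1)
    cases = sum(incidence[t] for t in days)
    lam = sum(total_infectiousness(incidence, w, t) for t in days)
    # Posterior is Gamma(prior_shape + cases, scale = 1 / (1 / prior_scale + lam))
    return (prior_shape + cases) / (1.0 / prior_scale + lam)

incidence = [100] * 21            # hypothetical flat epidemic curve
w = [0.0, 0.2, 0.5, 0.3]          # hypothetical SI: w[0] = 0, sums to 1
r_hat = posterior_mean_r(incidence, w, t_end=20)
print(f"R over days 14-20: {r_hat:.3f}")
```

With constant incidence and an SI vector that sums to 1, Λ_t equals the daily incidence once the window is past the SI support, so the estimate sits at about 1, as expected for a flat epidemic curve.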
The thing I'm not sure about is whether these assumptions can be relaxed... To do that, it would be necessary to work through the underlying method here (https://www.sciencedirect.com/science/article/pii/S1755436519300350) and adapt it, but I suspect this would be a substantial adaptation (and perhaps worthy of a paper in its own right!). It would certainly be more straightforward to simply implement the gamma distribution you mentioned, if you think that isn't too dubious an assumption.
I will think more about this, but wanted to share my initial thoughts!
Thanks! Robin
Hi Dr. Thompson!
Thank you very much for your response. It seems to me that reporting negative-valued serial intervals arises from a limitation of the SI as a proxy for the time between infection (TBI) events, as shown in the figure below.
The lines start when the person is infected, and arrowheads show when the person starts feeling the symptoms.
In the "normal" case, the SI serves fine as an approximation to the TBI. If, however, the infector takes a bit longer to develop symptoms while the infectee develops them soon after infection, the serial interval underestimates the TBI and can even become negative. Of course, the infectee can never be infected before the infector, even though such a "lack of synchronization" is quite possible in terms of symptom onset.
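The mechanism in the figure can be checked with a small simulation. It assumes, hypothetically, that the generation time (infector's infection to infectee's infection) and the two incubation periods are independent Gamma variables with made-up parameters; the serial interval is then generation time + incubation(infectee) - incubation(infector).

```python
import random

random.seed(42)
GT_SHAPE, GT_SCALE = 6.25, 0.8     # hypothetical generation time: mean 5, sd 2 days
INC_SHAPE, INC_SCALE = 2.78, 1.8   # hypothetical incubation period: mean ~5, sd ~3 days

n = 100_000
gen_times, serial_intervals = [], []
for _ in range(n):
    gt = random.gammavariate(GT_SHAPE, GT_SCALE)   # always > 0
    inc_infector = random.gammavariate(INC_SHAPE, INC_SCALE)
    inc_infectee = random.gammavariate(INC_SHAPE, INC_SCALE)
    gen_times.append(gt)
    # onset(infectee) - onset(infector) = gt + inc_infectee - inc_infector
    serial_intervals.append(gt + inc_infectee - inc_infector)

frac_negative = sum(si < 0 for si in serial_intervals) / n
print(f"smallest generation time: {min(gen_times):.3f}")
print(f"share of negative serial intervals: {frac_negative:.1%}")
print(f"mean GT {sum(gen_times) / n:.2f} vs mean SI {sum(serial_intervals) / n:.2f}")
```

Every simulated generation time is positive, yet a noticeable fraction of serial intervals comes out negative, while the mean SI still matches the mean generation time because the two incubation terms cancel on average. Independence between incubation and infectiousness is a simplifying assumption, not something established in this thread.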
It feels like negative-valued SIs carry information that is too important to simply throw out; for instance, they should not occur very often if there is no transmission before symptom onset. They might indicate that the TBI is in fact not very long. I wonder whether there is a way to account for such a negative bias, perhaps by using incubation period estimates to correct them in some way?
Sorry if this is pure nonsense; I am truly interested in digging deeper.
Thank you all very much!
Hi, yes, you are right. In theory it is possible to back-calculate infection times from the times of symptom appearance, and then use the inferred infection times in your analyses. I know some work has been done in this direction before, but I imagine it leads to considerable uncertainty (given the width of the incubation period). It might be worth doing more of a literature search in that direction; from memory, I think Christophe Fraser discusses back-calculation of infection times in his paper "Estimating Individual and Household Reproduction Numbers in an Emerging Epidemic". Thanks! Robin
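A crude illustration of back-calculation for a single pair, not the method in Fraser's paper: subtract independently sampled incubation periods (hypothetical Gamma parameters) from each person's onset day, and reject draws in which the infectee would be infected before the infector. The pair below, with onsets on days 10 and 8 (SI = -2), is made up for illustration.

```python
import random
import statistics

random.seed(7)
INC_SHAPE, INC_SCALE = 2.78, 1.8            # hypothetical incubation period: mean ~5, sd ~3 days
onset_infector, onset_infectee = 10.0, 8.0  # hypothetical pair with SI = -2 days

tbi_samples = []
while len(tbi_samples) < 20_000:
    t_infector = onset_infector - random.gammavariate(INC_SHAPE, INC_SCALE)
    t_infectee = onset_infectee - random.gammavariate(INC_SHAPE, INC_SCALE)
    if t_infectee > t_infector:  # the infectee cannot be infected before the infector
        tbi_samples.append(t_infectee - t_infector)

q = statistics.quantiles(tbi_samples, n=40)  # q[0] is 2.5%, q[-1] is 97.5%
print(f"median TBI {statistics.median(tbi_samples):.1f} days, "
      f"95% range {q[0]:.1f} to {q[-1]:.1f} days")
```

Even for one pair, the accepted times between infections spread over several days, which is the uncertainty mentioned above. A fuller treatment would also put a prior on the time between infections rather than only rejecting impossible orderings.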
Note that the most up-to-date version of the negative-SI preprint was posted on medRxiv on 27 April.
Dear @giulianonetto,
Are you currently using open-access epi data? Does it record the date of notification or the date of symptom onset? I'm currently searching for date-of-onset data.
Kind regards, Tim.
Hello, Dr. Cori!
First of all, thank you for this amazing package. It has been super helpful - not only with modelling itself but also with learning epidemiology concepts. I am no expert, so excuse me for any misunderstandings.
Model SI as Gaussian - COVID-19 report
This recent preprint has reported negative-valued serial intervals, which I understand as the infectee showing (or noticing) symptoms before the infector. For this reason, the authors moved from a Gamma model for the SI to a Gaussian model. So being able to assume a distribution on the entire real line for the SI seems like a useful feature for `EpiEstim`.

Failed attempt:
While I was able to use `estimate_R` (with `method = 'parametric_si'`) with their reported mean and sd for the SI, I noticed that it uses a discretization of a Gamma distribution to set the probability of each SI value.

Given the importance of COVID-19 and the evidence of negative-valued SIs, I wonder whether it would be possible to follow the `EpiEstim` strategy but assuming

$$SI \sim N(\mu, \sigma^2)$$

instead of

$$SI \sim \mathrm{Gamma}(k, \theta)$$

I actually tried to override `EpiEstim::discr_si` with a naive implementation of the discretization of a Gaussian random variable, but my results were totally incorrect.
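A likely pitfall (an assumption, since the original code isn't shown) is that EpiEstim's renewal equation only uses past days, so a discretised Gaussian has to be cut down to a vector over days 0, 1, 2, ... with a zero on day 0 and total mass 1. As far as I know, `estimate_R` can take such a vector directly via `method = "non_parametric_si"` and `config$si_distr`, instead of overriding `discr_si`. The Python sketch below uses a hypothetical helper with midpoint binning chosen for simplicity (not necessarily the scheme `discr_si` uses for the Gamma) and builds that vector for the mean = 3, sd = 3 Gaussian discussed earlier. It simply discards the negative part, so it does not really model negative SIs.

```python
import math

def normal_cdf(x, mean, sd):
    """CDF of a Gaussian via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def discretise_normal_si(mean, sd, max_day):
    """Hypothetical helper (not part of EpiEstim): daily SI probabilities w[0..max_day].

    Day k >= 1 receives the Gaussian mass on (k - 0.5, k + 0.5]. Everything at or
    below 0.5 days, including all negative intervals, is discarded, w[0] is set to 0
    (no same-day transmission), and the vector is renormalised to sum to 1.
    Returns the vector and the fraction of Gaussian mass that was discarded.
    """
    w = [0.0] * (max_day + 1)
    for k in range(1, max_day + 1):
        w[k] = normal_cdf(k + 0.5, mean, sd) - normal_cdf(k - 0.5, mean, sd)
    kept = sum(w)
    return [p / kept for p in w], 1.0 - kept

w, dropped = discretise_normal_si(mean=3.0, sd=3.0, max_day=30)
print(f"w[0] = {w[0]}, sum = {sum(w):.6f}, mass discarded = {dropped:.1%}")
```

About a fifth of the Gaussian mass falls at or below half a day and is thrown away here, which gives a concrete sense of how much weight a Gaussian SI with these parameters puts on non-positive values.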
Sorry for the long description.
Thank you very much!
Best wishes
Giuliano