seananderson / covidseir

Bayesian SEIR model to estimate the effects of social-distancing on COVID-19
https://seananderson.github.io/covidseir/index.html
28 stars 11 forks source link

Extend modeling to incorporate VoC and vaccination #7

Open sempwn opened 3 years ago

sempwn commented 3 years ago

In order for the model to remain relevant into 2021 we need to consider how to incorporate the impact of vaccination and VoC. Both of these issues will require changes to the underlying ODE, and in addition will need external data on timing of VoC introduction (which may not be known and so would be specified with a prior), as well as vaccine distribution by age group. I have some ideas on how to proceed with both of these issues, but would like to hear any other thoughts especially as issues like lack of VoC data could lead to some degeneracy of the likelihood. Outlining the two ideas below:

VoC

Model can be converted into a two-strain model. This would include updating the ODE structure and duplicating both the social-distancing and non social-distancing exposed, infected and quarantined groups. Already produce code for this so that part would be straightforward. Initializing when the new strains are introduced can be achieved through the use of an importation vector which describes when VoC are imported on a day and has the same dimensions as the case data. The structure could be something like c(0,...,0,1,1,...,1) where 1 indicates VoC are being imported.

Vaccine

We could create an "all-or-nothing" vaccine as for the purposes of this model we are only considering cases and so do not need to consider differences between transmission-blocking and reduction in disease severity for a vaccine type. The main issue is that many jurisdictions are vaccinating by age which would have a differential impact on transmission for a given vaccine dose. A way to approach this is to create a function which converts the daily doses of vaccine by age group, and some age group contact rates and converts it into a contact-rate adjusted vaccination rate. The result of this function can then be incorporating as a stan data vector into the model, which updates the daily rate of vaccine. The model could have this vaccine category included or it could just be removed from the susceptible group.

seananderson commented 3 years ago

Thanks for pushing this forward.

VoC: Do we (and are we going to) have the data to fit to VoC? In a perfect world we'd fit them separately, but my understanding is we don't have the data yet and by the time we do the current VoC may be the dominant strains. The model could also allow a ramp in of a new R0 or just absorb it in the the 'f' value based on the initial R0. At this point I understood the plan was to simulate various VoC scenarios going forward, which it looks like you've got a good start on. Longterm, getting a good pure-R projection will be helpful for flexibility.

Vaccine: I guess another way of thinking about this would be as an effective number of people removed from the susceptible group based on age-based contact rate and age-based vaccination? The nice thing is that calculation could happen outside the model. Whether it's included as a separate category or just removed from the susceptible group (I don't see any major advantage to one or the other), we'll want to update the model to use the newer ODE function that allows ... for passing arbitrary data and parameters. Given everything is carefully set up to work as is with x_r and x_i, the fastest option in the short term might be to just switch the ODE function, keep everything else the same, and add the vaccination vector as a new data vector in the .... Long term the code could be much simpler without having to concatenate everything into x_r and x_i and then pull it back apart.

sempwn commented 3 years ago

VoC: I like the idea of leaving the model more flexible by incorporating the changes in transmissibility due to new strains by allowing the R parameter to change in time. One concern would be whether we would have any issues with identifiability with both R and f changing? Presumably if we keep to fitting at distinct breakpoints and the breakpoints for f and R do not overlap too much this shouldn't be an issue. We would need to have a ramp in as well given there's approximately two months between establishment and dominance of a strain as seen in other jurisdictions. So we could either create a vector of break-points similar to the distancing fraction or create two new priors specifying the time of establishment of VoC and time of dominance of VoC, I think the second approach would be more favorable

Vaccine: I agree a good way of thinking about it is in terms of effective numbers vaccinated, and has the advantage that we can do it all outside STAN and we just feed in a vector that represents the daily effective vaccination rate. I've already created functions that converts an arbitrary vaccination schedule into this that I can add directly into covidseir. One issue is with multiple vaccine products with a distribution of efficacy it would be hard to easily create a single vaccine efficacy parameter. We could view this as a distribution and create another prior for it so we're adding in some uncertainty on the total effectiveness of the vaccination program?

seananderson commented 3 years ago

VoC: Yes, since f is a fraction of R0, we'd definitely have trouble estimating it if they both changed at the same time. I imagine we'll want to ramp it in based on external knowledge and with a strong prior on the ratio of R0_VoC to R0 based on other studies. I think it doesn't matter all that much other than making the 'f' interpretation theoretically consistent.

Vaccine: I suppose you could also have an effective vaccine efficacy, which is the weighted average of efficacy across vaccines given to date. Ideally we avoid putting another estimated parameter into the ODE since it will slow things down more, so we should think carefully about what we need vs. what would be nice to have. Same for the VoC R0 ratio. As a first approximation and proof of concept the R0 ratio and efficacy could go in as fixed data. When that's working we could think about estimating one or both with added priors.

andrew-edwards commented 3 years ago

Vaccine: Just trying to clarify things (for my own understanding, but to make it a bit more concrete).

Looking at the schematic of the model in the first paper, wouldn't the vaccinated people just be moving from S to R (and S_d to R_d) at a rate that we know (so doesn't have to be estimated)? Agree with the above ideas that this should be estimated outside the model, and adapted if possible based on ages of people getting vaccinated.

So I was originally thinking it's just one extra parameter, let's say 'v' for the rate of vaccination (units of per day), which modifies the equations:

dS/dt = .... - v S
dR/dt = .... + v S 

dS_d / dt = .... - v S_d
dR_d / dt = .... + v S_d

But what we know is the absolute value v (S + S_d) each day. So instead of estimating v each day, why not just move the 'known' number of vaccinated people from S and S_d to R and R_d?

So maybe this is better: have V(t) as the number per day for which the effects of the vaccine kick in on day t (so based on the number that were vaccinated two or three weeks earlier, for which we have data). Then we need to partition into S and S_d, to give:

dS/dt = .... - V(t) S / (S + S_d)
dR/dt = .... + V(t) S / (S + S_d) 

dS_d / dt = .... - V(t) S_d / (S + S_d)
dR_d / dt = .... + V(t) S_d / (S + S_d)

Mike yesterday mentioned just assuming 100% efficacy of the vaccine. Reducing that would just mean reducing V(t), which is easy enough. Sorry if any of this is what you were thinking anyway, I just find it helpful to see the equations.

seananderson commented 3 years ago

@andrew-edwards, yes I think that's what we were thinking. Thanks for bringing in the equations. I think the only "estimation" mentioned above was @sempwn suggesting putting a prior on the efficacy to avoid assuming a precise value (I don't think the data would inform it). However, I'd favour it just going in as data (a weighted mean across vaccines) since having it as a parameter would slow the estimation down and (I think) it's one thing that we know fairly well and will know even better over time. Assuming 100% efficacy seems like a good starting point.

andrew-edwards commented 3 years ago

Great. Could even just define V(t) as the 'number per day for which the vaccine prevents infection on day t', so kind of absorbing the efficacy into the definition.

So if 100 people get vaccinated on day t-14, and we assume 90% efficacy and an exact two-week period for the vaccine to become effected, we'd have V(t) = 90 per day (and it would change each t). Data should be pretty good. If warranted, could add uncertainty to the exact 14-day value by using a distribution instead of 14 days, and again this could be done outside the model to avoid estimating extra parameters. Kind of another delay-distribution idea. Probably fine for now to just say 14 (or whatever) days.

carolinecolijn commented 3 years ago

Hi all -- sorry just jumping in now. I don't think we want S + Sd in a denominator. I think @andrew-edwards first set of equations is much more standard. The reason to have some time-varying rate of vaccination that we need to adjust is that the impact on transmission changes a lot with who is vaccinated. Right now we have not dented transmission at all, but when we start vaccinating high-contact folks we will likely change it a lot more.

I agree with @seananderson we should not try to estimate the efficacy. I agree with @sempwn suggestion that lower efficacy can be modelled with a reduced rate (eg an all or nothing vaccine leaves some portion susceptible, but completely protects others)

back to earlier points - I am actually not sure we need R to change with time. I think B117 is the main high-transmission strain we need to worry about. We should not have discrete break points like with f, and as @seananderson says we would lose identifiability if we did that. it would be interesting to estimate beta.b117/beta.normal (where these are the obvious two betas) , perhaps from the UK data (for which we already have the data set up and everything, from the leeway paper).. they also have regional data. That could then be an input into our local model, or we could estimate that parameter here.

@sempwn do we know whether there will be VOC numbers daily, to fit to, or whether we will continue with say 10% on this day, 1% on whatever day 4 weeks earlier ? I think this is better than a 0,0, 1, 1, importation vector because we will never know the importations, we don't have importations in the model now, for better or worse, and because even if we knew the importations they don't all take off and contribute to community transmission. Starting at a specified % at a specified time - like 1% on Feb 5, and maybe also 10% for the past weekend - will better capture the dynamics.

carolinecolijn commented 3 years ago

also can we add Elisha to this conversation? ElishaBayode is his github name

ElishaBayode commented 3 years ago

Re: vaccination - I think we should separate 'vaccine-induced immunity' and 'immunity due to previous infection' from the outset. i.e. dS/dt= ... -v(t)S, dV/dt= ... +v(t)S instead of combining both of those groups into one class (R). The dynamics of the two classes (V and R) are different. We can expect 'vaccine-induced immunity' to be more stable than 'immunity due to previous' infection, and could be less prone to immune escape. So we may want to have dR/dt= ... -rR, dS/dt= ... +rR, (where r is small), and just assume that vaccine provides long-lasting immunity (for >5 years). If this kind of modifications will not make things too complicated to incorporate into covidseir, I think we should add them now, so we don't have to change things again in the next few months. Separating both forms of immunity is even more important in a multi-strain model where immune escape is a high possibility.

ElishaBayode commented 3 years ago

Another change we might consider adding is a separate class for asymptomatic individuals: If E2 is pre-symptomatic individuals, then we can have dE2/dt =... -cE2, dI1/dt=...+ c*(1-b)E2, dI2/dt=...+ c*bE2 where I1 and I2 are for asymptomatic and symptomatic individuals, respectively, and b is the fraction that develops symptoms. So we have E2, I1 and I2 all contributing to the force of infection. I have the equations written down for all of these, I can share it if you want.

carolinecolijn commented 3 years ago

I realize that I am the one who suggested the asymptomatic class to Elisha but I think for continuity it may be better not to have this change now along with adding VOC and vaccination. The covidseir fits we have been doing have a duration of the I class (typically a mean of 4 days) that is intended to capture most people being symptomatic, noticing early and isolating, but some not having symptoms and not noticing. If we put in asymptomatics it will probably break the post2prior stuff or we'd have to translate the parameters from the original covidseir model to the new model to accommodate this difference in assumption. @ElishaBayode 's point about vaccine-induced immunity and natural infection-induced immunity being different, and about waning immunity, is well taken. That could make a demonstrable, large, difference within a few months if we have B.135 and/or P1 to which our vaccines have reduced efficacy.

@sempwn what do you think, in terms of what you are being asked to do with this model? Typically I wouldn't assume that vaccine-induced immunity is more stable than that induced by natural infection but I think @ElishaBayode is correct that there is some data to suggest that this is true for covid.

sempwn commented 3 years ago

Thanks everyone for this discussion and incorporating these ideas. It sounds like the plan on the two fronts will be:

Vaccination : Create as an "all-or-nothing" model which has a fixed data variable (so not a parameter) that incorporates the impact of vaccination on transmission. The vaccinated group should be a separate class, so model equations will look something like dSd/dt = ... - v(t) fsi and dVd/dt = v(t) fsi etc. The v(t) function itself is an adjusted vaccination rate that accounts for age-dependent vaccination rate, their relative contacts, and relative susceptibility.

VoC: Create a piece-wise linear function that approximates the rise and dominance of VoC among the circulating strains. We'll assume for now we are only considering B1.1.7 and its impact on transmission. We have a pretty consistent trend almost everywhere where it takes about 6-8 weeks for B1.1.7 to dominate, and a consistent increase in transmission of around 50%. We could then have an external data variable that indicates the time at establishment of B1.1.7 and fix the time to dominance and increase in transmission so as to avoid identifiability issues with the f-segment estimates.

I'll create a new branch this week to start working on it incorporating the code I've already created and prioritize the VoC component as the first step.