I'm not sure that we really need a dynamic model. When there's good individual-level data, it's almost always better to rely on truncation+censoring than growth-rate-based adjustments (estimating GI from SI is one exception, which I think might need separate care). We would want to rely on dynamical adjustments when such information is not available (e.g., doing a meta-analysis on a bunch of previous estimates, or like what we did for Omicron), in which case time-varying growth rate information probably isn't so useful (we don't have good individual-level data, so we don't know how to cohort them, so how can we use time-varying r?)
There is some need to work out a non-uniform prior version for censored intervals, but using something like epinowcast isn't going to fix it because we're dealing with changes within a day, meaning we might need some sort of additional interpolation from discrete time incidence to continuous time incidence (seems out of scope for v1).
I do agree that we need some sort of dynamical adjustment method for the paper. It's on my todo list for the weekend... I will circle back very very soon...
better to rely on truncation+censoring
This is the reason we need it, right? To show this clearly.
Agree that generation time estimation is its own thing.
Also agree it doesn't help the within-a-day issues and ideally we would have a continuous time version. I do think this should ideally be a forward (i.e generative) model vs a post-processing one.
This is the reason we need it, right? To show this clearly.
I agree. But don't think we need something too complicated to show this. Let's see what I come up with and decide...
I set up the growth rate version using brms here:
And tested it here: https://github.com/parksw3/dynamicaltruncation/blob/main/scripts/test_dynamical_brms.R
Seems to give sensible results but is pretty slow (probably because of the integration step). Post-processing would be much much faster in this case... Going to see if I can work out a discrete-time version with time-varying growth rate tomorrow (might be faster or slower)... discrete sums should at least make things faster than integration?
Also a bit confused by this
I was wondering if we should just use epinowcast as in the baseline case it should be the L2 from Shaun's paper (see https://github.com/parksw3/dynamicaltruncation/issues/19).
Because L2 is not what I think of when I'm thinking of dynamical correction. Instead, it's doing equation 2 backwards in https://www.pnas.org/doi/full/10.1073/pnas.2011548118. Equation 3 is the special case, which assumes a constant growth rate.
I would say L2 is just another method... I think what Jonathan and I compared here is probably some version of L2 vs dynamical correction (see section 7.4)? https://royalsocietypublishing.org/doi/pdf/10.1098/rsif.2019.0719
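To be concrete about what I mean: in the constant-growth-rate special case (equation 3), the backward distribution is just the forward distribution tilted by $e^{-r\tau}$ and renormalised. A rough numeric sketch on a discrete daily grid (the lognormal parameters here are only illustrative):

```r
tau <- 0:60                                       # daily delay grid
forward <- dlnorm(tau, meanlog = 1.8, sdlog = 0.5)
forward <- forward / sum(forward)                 # discretised forward pmf

backward <- function(r) {
  w <- exp(-r * tau) * forward                    # tilt by exp(-r * tau)
  w / sum(w)                                      # renormalise
}

sum(tau * forward)          # forward mean
sum(tau * backward(0.2))    # backward mean is shorter when r > 0
sum(tau * backward(-0.2))   # and longer when r < 0
```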
So I think that L2 is a dynamic correction in the sense that it uses transmission dynamics to correct for right truncation but agree that it isn't quite what you have been talking about when you discuss dynamic correction (i.e as essentially a "correction" vs a joint model).
That being said I think it is functionally equivalent and appeals to me from a Bayesian/generative model view of the world.
In terms of the epinowcast model being complex? It really isn't if restricted to only the simple Poisson fixed-lognormal case (as it then is L2).
think that L2 is a dynamic correction in the sense that it uses transmission dynamics to correct for right truncation
I agree. But I feel that L2 is less explicit about the dynamic than what I do.
as essentially a "correction" vs a joint model
I disagree because it is possible to write down a joint model for the backward corrections. Essentially, what I do depends on the convolution between incidence and the delay distribution, where the incidence can be modeled with a generative model. So far, I've been doing exponential "corrections" for simplicity, but when the growth rate is changing, simple corrections are no longer possible. Still working on this though.
functionally equivalent
I'm not so sure about this... I guess they should turn out to be equivalent but the two methods emphasize different bits. I think L2 is just another way of modeling truncation, whereas the dynamical approach I talk about is trying to explicitly link forward vs backward distributions. The backward distribution is similar to the truncated distribution (equivalent when r > 0) but also different because backward distributions can have longer means than the forward distribution.
Anyway, not at all disagreeing with using the joint model or implementing L2, but just trying to clarify the differences.
Will try to finish my backward stuff today...
I agree. But I feel that L2 is less explicit about the dynamic than what I do.
😆 I think in my head it is the reverse but perhaps that is because I am coming at this from nowcasting.
To demonstrate the use of the epinowcast model (I think we should expect to see a slight bias due to the use of a discrete model, and potentially because long delays are internally truncated in epinowcast, which here is not ideal), see the script below. The advantage of this is of course that it can handle time-varying growth rates out of the box (though of course how well needs to be determined).
https://github.com/parksw3/dynamicaltruncation/blob/main/scripts/test_dynamical_epinowcast.R
parameter mean median q2.5 q5 q20 q35 q65 q80 q95 q97.5
<fctr> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1: meanlog 1.850 1.850 1.740 1.760 1.800 1.820 1.870 1.90 1.960 1.990
2: sdlog 0.451 0.448 0.389 0.397 0.419 0.434 0.461 0.48 0.518 0.534
3: mean 7.080 6.990 6.210 6.340 6.610 6.800 7.190 7.45 8.050 8.380
4: sd 3.390 3.290 2.560 2.650 2.920 3.110 3.490 3.77 4.400 4.750
Simulated using r = 0.2
and
r$> secondary_dist
meanlog sdlog mean sd
<num> <num> <num> <num>
1: 1.8 0.5 6.855149 3.653385
I disagree because it is possible to write down a joint model for the backward corrections
Agree it's possible but it does feel like a slightly more convoluted joint likelihood (as you need a growth rate likelihood based on counts and another for the correction). I am not saying the direct approach is preferable etc - just perhaps more "generative" (i.e as a forwards process).
You are right it may not be entirely equivalent (though reading Shaun's paper and earlier survival literature makes me think it is). L2 is definitely just another way of modelling truncation but doing so with a dynamic process. I agree that there is a more general point about forward and backward distributions but does that have bearing on this particular case?
Will try to finish my backward stuff today..
Really excited to see this.
I don't have strong feelings on what exactly we include here but definitely just need something that has a "dynamic" (i.e linked to growth rate) based approach vs straight up truncation - regardless of what.
as you need a growth rate likelihood based on counts and another for the correction
Ah, no we don't. Maybe it's because I keep talking about the dynamical correction without writing anything down... coming soon...!!!!
So this would be inferring the growth rate based on the amount of truncation we see? That makes sense.
No, just a convolution between the forward distribution and incidence. No need for the growth rate at all.
So we just need to feed in incidence and do a convolution, which I'm having trouble with right now (trying to be efficient without having to calculate the denominator over and over)...
aha okay!
Took me longer than I thought but it seems to be working. Basically, it models backward distributions as a convolution of the forward distribution and primary incidence, meaning that we need some sort of time series for primary incidence, which may require an independent data set or could come from the delay data directly. Using independent data could be better in some sense because the delay data might not accurately represent case patterns (e.g., changes in contact tracing efforts).
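Concretely, for secondary events on day $s$ the backward probability of a delay $\tau$ is proportional to $P(s - \tau) f(\tau)$, i.e an incidence-weighted forward pmf. A rough sketch of the idea (the incidence series and lognormal parameters are made up for illustration):

```r
set.seed(1)
primary <- rpois(80, lambda = exp(0.1 * (1:80)))   # made-up primary incidence
tau <- 0:30
forward <- dlnorm(tau + 0.5, meanlog = 1.8, sdlog = 0.5)
forward <- forward / sum(forward)                  # discretised forward pmf

backward_pmf <- function(s) {
  w <- primary[s - tau] * forward                  # P(s - tau) * f(tau)
  w / sum(w)
}

sum(tau * backward_pmf(40))   # backward mean for secondary events on day 40
sum(tau * backward_pmf(80))   # later cohorts see different incidence weights
```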
This is the fit to the outbreak data, which is pretty cool. All of the patterns can be captured by convolution between incidence and a fixed forward distribution. Nothing else (e.g., splines) needed. See test_dynamical_full.R
Speaking of requiring incidence, combining some of this theory with nowcasting methods could turn out to be a powerful approach. The backward theory is essentially a general case of James Hay's viral load work so... if it's possible to use viral load to inform incidence, it should be possible to use changes in backward delay distributions to inform changes in incidence as well (and changes in incidence also informs changes in delays)...
This is cool. Am I right in thinking this is really a reformulation of L3 i.e the delay likelihood conditional on primary event times? Rather than the joint approach discussed in the Seaman et al. paper (i.e L1 and L2) that they describe as dynamic? What I am getting at here is: is there a bit of a difference in terminology going on?
Is this essentially an individual-level version of the count-level convolution model (i.e something like $S_t \sim \mathcal{Poisson}\left(\int_0^t g(\tau) P_{t - \tau} \, d\tau \right)$)? I assume this isn't quite the same as it has no precise linkage between primary and secondary events.
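For reference, the discrete count-level convolution I have in mind is roughly the following (simulation only; the incidence and delay parameters are made up):

```r
set.seed(1)
primary <- rpois(60, lambda = exp(0.1 * (1:60)))   # made-up primary counts
pmf <- dlnorm(0:20 + 0.5, meanlog = 1.8, sdlog = 0.5)
pmf <- pmf / sum(pmf)                              # discretised delay pmf g

# expected secondary counts: primary counts convolved with the delay pmf
expected_secondary <- sapply(seq_along(primary), function(t) {
  tau <- 0:min(20, t - 1)
  sum(pmf[tau + 1] * primary[t - tau])
})
secondary <- rpois(length(expected_secondary), expected_secondary)
```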
Of course, the non-aggregated version has the added advantage that you can use a different incidence count dataset to the one used for estimating the delay (though if you do that it's going to be hard to know if it is helping with bias or introducing others).
I looked at this with an earlier sample (truncated at 30 days). Is the small underestimation likely coming from issues with censoring?
2 lambda 4.73 4.73 0.0357 0.0358 4.67 4.79 1.00 1602. 2314.
Speaking of requiring incidence, combining some of this theory with nowcasting methods could turn out to be a powerful approach. The backward theory is essentially a general case of James Hay's viral load work so... if it's possible to use viral load to inform incidence, it should be possible to use changes in backward delay distributions to inform changes in incidence as well (and changes in incidence also informs changes in delays)...
Yes this is interesting. As above I am not entirely clear how this would end up differing from traditional nowcasting approaches where the reconstructed delay distribution is already serving this purpose. Keen to be convinced/explained to like I am five though!
(https://www.pnas.org/doi/full/10.1073/pnas.2011548118 going back for my third read..)
Am I right in thinking this is really a reformulation of L3 i.e the delay likelihood conditional on primary event times?
L3 doesn't seem to depend on the dynamic at all to me. It looks like L3 is just a regular truncation likelihood? I don't see L1 or L2 being much different to be honest. The difference is that L1 and L2 treat primary event times as unknown, whereas we mostly know them (besides censoring). In this case, L1 = L3, and L2 is just a different way of writing the truncation problem.
I assume this isn't quite the same as it has no precise linkage between primary and secondary events.
There is (though it's not renewal). Let $P(t)$ and $S(t)$ represent primary and secondary incidence, let $f_t(\tau)$ be the forward distribution for primary cohort time $t$, and let $b_s(\tau)$ be the backward distribution for secondary cohort time $s$. Then we have (see around equation 1 in my paper...)

$$ P(t) f_t(\tau) = S(t+\tau) b_{t+\tau}(\tau) $$
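Rearranging and normalising over the possible primary times (i.e using that $b_s$ integrates to one) gives the backward distribution explicitly:

$$ b_s(\tau) = \frac{P(s - \tau)\, f_{s-\tau}(\tau)}{\int_0^{\infty} P(s - x)\, f_{s-x}(x)\, dx} $$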
I looked at this with an earlier sample (truncated at 30 days). Is the small underestimation likely coming from issues with censoring?
I don't think so. I changed the simulation code to use Poisson so no censoring. In describing the backward equation, we need $P(t)$. But if you truncate based on having to observe both primary and secondary events, $P(t)$ is getting truncated, so the earlier sample code is probably feeding in the wrong $P(t)$.
As above I am not entirely clear how this would end up differing from traditional nowcasting approaches where the reconstructed delay distribution is already serving this purpose.
I'm not yet sure either. I have to spend more time understanding nowcasting approaches for now... I couldn't figure out how to get the stancode for epinowcast...
L3 doesn't seem to depend on the dynamic at all to me. It looks like L3 is just a regular truncation likelihood?
Well that is the point isn't it - it is conditioned out (i.e I agree with your point about what sets L1 and L2 apart from L3 and that L2 is just the aggregate form of L1).
There is (though it's not renewal).
I agree your setup makes sense but I am slightly struggling with why this is not the case in the absence of truncation?
In your case, it is also conditioned on the primary event happening but via the cohort perspective. Isn't this truncation adjustment via another angle (i.e mathematically equivalent)? Or is it not?
I don't think so. I changed the simulation code to use Poisson so no censoring.
Ah okay - makes sense. Do you have thoughts to generalise this? Or to account for observation error in the primary event?
I'm not yet sure either. I have to spend more time understanding nowcasting approaches for now... I couldn't figure out how to get the stancode for epinowcast...
Good luck it is intense. You might find the model definition a bit more useful: https://package.epinowcast.org/articles/model.html
or it's essentially a version of L2.
https://github.com/epinowcast/epinowcast/blob/main/inst/stan/epinowcast.stan
slightly struggling with why this is not the case in the absence of truncation?
I am confused about what you mean by this. What do you mean by "this"? What is not the case?
Do you have thoughts to generalise this? Or to account for observation error in the primary event?
Generalizing to other distributions is not difficult. But continuous distributions require continuous incidence, so that may be difficult. I guess error in the primary event incidence can be dealt with more easily. Large interval censoring can be dealt with but is very difficult (I know what to do theoretically and conceptually but am not sure how to implement it yet).
Just circling back to this I think the version that accounts for observation error in the primary incidence is effectively the "traditional" joint nowcasting model (i.e a latent primary event model and observations for both primary and secondary events).
Pinning this as a to do as we need to discuss what parts of this we want to include and where/how
I think we have currently landed on discussing this but not explicitly including it? (similarly for the other dynamic adjustments that assume primary incidence is known?)
It does seem like a shame to miss out on your cool work convolving primary incidence though (https://github.com/parksw3/dynamicaltruncation/issues/24#issuecomment-1306450576) so perhaps we can add a section once the current paper is drafted?
Yes, I think we can come back to this later after we write about other parts. I think it's at least worth talking about some of this backward vs forward bias because people are not careful about where their delays are coming from.
So I figured out how to do backward correction with continuous delay + discrete incidence (also possible to include censoring) but running it with brms seems impossible because we need to pass two different data structures... maybe keep this one with stan and leave it as a proof of concept?
I think I could hack it even further to get it to work with brms by using make_stancode and then sampling the model separately but not sure if that's all worth it?
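For the record, the likelihood I have in mind is roughly the following (a plain R sketch of the proof of concept rather than the stan version, ignoring the censoring subtleties discussed above; the function and argument names are just illustrative):

```r
# Backward log-likelihood with continuous delays and discrete daily primary
# incidence: for a delay tau observed at secondary day s, the numerator is
# P(s - floor(tau)) * f(tau) and the denominator sums P(s - d) times the
# forward probability of the delay falling in day d.
backward_loglik <- function(meanlog, sdlog, delay, stime, primary) {
  sum(vapply(seq_along(delay), function(i) {
    days <- 0:(stime[i] - 1)                 # possible whole-day delays
    day_prob <- plnorm(days + 1, meanlog, sdlog) - plnorm(days, meanlog, sdlog)
    denom <- sum(primary[stime[i] - days] * day_prob)
    log(primary[stime[i] - floor(delay[i])]) +
      dlnorm(delay[i], meanlog, sdlog, log = TRUE) - log(denom)
  }, numeric(1)))
}
```

Computing `denom` separately for every observation is the slow bit; grouping observations that share a secondary day lets you reuse it, which is roughly what I was getting at about not calculating the denominator over and over.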
So this is the figure that I'm sticking with for now. This is allowing continuous-time delays and censoring in delays. But I think there's room for improvement (in terms of both visuals and simulations). In particular, it'd be interesting to test this at different time points and also depending on whether we have truncated (real time) primary case data vs untruncated (retrospective) case data. In theory, we need untruncated time series but it might work reasonably with truncated time series? not sure. Might try to do this later but need to take a break...
It starts to sound a lot like a nowcasting problem. A longer term goal could be to try to see how we can use changes in backward delays to improve nowcasting estimates. Maybe it already does it but with a different framing... or it doesn't. I have to spend more time reading about nowcasting. But this is just like James Hay's viral load problem...
And here's the new figure...
So I figured out how to do backward correction with continuous delay + discrete incidence (also possible to include censoring)
Amazing!!!
seems impossible because we need to pass two different data structures
There are a few ways around this but yes I can see in general we may just be happy keeping it as a prototype. I might take a look and see if I can think of anything just to have some fun.
So this is the figure that I'm sticking with for now. This is allowing continuous-time delays and censoring in delays. But I think there's room for improvement (in terms of both visuals and simulations). In particular, it'd be interesting to test this at different time points and also depending on whether we have truncated (real time) primary case data vs untruncated (retrospective) case data. In theory, we need untruncated time series but it might work reasonably with truncated time series? not sure. Might try to do this later but need to take a break...
Haha so I am hearing we should include it in the main analysis 😆 (this tbh wouldn't be the worst idea in the world as currently it's a bit hard to see how it does bias-wise etc vs the other methods we have).
It starts to sound a lot like a nowcasting problem. A longer term goal could be to try to see how we can use changes in backward delays to improve nowcasting estimates. Maybe it already does it but with a different framing... or it doesn't. I have to spend more time reading about nowcasting. But this is just like James Hay's viral load problem...
So I need to dig into what you have done but from my point of view (as I like to reiterate in an uninformed way as often as possible) I would expect it to be very tightly coupled with the nowcast models we have been using which are also very similar to JH viral load models. Pretty exciting stuff!
Another way to model this without explicit truncation would be to use a Poisson likelihood and try and infer the number of secondary events reported on a given delay for each primary event date (i.e primary_cases * probability of report). I think you looked at that previously and that would be exactly equivalent to our current nowcasting approaches except that in our case we assume the primary incidence is unknown and hence also have a model for this. (we also just use a negative binomial but that is not mathematically correct/justified).
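i.e something along these lines (purely a sketch with made-up numbers, and with primary incidence taken as known rather than modelled as it would be in the nowcasting setup):

```r
set.seed(1)
obs_time <- 60
primary <- rpois(obs_time, lambda = exp(0.1 * (1:obs_time)))  # made-up counts
pmf <- dlnorm(0:20 + 0.5, meanlog = 1.8, sdlog = 0.5)
pmf <- pmf / sum(pmf)                                         # delay pmf

# reporting "triangle": cells indexed by primary day t and delay d, keeping
# only the cells observable by obs_time (this is where truncation enters)
cells <- expand.grid(t = 1:obs_time, d = 0:20)
cells <- cells[cells$t + cells$d <= obs_time, ]
cells$mean <- primary[cells$t] * pmf[cells$d + 1]
cells$count <- rpois(nrow(cells), cells$mean)
# fitting is then just a Poisson likelihood over the observed cells, exactly
# as in the nowcasting models (where primary would itself be latent/modelled)
```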
The backwards brms version looks like it is close to being there. It might be easier to make the model less efficient by doing the denominator adjustment for every observation. This would mean you could easily model across time-series etc and get away with just a custom likelihood (it could be much slower though).
I've had a play with this and it looks good to me. I've mostly got the brms version working but I think there is an indexing issue I have just run out of time to sort out.
It starts to sound a lot like a nowcasting problem. A longer term goal could be to try to see how we can use changes in backward delays to improve nowcasting estimates. Maybe it already does it but with a different framing... or it doesn't. I have to spend more time reading about nowcasting. But this is just like James Hay's viral load problem...
After looking at this I agree we should think more about this.
If we can get the brms version to work perhaps we should put this in the main analysis? I am happy to do the grunt work to get that to happen if we think it's a good idea.
Sorry, just woke up. Will get to these (and review your code) soon and try to finish writing up today and tomorrow.
I would expect it to be very tightly coupled with the nowcast models we have been using which are also very similar to JH viral load models. Pretty exciting stuff!
I agree. I would like to sit down and go reading about nowcasting more when we're "done" with this.
Another way to model this without explicit truncation would be to use a Poisson likelihood and try and infer the number of secondary events reported on a given delay for each primary event date
Ah yes, I think the L1, L2, L3 paper talks about this approach as well. But they also say that this is equivalent to L1. So maybe better to keep the paper simple and focus on fitting distributions for now.
It might be easier to make the model less efficient by doing the denominator adjustment for every observation.
I think this was my concern---it's going to be so much slower if we allow this.
I've had a play with this and it looks good to me. I've mostly got the brms version working but I think there is an indexing issue I have just run out of time to sort out.
Will take a look soon and try to fix it. But tempted to keep the non-brms version for speed. I think I tried calculating the denominator for every observation for the exponential case and it turned out to be super slow.
If we can get the brms version to work perhaps we should put this in the main analysis
The comparison is a little bit unfair because the backward model uses more information. If there's good individual-level data, I would recommend fitting forward distributions rather than backward distributions in most cases. Also, we need untruncated time series of primary incidence for the backward fitting, which makes the comparison even less fair?
(messaged on epinowcast too but leaving it here as well just in case). Looks like you fixed it while I was trying to understand the code. And it's running well on my end.
I guess I didn't know that you had to write data stanvar separately (instead of what I did before). I guess this means we can delete scode_data argument from the function?
Also thanks for fixing the indexing issue. I was running into the same issue when I was using pad_zero so I changed it to drop_zero but it looks like lines 362-363 are doing some magic... I will work on writing the paper for the rest of the day.
(messaged on epinowcast too but leaving it here as well just in case). Looks like you fixed it while I was trying to understand the code. And it's running well on my end.
Ah yes sorry - I was keen to run it!
I guess I didn't know that you had to write data stanvar separately (instead of what I did before). I guess this means we can delete scode_data argument from the function?
So I had forgotten this and it's totally not what I understood from the brms docs. I looked back at an old project where I did this and had to do the same thing there. We could pass in the data block bit I guess and perhaps we should, to keep consistent with what we have done everywhere else.
I will work on writing the paper for the rest of the day
❤️ ❤️ ❤️
Ah yes, I think the L1, L2, L3 paper talks about this approach as well. But they also say that this is equivalent to L1. So maybe better to keep the paper simple and focus on fitting distributions for now.
Yes agree. This model is the closest to what we do in a "classic" nowcasting model currently I think.
The comparison is a little bit unfair because the backward model uses more information. If there's good individual-level data, I would recommend fitting forward distributions rather than backward distributions in most cases. Also, we need untruncated time series of primary incidence for the backward fitting, which makes the comparison even less fair?
So I agree we don't want to recommend it in most cases but isn't that precisely why we should include it? The extended section can then be about how much this method improves if you have a complete understanding of primary cases (as it does now) and make the point this would be equivalent to knowing the growth rate was going to be constant etc? If we are going to run it for multiple distributions and time points in the outbreak model it seems like it's only very little more work to have it in the main analysis? If we wanted that though we would need to rewrite the function so that data_cases could be estimated using the input data if not otherwise supplied (so as to work in our current code without changes).
We could pass in the data block bit I guess and perhaps we should, to keep consistent with what we have done everywhere else.
Sounds like a good idea. What I did wasn't working anyway.
Maybe let's leave it like this for now? We can discuss it on Wednesday and we can try to finish the paper as much as possible meanwhile.
I will work on writing the paper for the rest of the day
after I come back from dinner though... haha...
Sounds like a good idea. What I did wasn't working anyway.
I just pushed a version that does this.
Maybe let's leave it like this for now? We can discuss it on Wednesday and we can try to finish the paper as much as possible meanwhile.
Yes I agree. Let's get a draft as is and we can change it as we feel like after that.
Just flagging but we need to make sure it's clear that retrospective in this case is different to retrospective in the Ebola case study (in that it only knows about retrospective incidence and not the retrospectively observed secondary events).
Also noting that at some point we should really add this code to the end of the current targets file. I am happy to do that when I turn all the figure code into a standalone targets pipeline of its own (to clean up the current code and make everything nice and reproducible).
Sorry, I got carried away with another meeting and trying to re-make figure 4 and got barely any writing done. But I have nothing else on my schedule for Tuesday so will work on writing this paper (and no other projects).
No worries! I have some stuff on in the morning and then will work on this.
For the figure here we need to stratify by the standard deviation (that is making the plot look strange at the moment).
Because we are treating this as its own analysis we are not capturing whether this method outperforms a non-truncation-adjusted censoring model in real time (i.e is it better than nothing). That feels like quite a large missing link. We could include that here but again I think that motivates adding this to the main text for real-time analysis and then doing the retrospective analysis here to focus on whether the incidence/growth rate is known.
I had a go at integrating it into the main analysis in #45. Unfortunately there are a few (6) fits that fail due to issues initialising the stan model. These are all settings where the growth rate is less than 0 and generally where delays are long. I think this means there is numeric instability due to the padding of cases to be greater than zero, but increasing this didn't improve performance.
We could include that here but again I think that motivates adding this to the main text for real-time analysis and then doing the retrospective analysis here to focus on whether the incidence/growth rate is known.
Good point.
These are all settings where the growth rate is less than 0 and generally where delays are long.
Will take a look at these after I spend time writing.
Will take a look at these after I spend time writing.
I don't think we need it in now (so totally go ahead and focus on write-up) but I think we will end up with it once others have read it etc.
I'm so confused... I spent a lot of time trying to figure out what was wrong with the brms implementation, but it turned out there was a typo in the stan version (where I wrote cdf instead of lcdf). But it looks like this wrong version is performing better...?
errr well that is confusing....
This is better performance on the dynamical comparison right? Perhaps this is just a coincidence and the issue was introduced in the script that does that instead of in the model itself?
I went back to see previous commits (https://github.com/parksw3/epidist-paper/blob/5931f277562774402fc93e85133c3c1ffed02ca2/data/models/lognormal_dynamical_full.stan) and it looks like there wasn't an error in the model when I first made the backward figure. I must have introduced the error while debugging?
So now I'm even more confused about why the brms model is not giving a good result because it looks identical to the stan model... maybe it's something with the simulation setup. I'll do more digging when I get home.
Also after spending more time staring at the code and doing thought experiments, I found out that censoring in the backward model is missing a term so I'll also implement that later today. But this change will probably only make minor differences.
I seem to have fixed it, but I can't run the full simulations because they take forever to run... I had to give up running the full simulation so I can keep testing the code and doing other things... but the two main issues were:
1) padding zero seems to introduce bias. Dropping zero is much better.
2) This needed to be real instead of int.
array[tlength] int log_cases;
But maybe I broke it again... it's giving me a ton of divergent chains now... It was working before...ugh
Turns out I introduced another error while fixing it, which was making everything slow and giving terrible answers. It's fixed now and seemingly giving good results. Time to try running all again.
Looks much better!
Also after spending more time staring at the code and doing thought experiments, I found out that censoring in the backward model is missing a term so I'll also implement that later today. But this change will probably only make minor differences.
I haven't tried this yet - it might help get the correct sd, or it might not?
🥳 🥳 💌
Amazing stuff!
Odd that padding by zero introduces bias but I guess that makes sense.
array[tlength] int log_cases;
Ah that would be on me - sorry!
This is looking much better now. I can rerun the whole pipeline thing (you are right, it is a bit of a monster) but will wait for the potential additional term.
On another note, do you think we should be showing relative scoring here against another method so we can discuss how this stacks up to other approaches when different amounts of information are known? We could for example use the latent model on the same samples, score, and then show relative scores to the latent model (which will start higher than 1 and then hopefully get near to being comparable)?
will wait for the potential additional term.
Yup, on my todo list for today.
We could for example use the latent model on the same samples, score, and then show relative scores to the latent model
I like this idea. The only caveat would be that it could take forever to run the latent model on the full data set?
Ah turns out that we don't need the additional term because we're dealing with discrete incidence with daily censoring. Spent too much time thinking we needed to add something but I even wrote about not needing it in the paper... I think we're good to go
I think we really need a joint forward dynamic model... I was wondering if we should just use epinowcast as in the baseline case it should be the L2 from Shaun's paper (see #19). Ideally, we might want the growth rate version but that is still in dev (it's randomly slow at the moment). I am a bit loath to suggest it as it's a whole big thing to explain, it's a messy package vs the nice slick things we have here, and ideally we'd nest in brms. It would also make exploring additional censoring really quite hard - at least for the primary event. If others think it is the best option then we need a slightly different workflow as it doesn't have the same model structure as the other model implementations.