Another example of the use of L2 is given by Cox and Medley [11], who estimate the distribution of the time T taken for an AIDS diagnosis to be reported to the Communicable Disease Surveillance Centre. They allow the rate of AIDS diagnoses to be increasing sub-exponentially, by using g(t; θ) = θ0 exp(θ1 t + θ2 t^2), and test the null hypothesis that θ2 = 0. They consider several parametric models for the distribution of the reporting delay T.
This is a really nice early nowcasting paper that implements a model very similar to that in epinowcast
L3 does not require a model g(x; θ) to be specified for the distribution of the initial event times. This eliminates the risk that such a model may be misspecified. However, it has the disadvantage that some of the information in the data is being discarded, which makes L3 less efficient than L1, especially when n is small.
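For reference, a minimal Stan sketch of the kind of conditional likelihood L3 describes, under the simplest assumptions (exactly observed primary event times, a Weibull delay, a single known truncation time D, illustrative priors; all names here are mine, not the paper's):

```stan
data {
  int<lower=1> N;
  vector[N] t_primary;         // exactly observed primary event times (all assumed < D)
  vector<lower=0>[N] delay;    // observed delays (secondary minus primary event time)
  real D;                      // truncation time: only events with
                               // t_primary + delay <= D appear in the data
}
parameters {
  real<lower=0> shape;
  real<lower=0> scale;
}
model {
  // illustrative weakly-informative priors
  shape ~ normal(1, 1);
  scale ~ normal(5, 5);
  // L3-style conditional likelihood: the delay density is normalised by the
  // probability of the delay being short enough to have been observed by D,
  // so no model for the primary event process is needed
  target += weibull_lpdf(delay | shape, scale);
  for (i in 1:N)
    target += -weibull_lcdf(D - t_primary[i] | shape, scale);
}
```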
We could also add the epinowcast model to this comparison. I would rather not, but it should be the L2 model.

flexsurv: it isn't clear to me how this handles censoring when combined with truncation, and I think perhaps it is being ignored.

In addition to being right-truncated, T may be censored. This is easily handled in parametric models by replacing f_i(t_i) in L1 and L3 by F_i(t_i^U) - F_i(t_i^L), where [t_i^L, t_i^U] is the interval within which individual i's delay is known to lie.
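To make that concrete, here is a minimal Stan sketch of that replacement, i.e. swapping the delay density for a difference of CDFs (the Weibull and the priors are just illustrative choices of mine; no truncation adjustment is included):

```stan
data {
  int<lower=1> N;
  vector<lower=0>[N] t_L;   // lower bound of each individual's delay
  vector<lower=0>[N] t_U;   // upper bound of each individual's delay (assumes t_U > t_L)
}
parameters {
  real<lower=0> shape;
  real<lower=0> scale;
}
model {
  shape ~ normal(1, 1);     // illustrative priors
  scale ~ normal(5, 5);
  for (i in 1:N) {
    // interval censoring: replace the density f(t_i) with F(t_U) - F(t_L)
    target += log_diff_exp(weibull_lcdf(t_U[i] | shape, scale),
                           weibull_lcdf(t_L[i] | shape, scale));
  }
}
```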
Censoring comment in the discussion. Doesn't address what to do about truncation.
As a specific example there is also "Estimates of the severity of coronavirus disease 2019: a model-based analysis", which uses growth rate adjustment of naive delays (similar to what is used in "Incubation Period and Other Epidemiological Characteristics of 2019 Novel Coronavirus Infections with Right Truncation: A Statistical Analysis of Publicly Available Case Data").
Perhaps it's even worth adding the growth rate adjustment described in Section 2.1 of the supplement (with code) to the scenarios investigated?
@parksw3 originally had dynamic adjustment much more front and centre in this work. We have pushed it back a bit as the simulations have got more complex, because (1) it's a bit hard to implement when growth rates are varying, and (2) if they are varying, how do you estimate them without a joint model (i.e. the post-processing approach is perhaps not ideal)? The easiest thing to do is treat them as known, but then does that really help people who want to implement these methods?
I think the plan is definitely to keep it in (at least in some form). I am currently investigating a simplified form of the forward correction (i.e. having the growth rate jointly estimated in the model), which should be a bit easier to compare to other approaches (maybe).
- Explores the impact of censoring with windows wider than 1 day per event but doesn't make an adjustment. If I remember correctly, I assumed exponential growth across the interval-censored period, added the exponential to the likelihood, and integrated over this. Similar to other approaches, but using "L3" rather than "L2" in Shaun's definitions, which I think is the one I've seen more commonly applied. However, I would now prefer a latent variable approach, as this is not vectorisable and had terrible computational scaling (which is also why I prefer L3: L2 isn't vectorisable when accounting for interval censoring and right truncation, even when using the latent variable approach). This approach still requires knowing the growth rate in advance, and for long intervals the results are very sensitive to the growth rate; for the Wuhan data, moving the growth rate over a small range could push the modelled mean from 4 to 7 days. A rough sketch of what I mean is below.
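A sketch rather than the original code, assuming a Weibull delay, a single known growth rate r, exactly observed secondary events, and illustrative priors; the exponential growth enters as a weight on where the primary event falls within its censoring window:

```stan
data {
  int<lower=1> N;
  vector[N] e_lower;               // lower bound of each primary event window
  vector[N] e_upper;               // upper bound of each primary event window (assumed <= t_secondary)
  vector[N] t_secondary;           // exactly observed secondary event times
  real r;                          // assumed known epidemic growth rate
}
parameters {
  real<lower=0> shape;
  real<lower=0> scale;
  vector<lower=0, upper=1>[N] u;   // latent position of each primary event in its window
}
transformed parameters {
  vector[N] t_primary = e_lower + u .* (e_upper - e_lower);
}
model {
  shape ~ normal(1, 1);            // illustrative priors
  scale ~ normal(5, 5);
  // exponential growth across the censoring window: p(t_primary) is proportional to
  // exp(r * t_primary) within [e_lower, e_upper]; the normalising constant and the
  // Jacobian of the u -> t_primary map depend only on data, so they are dropped
  target += r * sum(t_primary);
  // delay likelihood (delays are positive because e_upper <= t_secondary)
  target += weibull_lpdf(t_secondary - t_primary | shape, scale);
}
```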
Ah you are right. Isn't this L1 according to Shaun's framework where the growth rate is known? L3 is explicitly not joint modelling. Do you have the likelihood for the complete model you used here written down somewhere? Or for that matter the code? Just had a brief browse and can only see code for the other bits of the paper.
I totally agree that needing to know the growth rate/assume it is fixed is a limitation I am not really willing to accept.
I believe this is L3, which I think of as "forward looking", but the essential meaning of "conditional on initial" is the same. This involves conditioning on the time of the first event and looking at the distribution of secondary event times. If the first event time is known exactly, all the joint modelling parts cancel out, which is why you don't have any joint modelling. However, if you have interval censoring on the first event time, they no longer cancel out, since the g(i) terms fall inside the integrals. This is why we use the latent variable approach in the Bayesian model: the event times are sampled, so the integrals over i disappear, the g(i) terms cancel out, and no joint modelling is required.
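A minimal Stan sketch of that latent variable version, assuming uniform latent primary event times within their censoring windows, exactly observed secondary events, right truncation at a single time D, and a Weibull delay (names and priors are illustrative):

```stan
data {
  int<lower=1> N;
  vector[N] e_lower;               // primary event window lower bounds
  vector[N] e_upper;               // primary event window upper bounds (assumed <= t_secondary)
  vector[N] t_secondary;           // exactly observed secondary event times
  real D;                          // truncation time (latest observable secondary event)
}
parameters {
  real<lower=0> shape;
  real<lower=0> scale;
  vector<lower=0, upper=1>[N] u;   // latent position of each primary event in its window
}
transformed parameters {
  vector[N] t_primary = e_lower + u .* (e_upper - e_lower);
}
model {
  shape ~ normal(1, 1);            // illustrative priors
  scale ~ normal(5, 5);
  // u has an implicit uniform prior, i.e. a flat prior for each latent primary event
  // time within its window; conditional on the sampled t_primary, only the
  // truncation-adjusted delay likelihood remains and no g(i) term appears
  for (i in 1:N) {
    target += weibull_lpdf(t_secondary[i] - t_primary[i] | shape, scale)
              - weibull_lcdf(D - t_primary[i] | shape, scale);
  }
}
```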
I think as written this screenshot is a touch unclear (but this really doesn't matter) and more in line with the joint approach (i.e. L1). I agree that if you condition on primary events (and therefore don't model their uncertainty etc.) you can cancel the g terms and rewrite the likelihood as done in L3.
I agree it can be dropped without censoring or when censoring is otherwise handled. Though, as we discussed, for longer censoring windows that is no longer trivial.
Suggestion from @sbfnk to look at "Ebola Virus Disease in West Africa – The First 9 Months of the Epidemic and Forward Projections" (supplement).
They do a lot of distribution estimation aiming to correct for left truncation (they call this censoring, but I think it isn't, as they apply the correction to all data and not just the censored observations) and for daily censoring. They do a daily censoring adjustment by just shifting all the data by half a day. This seems like it should add some bias but be better than doing nothing. I don't want to add more work, but perhaps we do need to investigate this as it is commonly used? I am not totally clear why they have left truncation, and it seems like right truncation would be a much, much bigger deal in their data given the state of the outbreak when this was published. Perhaps this is a mistake in the equations?
I guess this approach makes sense if filtering out recent observations based on delay length but as written this would apply to all short delays (including those far in the past) which seems incorrect.
You may want to take a look at their section discussing generation time estimation @parksw3 for other work if you haven't already...
I see nothing in the papers citing this that indicates any mistakes have been flagged, but there is lots and lots of reuse of these distribution estimates in quite "high impact" work, so if we do agree there are issues it's a good thing to discuss thoroughly.
Example estimating the incubation period of Monkeypox with some mention of censoring but none of truncation: https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2022.27.24.2200448
Cites this COVID paper for its method details: https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2020.25.5.2000062#html_fulltext
Method details are not in the supplement (it's just the main text, so very sparse). They do a censoring correction for unknown exposure times but no daily censoring adjustment and no truncation adjustment (see the Stan code below). They published data, so in theory this is something we could look at as a real-world case study if we so wished (not sure we need to or should).
```stan
data {
  int<lower=1> N;
  vector[N] tStartExposure;
  vector[N] tEndExposure;
  vector[N] tSymptomOnset;
}
parameters {
  real<lower=0> alphaInc;          // Shape parameter of weibull distributed incubation period
  real<lower=0> sigmaInc;          // Scale parameter of weibull distributed incubation period
  vector<lower=0, upper=1>[N] uE;  // Uniform value for sampling between start and end exposure
}
transformed parameters {
  vector[N] tE;                    // infection moment
  tE = tStartExposure + uE .* (tEndExposure - tStartExposure);
}
model {
  // Contribution to likelihood of incubation period
  target += weibull_lpdf(tSymptomOnset - tE | alphaInc, sigmaInc);
}
generated quantities {
  // likelihood for calculation of looIC
  vector[N] log_lik;
  for (i in 1:N) {
    log_lik[i] = weibull_lpdf(tSymptomOnset[i] - tE[i] | alphaInc, sigmaInc);
  }
}
```
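For what it's worth, here is a sketch of how a right truncation adjustment could be bolted onto that model if we wanted to: the extra tTruncation input (the date, in the same time units, by which onsets had to have occurred to be included) and the extra lcdf term are my additions, not something in the paper.

```stan
data {
  int<lower=1> N;
  vector[N] tStartExposure;
  vector[N] tEndExposure;
  vector[N] tSymptomOnset;
  real tTruncation;                // date by which onsets had to occur to be in the data
}
parameters {
  real<lower=0> alphaInc;          // shape of the Weibull incubation period
  real<lower=0> sigmaInc;          // scale of the Weibull incubation period
  vector<lower=0, upper=1>[N] uE;  // latent exposure time within its window
}
transformed parameters {
  vector[N] tE = tStartExposure + uE .* (tEndExposure - tStartExposure);
}
model {
  // assumes tE < tSymptomOnset <= tTruncation for all individuals
  for (i in 1:N) {
    // original censoring-only contribution ...
    target += weibull_lpdf(tSymptomOnset[i] - tE[i] | alphaInc, sigmaInc);
    // ... plus conditioning on the onset having occurred by tTruncation
    target += -weibull_lcdf(tTruncation - tE[i] | alphaInc, sigmaInc);
  }
}
```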
Lauer paper which we made a lot of use of early on (and late on for that matter) as the principal incubation period estimate: https://www.acpjournals.org/doi/10.7326/M20-0504
They used "a previously described parametric accelerated failure time model (13)", which reminds me we do need to make the point clearly that this estimation task is best thought of as a time-to-event (i.e. survival) problem, and therefore to use methods (like we do) from that silo.
The actual implementation they used was coarseDataTools and activemonitr - the first one is some kind of fairly reasonable censoring-adjusted (but not truncation-adjusted) method, and I have no idea about the second one. I wouldn't have described that method as a parametric accelerated failure time model, but perhaps it is, or perhaps they used something else for the actual estimation?
Code: https://github.com/HopkinsIDD/ncov_incubation
Yup, they just use coarseDataTools, so no truncation adjustment, but they are accounting for double censoring in a way that I think is sensible (at least, I will need to dig more into Reich et al. to work out if it isn't).
In this work by Reich et al. (https://onlinelibrary.wiley.com/doi/full/10.1111/j.1541-0420.2011.01709.x?saml_referrer) they deal with the truncation issue using an EM maximisation approach (that seems fine) for CFR estimation. They don't do anything about the delay they are actually using being truncated, and it appears in general there is no functionality in coarseDataTools to do this.
We haven't really discussed where this paper fits, which we maybe should: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0257978.
Growth rate correction being used in the wild: https://www.mdpi.com/2077-0383/9/2/538
The addition on the incubation period is nice. We've currently added this to our forward looking (L3) approach instead of the joint approach (L2, I think?). The joint approach was horribly slow since evaluating the integral in the denominator is a pain in Stan. An issue with the incubation period approach is that it isn't quite correct, as there is an uncorrected epidemic phase bias in there. I think it is possible to correct this, but you need to add a backcalculation of infection incidence, which we've not tried to implement. If the delay of interest is much longer than the incubation period (e.g. if looking at time to death) then the missed epidemic phase bias is hopefully negligible. But for e.g. time from onset to testing, the incubation period is likely to be longer, so the magnitude of the missed epidemic phase bias is probably larger than the epidemic phase bias we're putting lots of effort into correcting for the onset-to-testing delay.
I looked up fitdistrplus and that supports censored fitting but nothing else. It also provides no guard rails, so you literally just specify left and right censoring per data point. I've seen a lot of mistakes being made with this for daily data in the wild.
An issue with the incubation period approach is that it isn't quite correct, as there is an uncorrected epidemic phase bias in there. I think it is possible to correct this, but you need to add a backcalculation of infection incidence, which we've not tried to implement. If the delay of interest is much longer than the incubation period (e.g. if looking at time to death) then the missed epidemic phase bias is hopefully negligible.
Sounds like you were thinking along the same lines as @parksw3 and I! Looking forward to seeing your work on this.
Also I guess this ends up being similar to using a latent delay in an epinowcast-style approach (a Poisson version of L1?). That would be interesting to explore. I suppose the big advantage is much easier support for non-daily censoring windows? Edit: Is this true or am I dreaming?
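To make that concrete, here is a toy sketch of what I mean by a count-based (Poisson) formulation, where right truncation is handled by only fitting the observed part of the reporting triangle; this is far simpler than the actual epinowcast model and all names are illustrative:

```stan
data {
  int<lower=1> T;                   // number of primary-event time points
  int<lower=1> D;                   // maximum delay modelled
  array[T, D] int<lower=0> n;       // reporting triangle: n[t, d] = count with delay d - 1;
                                    // cells with t + d - 1 > T are unobserved and can be
                                    // filled with zeros, they are skipped below
}
parameters {
  vector<lower=0>[T] lambda;        // expected primary events per time point
  simplex[D] p;                     // discrete delay distribution
}
model {
  lambda ~ normal(0, 100);          // illustrative weakly-informative prior
  for (t in 1:T) {
    for (d in 1:D) {
      if (t + d - 1 <= T)           // right truncation: only fit observed cells
        n[t, d] ~ poisson(lambda[t] * p[d]);
    }
  }
}
```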
via @sbfnk: "Estimating the serial intervals of SARS-CoV-2 Omicron BA.4, BA.5, and BA.2.12.1 variants in Hong Kong" (https://onlinelibrary.wiley.com/doi/pdf/10.1111/irv.13105)
It uses the fixed growth rate truncation adjustment approach but with a sensitivity analysis on the growth rate (I think a method that uses a prior here would help people, if we feel like supplying it; see the sketch below). It also appears to additionally do right truncation adjustment on top of this, so it is a nice example of this issue for the introduction.
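A rough sketch of what that could look like (backward-sampled delays during roughly exponential growth at rate r, a Weibull delay, a normal prior on r with user-supplied mean and sd, and a crude grid approximation for the normalising integral; everything here is illustrative):

```stan
data {
  int<lower=1> N;
  vector<lower=0>[N] delay;    // backward-observed delays
  real r_mean;                 // prior mean for the epidemic growth rate
  real<lower=0> r_sd;          // prior sd for the epidemic growth rate
}
parameters {
  real<lower=0> shape;
  real<lower=0> scale;
  real r;                      // growth rate, estimated with a prior rather than fixed
}
model {
  r ~ normal(r_mean, r_sd);
  shape ~ normal(1, 1);        // illustrative priors
  scale ~ normal(5, 5);
  // growth rate ("dynamical") adjustment: backward delays are observed with density
  // proportional to f(d) * exp(-r * d); the normalising integral is approximated on a grid
  {
    int K = 200;
    real d_max = 50;                                    // assumed upper bound for the delay
    vector[K] grid = linspaced_vector(K, 0.01, d_max);  // start above 0 for numerical stability
    vector[K] lp;
    real log_norm;
    for (k in 1:K)
      lp[k] = weibull_lpdf(grid[k] | shape, scale) - r * grid[k];
    log_norm = log_sum_exp(lp) + log(d_max / K);        // crude grid approximation of the integral
    target += weibull_lpdf(delay | shape, scale) - r * sum(delay) - N * log_norm;
  }
}
```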
"Estimating the serial intervals of SARS-CoV-2 Omicron BA.4, BA.5, and BA.2.12.1 variants in Hong Kong"
I saw this paper too earlier and thought I had already added it, but it turns out I didn't... oops. This paper also made me wonder whether we need to show somewhere in the SI that applying both the truncation adjustment and the growth rate adjustment is bad.
yeah I agree but perhaps we can hold off on that whilst we knock everything else into shape.
good point. Also agree with that.
Suggested by Shaun Seaman: this paper may be useful for discussing approaches for different censoring assumptions: https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.2697
A very new study to discuss: https://www.thelancet.com/journals/lanmic/article/PIIS2666-5247(23)00005-8/fulltext
Seems to be approaching things from a fairly odd angle but has all the same issues from what I can see
Applications
Serial interval and incubation period estimation for Monkeypox (also used as part of CDC reporting). Censoring adjustment but not right truncation adjusted. Based on EpiEstim, which itself uses coarseDataTools (https://github.com/nickreich/coarseDataTools). https://www.medrxiv.org/content/10.1101/2022.10.26.22281516v1.full.pdf
coarseDataTools has a range of linked citations, with the methods I think coming from here: https://doi.org/10.1002/sim.3659. From reading, it makes use of two doubly censored approaches, one of which is a reduction of the other. It is frequentist and I think corresponds to our simple censoring approach, i.e. the latent approach without truncation (assuming uniform priors). They simulate using a uniform prior for the day of the primary event and a log-normal for the delay distribution, and then censor the secondary event (so no phase bias issues in their simulation). They explored diurnal (waking day) biased priors and found minimal impact. They also investigated a spiked prior and found this had more impact.

Something worth adding to our discussion is that this can be done trivially for our approach either via brms or using Stan directly, which is nice. This could be a useful way to frame our exploration of sample size.
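For reference, a minimal Stan sketch of that latent formulation (uniform latent event times within both the primary and secondary censoring windows, a log-normal delay as in their simulation, and no truncation adjustment; names and priors are illustrative):

```stan
data {
  int<lower=1> N;
  vector[N] p_lower;               // primary event window lower bounds
  vector[N] p_upper;               // primary event window upper bounds
  vector[N] s_lower;               // secondary event window lower bounds (assumed >= p_upper)
  vector[N] s_upper;               // secondary event window upper bounds
}
parameters {
  real mu;                         // log-normal delay parameters
  real<lower=0> sigma;
  vector<lower=0, upper=1>[N] u_p; // latent primary event time within its window
  vector<lower=0, upper=1>[N] u_s; // latent secondary event time within its window
}
transformed parameters {
  vector[N] t_p = p_lower + u_p .* (p_upper - p_lower);
  vector[N] t_s = s_lower + u_s .* (s_upper - s_lower);
}
model {
  mu ~ normal(1, 1);               // illustrative priors
  sigma ~ normal(0, 1);
  // uniform latent event times within both censoring windows (double censoring),
  // no truncation adjustment; delays are positive because s_lower >= p_upper
  target += lognormal_lpdf(t_s - t_p | mu, sigma);
}
```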
coarseDataTools uses survival, but its own code for the doubly censored model (which assumes uniform censoring). coarseDataTools has received a lot of recent usage, for reference.

Theory
Left truncation + censoring vs naive methods: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7770078/
Understanding an evolving pandemic: An analysis of the clinical time delay distributions of COVID-19 in the United Kingdom: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0257978.
Estimating a time-to-event distribution from right-truncated data in an epidemic: A review of methods: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9465556/