20 sims of 200 samples with uniform cases, fitting the latent variable model 20 times.
90% coverage for meanlog, 100% coverage for sdlog, 95% coverage for mean, and 100% coverage for sd. Not bad at all after all...
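For context, a minimal self-contained sketch of how the interval coverage above might be tallied across the repeated fits (the actual workflow is in scripts/test_uniform.R; the object and column names below are invented for illustration):

```r
# Sketch only: tally how often a 90% posterior interval contains the truth
library(data.table)

set.seed(1)
n_sims <- 20
true_meanlog <- 1.5

# Pretend each simulation yields a 90% posterior interval for meanlog
fits <- data.table(
  sim   = seq_len(n_sims),
  lower = true_meanlog - 0.1 + rnorm(n_sims, sd = 0.02),
  upper = true_meanlog + 0.1 + rnorm(n_sims, sd = 0.02)
)

# Coverage = proportion of simulations whose interval contains the truth
fits[, mean(true_meanlog >= lower & true_meanlog <= upper)]
```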
This is very cool. Essentially we are getting the same truncation effects globally within a day, which makes sense. I played around with different growth rates and distributions; it becomes more/less of an issue as you would expect (we should probably show this).
So this is the source of the negative correlation observed in the latent model (i.e., #17)?
This feels like it can be a second section. So we lead with a comparison of methods and then bring this in for the "best" method and explore in some detail?
Was there any evidence of bias across multiple fits? Visually it looks perhaps like it's over-estimating the mean.
Just had a quick look at bias here.
https://github.com/parksw3/dynamicaltruncation/blob/main/scripts/test_uniform.R
Looks like some issues are present despite coverage being quite good
| parameter | mad   | bias  | dss  | crps  | log_score | ae_median | se_mean |
|-----------|-------|-------|------|-------|-----------|-----------|---------|
| meanlog   | 0.042 | 0.42  | -4.6 | 0.033 | -1.30     | 0.044     | 0.00340 |
| sdlog     | 0.031 | -0.36 | -5.9 | 0.018 | -2.00     | 0.026     | 0.00088 |
| mean      | 0.350 | 0.28  | -1.0 | 0.220 | 0.44      | 0.300     | 0.18000 |
| sd        | 0.390 | -0.11 | -1.1 | 0.190 | 0.36      | 0.260     | 0.12000 |
What's the unit of bias here? Based on the figure, I can't figure out what 0.42 could mean for meanlog. I looked at https://cran.r-project.org/web/packages/scoringutils/scoringutils.pdf but am pretty confused... would it be better to express it in terms of something like relative bias rather than quantile bias?
Bias is as defined here
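For intuition on the units question above: as I understand the scoringutils definition, the sample-based bias is a quantile-scale quantity, unitless and bounded in [-1, 1], with positive values indicating over-prediction. A hedged re-implementation for illustration only (not the package's own code):

```r
# Quantile-scale bias for sample-based forecasts (illustrative re-implementation;
# see the scoringutils docs for the exact definition):
# 1 - 2 * F(true_value), where F is the empirical CDF of the posterior samples.
quantile_bias <- function(samples, true_value) {
  1 - 2 * mean(samples <= true_value)
}

# Example: posterior samples centred above the truth give a positive bias
set.seed(1)
quantile_bias(rnorm(1000, mean = 0.6, sd = 0.1), true_value = 0.5)  # ~0.68
```

If that reading is right, a bias of 0.42 for meanlog says the true meanlog tends to fall in the lower tail of the posterior (i.e., over-prediction), rather than being off by 0.42 on the log scale.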
I reran this and am getting a consistent bias towards under-prediction (maybe this implies I have introduced yet another bug).
| parameter | mad   | bias  | dss   | crps  | log_score | ae_median | se_mean |
|-----------|-------|-------|-------|-------|-----------|-----------|---------|
| meanlog   | 0.041 | -0.13 | -5.40 | 0.023 | -1.80     | 0.034     | 0.00160 |
| sdlog     | 0.032 | -0.11 | -6.10 | 0.015 | -2.10     | 0.019     | 0.00066 |
| mean      | 0.330 | -0.13 | -1.20 | 0.180 | 0.33      | 0.260     | 0.09800 |
| sd        | 0.390 | -0.13 | -0.76 | 0.190 | 0.46      | 0.240     | 0.11000 |
Very very weird. I tried it without truncation (when everything really should be uniform) and am getting a consistent overestimation. I wonder if this is because there isn't truncation but the truncation is still being estimated? Let me break this down a bit more.
FYI, this thing now runs super super fast so I feel like doing a full simulation shouldn't take long.
| parameter | mad   | bias  | dss  | crps   | log_score | ae_median | se_mean |
|-----------|-------|-------|------|--------|-----------|-----------|---------|
| meanlog   | 0.036 | 0.370 | -5.9 | 0.0180 | -2.000    | 0.028     | 0.00100 |
| sdlog     | 0.026 | 0.065 | -6.9 | 0.0099 | -2.500    | 0.014     | 0.00027 |
| mean      | 0.270 | 0.370 | -1.9 | 0.1400 | -0.042    | 0.200     | 0.05900 |
| sd        | 0.300 | 0.230 | -1.9 | 0.1300 | -0.048    | 0.170     | 0.05100 |
> I wonder if this is because there isn't truncation but the truncation is still being estimated? Let me break this down a bit more.
This sounds quite plausible.
I think we are covering this by looking at short delays and the residual bias our estimates still have in that setting (i.e., it falls within our current simulation grid). I do think we may want some kind of figure to explain the divergence from the uniform distribution we would expect in different growth settings, as a nice way of highlighting the issue. Flagging for discussion in the paper.
Sorry, I made too many issues. This is now discussed in https://github.com/parksw3/dynamicaltruncation/issues/27.
And yes, we both agreed that we wanted to add a figure to explain these features.
See test_censor.R
Consider a case in which the incidence is growing at rate $r$ between time $0$ and $t$, such that $i(t) = i(0) \exp(rt)$. Then the distribution of event times between $0$ and $t$ is proportional to $i(t)$, and the mean event time is

$$ \frac{\exp(rt) (rt - 1) + 1}{r (\exp(rt)-1)}. $$
One thing to note is that this depends on $t$. So if we're dealing with a daily scale ($t = 1$) and assume $r = 0.2$, we get 0.52, which is very close to 0.5, so we're not far from the uniform distribution.
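As a quick check of that number, a small sketch of the formula above (the function name is just illustrative):

```r
# Mean event time on [0, t] when incidence grows as i(s) = i(0) * exp(r * s)
mean_event_time <- function(r, t = 1) {
  (exp(r * t) * (r * t - 1) + 1) / (r * (exp(r * t) - 1))
}

mean_event_time(r = 0.2)  # ~0.517, close to the uniform mean of 0.5
mean_event_time(r = 2)    # ~0.657, noticeably further from 0.5
```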
For example, consider a scenario in which the primary event is increasing at rate $r = 0.2$. Then the incidence of the corresponding secondary event from this cohort increases at the same rate initially and then decreases exponentially.
If we look at the differences between continuous and discrete event times for both ptime and stime, they match up with what we predict during the growth phase.
And during the "decay" phase, we see that the mean differences for stime fall below 0.5.
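A hedged simulation sketch of this (not the exact code in test_censor.R; the lognormal delay parameters are just illustrative):

```r
# Simulate a growing cohort of primary events, add delays, and look at the
# within-day offset (continuous event time minus its floored, daily value).
set.seed(101)
n <- 1e5
r <- 0.2
tmax <- 30

# Primary event times with density proportional to exp(r * t) on [0, tmax]
u <- runif(n)
ptime <- log(1 + u * (exp(r * tmax) - 1)) / r

# Secondary events follow after a lognormal delay (illustrative parameters)
stime <- ptime + rlnorm(n, meanlog = 1.5, sdlog = 0.5)

p_offset <- ptime - floor(ptime)
s_offset <- stime - floor(stime)

# Growth phase: mean offsets sit slightly above 0.5 (about 0.517 for r = 0.2);
# once secondary incidence starts decaying, the stime offsets drop below 0.5.
mean(p_offset)
round(tapply(s_offset, floor(stime), mean), 2)
```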
Now, consider a case where we've truncated the data so that both primary and secondary events are growing exponentially. This is also more realistic and closer to the data we're getting.
In this case, our predictions match up for the most part, except for ptime near the end. Since truncation happens based on the secondary event, the incidence of the primary event starts shrinking rapidly near the truncation time:
This translates to a low mean difference between the continuous and discrete event times for the primary event near the end. So if we tried to fit the uniform distribution, we might do OK for the most part except near the end.
Now, this problem is going to affect the uniform case as well. Even if the incidence of the primary event (e.g., infection) is stable ($r = 0$), if we truncate based on the secondary event (e.g., symptom onset), the observed incidence of the primary event is going to decrease near the truncation time, leading to the biases we see here:
In this case, we're also seeing some left truncation in the secondary event times. For example, if we took everyone who got infected and developed symptoms in November, this is the kind of pattern we'd see even if the growth rate for the entire year were stable. It might be tempting to filter based on ptime to keep the incidence of ptime stable, but then the incidence of stime will not be stable.
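To make the $r = 0$ case concrete, a quick sketch (again with an illustrative lognormal delay, not the exact test_censor.R setup):

```r
# Stable primary incidence, but observation is right-truncated on the
# secondary event. Near the truncation time the observed primary events thin
# out within each day, pulling their mean within-day offset below 0.5.
set.seed(202)
n <- 1e5
trunc_time <- 60

ptime <- runif(n, 0, trunc_time)                        # flat primary incidence
stime <- ptime + rlnorm(n, meanlog = 1.5, sdlog = 0.5)  # illustrative delay

obs <- stime <= trunc_time                              # right truncation on stime
p_offset <- ptime[obs] - floor(ptime[obs])

# Mean offset by day of primary event: ~0.5 early on, dropping near trunc_time
round(tapply(p_offset, floor(ptime[obs]), mean), 2)
```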
The bottom line is that no matter what cohort we pick, it's impossible for $i_p(t)$ and $i_s(t)$ to both have qualitatively the same incidence pattern, which can cause the distribution of the differences between the continuous and discrete event times to differ systematically from the uniform distribution (especially for cohorts near the edges).
How much does this matter for the overall inference? Harder to say... it depends a lot on the delay distribution itself, too (it matters more for shorter delays).