parksw3 closed this issue 1 year ago
Across all 20 replicates, we are consistently underestimating ptime
and overestimating stime
with consistent banding, as you noticed.
This means that we are estimating longer delays on average, which in turn translates into a higher meanlog estimate. There is no such pattern in the real data:
One obvious (or not so obvious) source of bias is the prior. The default prior for the intercept has a mean of 6, which is pretty far from the true value. So I set the prior mean to the true value (but with a wide SD):
```r
prior = brms::prior(normal(1.8, 10), class = "Intercept") +
  brms::prior(normal(-0.7, 10), class = "Intercept", dpar = "sigma")
```
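As a quick sanity check on how far off that default centre is, assuming the intercept is the lognormal meanlog (as in brms's lognormal family), so that exp(Intercept) is the implied median delay in days:

```r
# Median delay (in days) implied by each prior centre, assuming the
# Intercept parameter is the lognormal meanlog (brms lognormal family).
exp(1.8)  # median delay implied by the true meanlog: ~6 days
exp(6)    # median delay implied by the default prior mean: ~400 days
```

So the default prior is centred on delays two orders of magnitude longer than the truth, which is why it can drag the intercept upward.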
And the bias goes away for meanlog but starts popping up in sdlog:
```
> scores
   parameter   mad  bias  dss  crps log_score ae_median se_mean
      <fctr> <num> <num> <num> <num>     <num>     <num>   <num>
1:   meanlog 0.037 0.043 -5.5 0.022     -1.80     0.032 0.00150
2:     sdlog 0.027 0.260 -6.3 0.015     -2.20     0.023 0.00071
3:      mean 0.270 0.120 -1.5 0.160      0.16     0.230 0.08800
4:        sd 0.310 0.260 -1.4 0.180      0.20     0.260 0.11000
```
This thing still exists too:
But I'm also confused because sdlog estimates look pretty damn good.
```r
mean(summ[parameter == "sdlog"]$mean)
```

gives 0.5128, which is not at all far from 0.5... are we too worried about the scoring? For example, this is how I would calculate bias in the traditional sense:
```r
relative_bias <- relative_draws |>
  DT(parameter == "sdlog") |>
  DT(, .(bias = mean(rel_value)), by = "model")

mean(relative_bias$bias)
```
And we get 0.025, which I would say is pretty small. On the other hand, the scoring tells us that the bias is 0.26, which seems much larger. I guess it depends on how much we're worried about the average vs the distribution... maybe report both?
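A toy illustration (invented numbers, not the real posteriors) of how the two notions can disagree. The sample-based bias here, 1 - 2 * F(true) with F the empirical CDF of the posterior draws, is hand-rolled to mirror, as I understand it, what scoringutils computes for continuous samples:

```r
# Toy posteriors: 20 replicates, each centred ~2.5% above the truth but
# fairly tight, so the truth sits in the lower tail of every posterior.
set.seed(1)
true_sdlog <- 0.5
posteriors <- replicate(20, rnorm(1000, mean = 1.025 * true_sdlog, sd = 0.04))

# "traditional" relative bias of the posterior means: small
rel_bias <- mean(colMeans(posteriors) / true_sdlog - 1)

# scoring-style bias, 1 - 2 * F(true): judges where the truth falls
# within each posterior, relative to the posterior's own width
score_bias <- mean(apply(posteriors, 2, function(x) 1 - 2 * mean(x <= true_sdlog)))

round(c(rel_bias = rel_bias, score_bias = score_bias), 3)
```

Because the posteriors are narrow, a ~2.5% shift in the mean already pushes the truth well into the tail, so the probabilistic bias comes out an order of magnitude larger than the relative bias of the point estimate. That seems consistent with seeing 0.025 from one calculation and 0.26 from the other.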
Actually, we might not want to try too hard to get rid of this bias. See https://github.com/parksw3/dynamicaltruncation/issues/16
If discrete delays are giving higher sdlog than continuous delays by about 2%, then we're kind of getting back what we expect (adding a uniform random variable to discrete delays shouldn't magically help us get rid of this intrinsic 2% bias, if that makes sense?). So this might be the best we can do for now. And I feel we have a decent understanding of why we're getting biased results (although we still don't understand the cause of the strong negative correlation).
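A small sketch of why jittering can't undo the censoring (assumed setup: lognormal(1.8, 0.5) delays, daily censoring via floor(), no truncation). The censoring noise is the difference of two fractional parts, so it lives on (-1, 1) and is roughly triangular; adding one more Uniform(0, 1) draw to the discrete delay adds noise on top of that rather than inverting it:

```r
# The daily-censored delay differs from the continuous one by
# frac(ptime) - frac(stime): a roughly triangular noise on (-1, 1).
set.seed(7)
n <- 1e5
ptime <- runif(n, 0, 100)
stime <- ptime + rlnorm(n, meanlog = 1.8, sdlog = 0.5)

delay       <- stime - ptime                # continuous delay
delay_daily <- floor(stime) - floor(ptime)  # what daily data gives us

noise <- delay_daily - delay  # = frac(ptime) - frac(stime)
range(noise)                  # strictly within (-1, 1)
hist(noise, breaks = 50)      # roughly triangular, centred at 0
```

So discrete + uniform is continuous delay + triangular noise + more uniform noise, not the continuous delay back, which is why some intrinsic bias should remain.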
This has been implemented into the paper as a figure so closing.
Trying to tease apart https://github.com/parksw3/dynamicaltruncation/issues/21 a little better. First, simulate uniform data without truncation. In this case, both ptime-ptime_daily and stime-stime_daily are clearly uniformly distributed (see QQ plot below):
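A minimal sketch of that check (assumed simulation setup: uniform primary event times, lognormal(1.8, 0.5) delays, daily censoring via floor(); the real pipeline may differ):

```r
# With uniform event times and no truncation, the sub-day residuals
# ptime - ptime_daily and stime - stime_daily should be Uniform(0, 1).
set.seed(101)
n <- 1e5
ptime <- runif(n, 0, 100)
stime <- ptime + rlnorm(n, meanlog = 1.8, sdlog = 0.5)

ptime_daily <- floor(ptime)
stime_daily <- floor(stime)

# eyeball uniformity of the primary-time residual with a QQ plot
qqplot(qunif(ppoints(n)), ptime - ptime_daily,
       xlab = "Uniform(0, 1) quantiles", ylab = "ptime - ptime_daily")
abline(0, 1)
```

The same plot for `stime - stime_daily` should look identical, matching the QQ plot below.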
Now, we generate 20 data sets and fit both the censoring model and the latent censoring model, with no truncation. The posterior distributions are nearly identical, except that the plain censoring model always estimates a lower sdlog than the latent censoring model.
Some biases in both models:
It seems that the latent model does a pretty good job at estimating sdlog but overestimates meanlog. Why?