parksw3 / epidist-paper


Diagnosing outbreak simulations #36

Closed parksw3 closed 1 year ago

parksw3 commented 1 year ago

[figure: outbreak]

Some obvious issues:

We didn't have the first issue during our last run, so maybe it's just a bad sample? We also had some problems with truncation before, so those need fixing anyway.

My current guess is that these are padding problems... I tried running my own fits, but I don't have access to the exact samples you used, so I couldn't replicate the first problem for the short delay. I was able to replicate the problem for the long delay, where zero padding is messing up the inference even though only 5 out of 400 observations are zeroes:

library(dplyr)
library(dynamicaltruncation)
library(data.table)
library(ggplot2); theme_set(theme_bw())
library(targets)

# load the cached simulated outbreak observations from the targets pipeline
tar_load("sampled_simulated_observations_outbreak")

# subset to the problematic scenario: late outbreak, long delay, n = 400
dd <- sampled_simulated_observations_outbreak %>%
  filter(scenario == "late outbreak") %>%
  filter(distribution == "long") %>%
  filter(sample_size == 400)

# naive lognormal delay fit (no truncation or censoring adjustment)
dd_fit <- naive_delay(data = dd)

ee <- extract_lognormal_draws(dd_fit)

# empirical summaries vs posterior medians
mean(dd$delay_daily) # 7.3
median(ee$mean) # 10.8
sd(dd$delay_daily) # 5.6
median(ee$sd) # 20.5
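
For intuition, here is a minimal sketch of the mechanism, not the paper's pipeline: the sample sizes and the 1e-3 padding constant are assumptions for illustration only.

# minimal sketch: a few padded zeroes inflate a naive log-scale fit
set.seed(1)
x <- rlnorm(395, meanlog = 1.7, sdlog = 0.8) # "clean" delays
x_padded <- c(x, rep(1e-3, 5)) # 5 zero delays, padded with a small constant
mu_hat <- mean(log(x_padded)) # dragged down by log(1e-3) = -6.9
sigma_hat <- sd(log(x_padded)) # inflated by the five outlying log values
# the implied natural-scale sd depends exponentially on sigma_hat^2,
# so it blows up even though only 5 of 400 observations were padded
sqrt((exp(sigma_hat^2) - 1) * exp(2 * mu_hat + sigma_hat^2))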

Again, removing the zeroes mostly fixes the problem, although I'm still overestimating the sd by quite a bit (though not as badly as before).


dd2 <- dd %>%
  filter(delay_daily != 0) # drop the 5 zero delays

dd_fit2 <- naive_delay(data = dd2)

ee2 <- extract_lognormal_draws(dd_fit2)

# empirical summaries vs posterior medians, zeroes removed
mean(dd2$delay_daily) # 7.4
median(ee2$mean) # 7.6
sd(dd2$delay_daily) # 5.6
median(ee2$sd) # 6.9

The 95% credible interval doesn't even cover the empirical truncated sd:

quantile(ee2$sd, c(0.025, 0.975)) # interval excludes the empirical sd of 5.6

But actually, if we compare with the empirical logmean and logsd, the posteriors look good:

mean(log(dd2$delay_daily)) # 1.72
sd(log(dd2$delay_daily)) # 0.77

Compare this with:

Population-Level Effects: 
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept           1.73      0.04     1.65     1.80 1.00     3413     2655
sigma_Intercept    -0.25      0.04    -0.32    -0.18 1.00     3101     2440

where exp(-0.25) ≈ 0.78, in line with the empirical logsd of 0.77. So if we're matching the logmean and logsd so well, why is our sd estimate so far off? Because of nonlinear effects... Jensen's inequality, essentially. If we take the empirical logmean and logsd and calculate the sd of the corresponding lognormal distribution, it matches up with what we estimate:

mu <- mean(log(dd2$delay_daily))
sigma <- sd(log(dd2$delay_daily))

# sd of a lognormal with parameters (mu, sigma)
sqrt((exp(sigma^2) - 1) * exp(2 * mu + sigma^2)) # 6.895
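
The same kind of conversion applies to the mean, where the distortion is much smaller, which fits with the mean estimate having been fine all along. A quick check with the values above (not part of the pipeline):

# mean of the corresponding lognormal: exp(mu + sigma^2 / 2)
exp(mu + sigma^2 / 2) # ~7.5, close to the empirical mean of 7.4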

This brings up an interesting point about what "ground truth" we should compare to. Comparing to the raw mean and sd could make some of our fits look worse than they actually are? But I still think it makes more sense to compare to the raw mean and sd; I'm not suggesting we should switch to logmean and logsd. Maybe something we need to discuss in the Discussion section.

seabbs commented 1 year ago

My current guess is that these are padding problems... I tried running my own fits, but I don't have access to the exact samples you used, so I couldn't replicate the first problem for the short delay.

I think you can get these from data/scenarios?

I was able to replicate the problem for the long delay, where zero padding is messing up the inference even though only 5 out of 400 observations are zeroes.

I'm super surprised it is such a dominant effect but I guess it makes sense.

This brings up an interesting point about what "ground truth" we should compare to. Comparing to the raw mean and sd could make some of our fits look worse than they actually are? But I still think it makes more sense to compare to the raw mean and sd; I'm not suggesting we should switch to logmean and logsd. Maybe something we need to discuss in the Discussion section.

Agree, this is really, really interesting. It also links to what people should be reporting when they estimate distributions (I would argue it should be the actual parameters of the distribution rather than the mean, sd, etc., but it's much more common to do the latter).
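
For reference, the two conventions are interchangeable for a lognormal. A minimal sketch converting in both directions (the helper names here are hypothetical, not from the package):

# hypothetical helper: lognormal parameters -> natural-scale moments
lnorm_to_moments <- function(meanlog, sdlog) {
  list(
    mean = exp(meanlog + sdlog^2 / 2),
    sd = sqrt((exp(sdlog^2) - 1) * exp(2 * meanlog + sdlog^2))
  )
}

# hypothetical helper: natural-scale moments -> lognormal parameters
moments_to_lnorm <- function(mean, sd) {
  sdlog2 <- log(1 + sd^2 / mean^2)
  list(meanlog = log(mean) - sdlog2 / 2, sdlog = sqrt(sdlog2))
}

lnorm_to_moments(1.72, 0.77) # roughly mean 7.5, sd 6.9, as above
moments_to_lnorm(7.5, 6.9) # recovers roughly meanlog 1.71, sdlog 0.78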

parksw3 commented 1 year ago

I think you can get these from data/scenarios?

I think this shows the whole simulation? Not the exact subsample you're looking at?

Also links to what people should be reporting when they estimate distributions (I would argue it should be the actual parameters of the distribution rather than mean sd etc but much more common to do the latter).

Good point. Lots to discuss in the paper.

seabbs commented 1 year ago

This is now looking more in line with expectations and the problems outlined above appear mostly resolved. The truncation-only model is still somewhat problematic at longer delays and early in the outbreak, though (I think this is likely just to be expected?).

The truncation and censoring model is still slightly biased (in both the latent and non-latent case) for the standard deviation across all delays.

[figure: outbreak]

seabbs commented 1 year ago

I think this is now resolved so closing.