reconhub / incidence

Compute and visualise incidence
https://reconhub.github.io/incidence

r estimates from fit() are different when same data are estimated separately or as groups #93

Open jrcpulliam opened 5 years ago

jrcpulliam commented 5 years ago

Example:

library("incidence")  
set.seed(20181213)
days <- 1:14
# Group 1
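# rexp() uses the length of its first argument as n, so this draws 14 Exp(1)
# values; the -.3*(days) values are not used as rates
dat_cases_1 <- round(20*rexp(-.3*(days)))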
dat_dates_1 <- rep(as.Date(Sys.Date() + days), dat_cases_1)

i1 <- incidence(dat_dates_1)
f1 <- fit(i1)

# Group 2
# 14 Exp(1) draws again (n is the length of the argument); rounding gives several zeros
dat_cases_2 <- round(rexp(.3*(days)))
dat_dates_2 <- rep(as.Date(Sys.Date() + days), dat_cases_2)

i2 <- incidence(dat_dates_2)
f2 <- suppressWarnings(fit(i2))

# Combine groups
grp <- rep(c("grp1", "grp2"), c(length(dat_dates_1), length(dat_dates_2)))
i.grp <- incidence(c(dat_dates_1, dat_dates_2), groups = grp)
f3 <- fit(i.grp)
#> Warning in fit(i.grp): 8 dates with incidence of 0 ignored for fitting

abs(f3$info$r['grp1']-f1$info$r) # should be 0 or very close to it
#>       grp1 
#> 0.09000228

Created on 2019-01-07 by the reprex package (v0.2.1)

The discrepancy arises because i2 has dates with 0 cases but i1 does not, and when the groups are fitted together those dates are dropped for both groups before the growth rate is estimated.
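The effect can be reproduced without the package. The sketch below assumes the growth rate is the slope of a log-linear regression of counts on dates, and it uses made-up counts rather than the data above. The names `counts`, `slope` and `keep` are hypothetical, and the row filter only imitates the behaviour described here; it is not the package's code.

```r
# Made-up daily counts for two groups (illustration only, not the issue's data)
counts <- data.frame(
  day  = 1:10,
  grp1 = c(3, 4, 7, 6, 11, 14, 13, 22, 30, 35),
  grp2 = c(0, 1, 0, 2, 0, 3, 0, 5, 6, 0)
)

# Slope of log(count) ~ day, i.e. the growth rate r of a log-linear model
slope <- function(day, n) unname(coef(lm(log(n) ~ day))["day"])

# Group 1 fitted on its own: it has no zeros, so all 10 days are used
r_alone <- slope(counts$day, counts$grp1)

# Group 1 fitted after dropping every day on which ANY group has a zero
keep <- counts$grp1 > 0 & counts$grp2 > 0
r_grouped <- slope(counts$day[keep], counts$grp1[keep])

# The two estimates differ because group 1 lost the days where group 2 was zero
c(alone = r_alone, grouped = r_grouped, days_kept = sum(keep))
```

Group 1 contributes ten usable days when fitted alone but only five after the shared filter, which is enough to move its slope.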

zkamvar commented 5 years ago

I think we may be able to address this by converting zeros to NA before fitting the model. That way, we can rely on the model functionality to handle the missing data in the way it sees fit. One caveat is that this approach will affect the confidence interval estimates. I'll get around to an example when I can get to coding it (which is after I slog through a month of emails and GitHub issues 😩)
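A minimal base R sketch of that idea, assuming each group is fitted with its own log-linear `lm()`: zeros are recoded as `NA`, and `na.action = na.omit` then drops the missing dates per group instead of across all groups. The data and the helper `fit_group` are hypothetical, and this is not the package's implementation.

```r
# Made-up daily counts (illustration only)
day  <- 1:10
grp1 <- c(3, 4, 7, 6, 11, 14, 13, 22, 30, 35)
grp2 <- c(0, 1, 0, 2, 0, 3, 0, 5, 6, 0)

# Hypothetical helper: recode zeros as NA, then let lm() drop the missing rows
fit_group <- function(n, day) {
  n[n == 0] <- NA
  lm(log(n) ~ day, na.action = na.omit)
}

fits <- lapply(list(grp1 = grp1, grp2 = grp2), fit_group, day = day)

# Growth rate (slope) and number of dates actually used, per group
r      <- vapply(fits, function(m) unname(coef(m)["day"]), numeric(1))
n_used <- vapply(fits, nobs, numeric(1))

# Confidence intervals now depend on each group's own number of usable dates
ci <- lapply(fits, confint, parm = "day")
```

Here `grp1` keeps all ten days, so its slope matches a standalone fit, while `grp2` is fitted on its five non-zero days only. The intervals in `ci` reflect those differing sample sizes, which is the side effect on confidence intervals mentioned above.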