Add option to bias-correct GEE variance-covariance matrix for small samples

jr-leary7 / scLANE

Characterize gene dynamics over trajectories using GLMs, GEEs, & GLMMs.

MIT License

10 stars 2 forks source link

as per Li and Redden (2014), using the standard sandwich variance-covariance matrix to perform a Wald test of significance - as we do in waldtestGEE() - leads to inflated Type I error (false positive) rates when the number of subjects is small and / or the number of timepoints per-subject is large
this was seen in our simulation study, where we observed high numbers of false positives when using $n_s = 6$ subjects
scRNA-seq datasets quite often have low numbers of subjects compared to observations per-subject, making this a problem for the GEE mode
several small-sample bias corrections exist, the easiest of which to implement is the DF correction, where $p$ is the number of covariates:

$$ \mathbf{V}_{df} = \left(\frac{n_s}{n_s-p}\right) \mathbf{V} $$

thus, it makes sense to add an option to bias-correct the estimated sandwich variance-covariance matrix when the number of subjects is lower than some heuristic threshold e.g., $n_s = 30$

jr-leary7 / scLANE