chisq_gof currently give z-values that are too low. I believe RSS in the formula for the Osius-Rojek test is too high, and should account for the weights in the weighted linear regression.
$$
z = \frac{X^2 - (J - p - 1)}{\sqrt{RSS + A}}
$$
library(sjstats)
# example in documentation
data(efc)
efc$neg_c_7d <- ifelse(efc$neg_c_7 < median(efc$neg_c_7, na.rm = TRUE), 0, 1)
m <- glm(
neg_c_7d ~ c161sex + barthtot + c172code,
data = efc,
family = binomial(link = "logit")
)
x <- m
# source code of chisq_gof
y_hat <- stats::fitted(x)
wt <- x$prior.weight
vJ <- wt * y_hat * (1 - y_hat)
cJ <- (1 - 2 * y_hat)/vJ
X2 <- sum(stats::resid(x, type = "pearson")^2)
form <- stats::as.formula(x$formula)
form[[2]] <- as.name("cJ")
dat <- stats::na.omit(x$model)
dat$cJ <- cJ
dat$vJ <- vJ
RSS <- sum(stats::resid(stats::lm(form, data = dat, weights = vJ))^2)
# results from lm
wls <- stats::lm(form, data = dat, weights = vJ)
anova(wls) # RSS = 138.73
#> Analysis of Variance Table
#>
#> Response: cJ
#> Df Sum Sq Mean Sq F value Pr(>F)
#> c161sex 1 64.40 64.40 376.491 < 2.2e-16 ***
#> barthtot 1 1137.16 1137.16 6647.895 < 2.2e-16 ***
#> c172code 1 10.95 10.95 64.021 4.247e-15 ***
#> Residuals 811 138.73 0.17
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RSS
#> [1] 1660.497
sum(vJ * stats::resid(stats::lm(form, data = dat, weights = vJ))^2) # proposed fix
#> [1] 138.7261
chisq_gof
currently give z-values that are too low. I believe RSS in the formula for the Osius-Rojek test is too high, and should account for the weights in the weighted linear regression.$$ z = \frac{X^2 - (J - p - 1)}{\sqrt{RSS + A}} $$
Created on 2023-03-08 by the reprex package (v2.0.1)