strengejacke / sjstats

Effect size measures and significance tests
https://strengejacke.github.io/sjstats
189 stars 21 forks source link

account for weights in RSS for chisq_gof #116

Closed lioumens closed 4 months ago

lioumens commented 1 year ago

chisq_gof currently give z-values that are too low. I believe RSS in the formula for the Osius-Rojek test is too high, and should account for the weights in the weighted linear regression.

$$ z = \frac{X^2 - (J - p - 1)}{\sqrt{RSS + A}} $$

library(sjstats)

# example in documentation
data(efc)
efc$neg_c_7d <- ifelse(efc$neg_c_7 < median(efc$neg_c_7, na.rm = TRUE), 0, 1)
m <- glm(
  neg_c_7d ~ c161sex + barthtot + c172code,
  data = efc,
  family = binomial(link = "logit")
)

x <- m

# source code of chisq_gof
y_hat <- stats::fitted(x)
wt <- x$prior.weight
vJ <- wt * y_hat * (1 - y_hat)
cJ <- (1 - 2 * y_hat)/vJ
X2 <- sum(stats::resid(x, type = "pearson")^2)
form <- stats::as.formula(x$formula)
form[[2]] <- as.name("cJ")
dat <- stats::na.omit(x$model)
dat$cJ <- cJ
dat$vJ <- vJ
RSS <- sum(stats::resid(stats::lm(form, data = dat, weights = vJ))^2)

# results from lm
wls <- stats::lm(form, data = dat, weights = vJ)
anova(wls) # RSS = 138.73
#> Analysis of Variance Table
#> 
#> Response: cJ
#>            Df  Sum Sq Mean Sq  F value    Pr(>F)    
#> c161sex     1   64.40   64.40  376.491 < 2.2e-16 ***
#> barthtot    1 1137.16 1137.16 6647.895 < 2.2e-16 ***
#> c172code    1   10.95   10.95   64.021 4.247e-15 ***
#> Residuals 811  138.73    0.17                       
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RSS
#> [1] 1660.497

sum(vJ * stats::resid(stats::lm(form, data = dat, weights = vJ))^2) # proposed fix
#> [1] 138.7261

Created on 2023-03-08 by the reprex package (v2.0.1)

strengejacke commented 4 months ago

Thanks, looks good!