statdivlab / rigr

Regression, Inference, and General Data Analysis Tools for R
Other
10 stars 3 forks source link

Issue with chi-squared test in U() #156

Closed cwolock closed 3 months ago

cwolock commented 3 months ago

I may be mistaken, but I think there's a problem with the chi-squared test when using U(). In line 1099 of regress_utils.R, the F-statistic is calculated as

Fstat <- t(contrasts %*% coefficients - hypothesis) %*% solve(covmtx) %*% (contrasts %*% coefficients - hypothesis) / r

where r is the number of hypotheses being tested. When useFdstn is FALSE, this test statistic is used to compute a p-value based on the chi-squared distribution with r degrees of freedom. However, I think that the test statistic need to be multiplied by r for that to be a valid test (asymptotically).

As an example of the very different p-values that can obtained even in fairly large samples, run

mod1 <- regress("mean", atrophy ~ age + U(smoke = ~packyrs + yrsquit), data = mri, useFdstn = TRUE) mod2 <- regress("mean", atrophy ~ age + U(smoke = ~packyrs + yrsquit), data = mri, useFdstn = FALSE)

and compare the p-values for the smoke variable.

This p-value calculation does have a unit test, but I think the "by-hand" calculation (which is compared to the rigr output) is incorrect as well.

Apologies if I've made a mistake here!

adw96 commented 3 months ago

Great catch, @cwolock, and thanks for letting us know ! We'll look into it and post updates here.

gthopkins commented 3 months ago

Thanks for spotting this error @cwolock! We just fixed the issue in pull request #157.