Closed wmorgan485 closed 1 year ago
I believe the code referenced is:
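library(infer)

# Observed fit: coefficient estimates from the original data
obs_fit <- gss %>%
  specify(hours ~ age + college) %>%
  fit()

# Null fits: refit the model after permuting the response
null_dist <- gss %>%
  specify(hours ~ age + college) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  fit()

# Plot the null distributions and shade the two-sided p-value regions
visualize(null_dist) +
  shade_p_value(obs_stat = obs_fit, direction = "two-sided")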
Created on 2023-10-31 with reprex v2.0.2
The null hypothesis "independence" is documented as:
Indicates that the values of the specified response variable are independent of the associated values in explanatory.
The distribution of the intercept under the null thus need not be centered at 0, and is not hypothesized as such in that example. :)
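To make that concrete, here is a minimal base R sketch of what permuting the response does to the fitted intercept. It does not use infer or the real `gss` data: the data frame `dat` and its coefficients are simulated stand-ins (an assumption for illustration only), and the permutation loop is hand-rolled rather than a description of infer's internals.

```r
# Simulated stand-in for the gss columns (hypothetical data, not the real survey).
set.seed(2023)
n <- 500
dat <- data.frame(
  age     = round(runif(n, 18, 80)),
  college = factor(sample(c("no degree", "degree"), n, replace = TRUE))
)
dat$hours <- 40 + 0.02 * dat$age + 2 * (dat$college == "degree") + rnorm(n, sd = 10)

# Observed fit on the unpermuted data.
obs_fit <- coef(lm(hours ~ age + college, data = dat))

# Permutation null: shuffle the response, refit, keep the coefficients.
null_fits <- t(replicate(1000, {
  shuffled <- dat
  shuffled$hours <- sample(dat$hours)
  coef(lm(hours ~ age + college, data = shuffled))
}))

# Center of each coefficient's null distribution.
null_centers <- colMeans(null_fits)
print(round(null_centers, 2))

# Mean of the response, for comparison with the intercept's null center.
print(round(mean(dat$hours), 2))
```

In this sketch the slopes' null distributions center near 0, while the intercept's centers near the mean of the response. Shuffling the response breaks its link to the explanatory variables but leaves its overall level unchanged, so an intercept p-value from this null is not a test of "intercept = 0".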
The problem
As seen in the Full infer Pipeline Examples, when using fit() for hypothesis testing with multiple explanatory variables, the null hypothesis for the intercept is apparently not that it equals zero, as I expected. As seen when "visualizing the observed fit alongside the null fits," the null distribution for the intercept is not centered at zero. Furthermore, the returned p-value for the intercept is greater than 0.05, although the point estimate for the intercept is clearly different from zero (as can be confirmed with the lm() function).
Reproducible example
Please refer to the Full infer Pipeline Examples for a reproducible example.