Estimating the intercept with fit()

wmorgan485 commented 1 year ago

The problem

As seen in the Full infer Pipeline Examples, when fit() is used for hypothesis testing with multiple explanatory variables, the null hypothesis for the intercept is apparently not the expected value of zero. In the section on "visualizing the observed fit alongside the null fits," the null distribution for the intercept is not centered at zero. Furthermore, the returned p-value for the intercept is greater than 0.05, even though the point estimate for the intercept is clearly different from zero (as can be confirmed with the lm() function).

Reproducible example

Please refer to the Full infer Pipeline Examples for a reproducible example.

simonpcouch commented 1 year ago

I believe the code referenced is:

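library(infer)

# Observed fit: coefficient estimates from the original data
obs_fit <- gss %>%
   specify(hours ~ age + college) %>%
   fit()

# Null fits: refit the same model on 1000 permutations of the response
null_dist <- gss %>%
   specify(hours ~ age + college) %>%
   hypothesize(null = "independence") %>%
   generate(reps = 1000, type = "permute") %>%
   fit()

# Plot the null distribution of each term and shade the p-value region
visualize(null_dist) +
   shade_p_value(obs_stat = obs_fit, direction = "two-sided")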

Created on 2023-10-31 with reprex v2.0.2

The null hypothesis "independence" is documented as:

Indicates that the values of the specified response variable are independent of the associated values in explanatory.

The distribution of the intercept under the null thus need not be centered at 0, and is not hypothesized as such in that example. :)
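
To illustrate the point, here is a minimal base-R sketch of the same idea. It is not infer's implementation: the mtcars data, the model mpg ~ wt + hp, and the name perm_intercepts are assumptions chosen only so the example runs without extra packages. Shuffling the response breaks its link to the explanatory variables, so the slopes center near 0, while the intercept centers near the mean of the response rather than 0.

```r
# Assumption: mtcars and mpg ~ wt + hp stand in for gss and hours ~ age + college.
set.seed(1)

# Refit the model on 1000 permutations of the response and keep the intercept.
perm_intercepts <- replicate(1000, {
  shuffled <- mtcars
  shuffled$mpg <- sample(shuffled$mpg)  # permute the response only
  coef(lm(mpg ~ wt + hp, data = shuffled))[["(Intercept)"]]
})

# Center of the permutation distribution of the intercept ...
mean(perm_intercepts)
# ... compared with the mean of the response, which it should be close to.
mean(mtcars$mpg)
```

If the two printed values are close, the intercept's null distribution is centered at the mean of the response, which is why a p-value for the intercept under "independence" does not test whether the intercept equals zero.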


tidymodels / infer

Estimating the intercept with fit() #509
