tidymodels / infer

An R package for tidyverse-friendly statistical inference
https://infer.tidymodels.org

Residuals and expected cell counts for chi-square tests #462

Closed. VectorPosse closed this issue 1 year ago.

VectorPosse commented 2 years ago

Feature

Any thoughts about providing residuals or expected cell counts for chi-square tests? I can appreciate that post hoc testing is likely outside the scope of the infer package, but as it stands, I can't see any way to pipe infer output to any other tool that would calculate them. If I want to teach my students about post hoc testing, I basically have to ignore everything I just finished teaching them with infer tools and go back to running chisq.test from scratch.

simonpcouch commented 2 years ago

Thanks for the issue!

For brevity, I'm going to write in terms of chi-square tests of independence, but the analogous machinery applies to goodness-of-fit tests.

I can't see any way to pipe infer output to any other tool that would calculate them.

With infer's current support, calculating randomization-based expected cell counts and residuals relies on generate()ing samples under a null hypothesis of independence with the usual infer machinery and then table()ing away.

# use infer to generate a distribution of values under the null
library(infer)

reps <- 1e4

null_dist <- gss %>%
  specify(finrela ~ sex) %>%
  hypothesize(null = "independence") %>% 
  generate(reps = reps, type = "permute")

null_dist
#> Response: finrela (factor)
#> Explanatory: sex (factor)
#> Null Hypothesis: independence
#> # A tibble: 5,000,000 × 3
#> # Groups:   replicate [10,000]
#>    finrela           sex    replicate
#>    <fct>             <fct>      <int>
#>  1 average           male           1
#>  2 DK                female         1
#>  3 far below average male           1
#>  4 average           male           1
#>  5 average           male           1
#>  6 average           female         1
#>  7 average           female         1
#>  8 below average     female         1
#>  9 above average     female         1
#> 10 far below average female         1
#> # … with 4,999,990 more rows

# use `table()` to compute the observed cell counts
observed_counts <- table(gss$finrela, gss$sex)

observed_counts
#>                    
#>                     male female
#>   far below average    8     17
#>   below average       55     60
#>   average            127    112
#>   above average       65     41
#>   far above average    6      4
#>   DK                   2      3

# table the permuted samples and divide by the number of replicates
# to get the expected cell counts under the null
expected_counts <- table(null_dist$finrela, null_dist$sex) / reps

expected_counts
#>                    
#>                         male   female
#>   far below average  13.1783  11.8217
#>   below average      60.4669  54.5331
#>   average           125.7173 113.2827
#>   above average      55.7701  50.2299
#>   far above average   5.2532   4.7468
#>   DK                  2.6142   2.3858

# raw residuals: observed counts minus expected counts
residuals <- observed_counts - expected_counts

residuals
#>                    
#>                        male  female
#>   far below average -5.1783  5.1783
#>   below average     -5.4669  5.4669
#>   average            1.2827 -1.2827
#>   above average      9.2299 -9.2299
#>   far above average  0.7468 -0.7468
#>   DK                -0.6142  0.6142

# calculate chi^2
sum(residuals^2 / expected_counts)
#> [1] 9.122616

This doesn't feel too verbose to me, especially if facility with residuals and expected cell counts is a pedagogical goal. And if infer is already a known tool, the code to generate null_dist is more straightforward than other randomization-based routes to the same output.

For juxtaposition, the same values with chisq.test():

table(gss$finrela, gss$sex)
#>                    
#>                     male female
#>   far below average    8     17
#>   below average       55     60
#>   average            127    112
#>   above average       65     41
#>   far above average    6      4
#>   DK                   2      3
chisq.test(gss$finrela, gss$sex)$expected
#> Warning in chisq.test(gss$finrela, gss$sex): Chi-squared approximation may be
#> incorrect
#>                    gss$sex
#> gss$finrela            male  female
#>   far below average  13.150  11.850
#>   below average      60.490  54.510
#>   average           125.714 113.286
#>   above average      55.756  50.244
#>   far above average   5.260   4.740
#>   DK                  2.630   2.370
chisq.test(gss$finrela, gss$sex)$residuals
#> Warning in chisq.test(gss$finrela, gss$sex): Chi-squared approximation may be
#> incorrect
#>                    gss$sex
#> gss$finrela               male     female
#>   far below average -1.4201831  1.4960567
#>   below average     -0.7058795  0.7435912
#>   average            0.1146962 -0.1208239
#>   above average      1.2379814 -1.3041208
#>   far above average  0.3226553 -0.3398933
#>   DK                -0.3884746  0.4092290
chisq.test(gss$finrela, gss$sex)$statistic
#> Warning in chisq.test(gss$finrela, gss$sex): Chi-squared approximation may be
#> incorrect
#> X-squared 
#>  9.105397

Note that the $residuals slot of chisq.test() output contains Pearson residuals, so we could take residuals / sqrt(expected_counts) if we wanted to match that output.
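
For instance, reusing the residuals and expected_counts objects from the reprex above (just a sketch; the values will differ slightly from chisq.test()'s, since these expected counts are randomization-based):

# Pearson residuals: raw residuals scaled by the square root of the expected counts
pearson_residuals <- residuals / sqrt(expected_counts)

pearson_residuals

# the chi-squared statistic is the sum of the squared Pearson residuals
sum(pearson_residuals^2)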

For another approach, the book "Introduction to Modern Statistics" has a chapter with applications in infer, including a learnr tutorial working through a Chi-square test of independence with infer. That lesson leans heavily on motivating the test statistic visually before shifting into infer code, but doesn't surface the code that pushes around any tables. They write:

Computing these expected counts while respecting the marginal distributions is a bit tedious, so we’ll be relying upon [base] R for this calculation.

...which is perhaps a symptom of the lack of support in infer.

Created on 2022-11-17 with reprex v2.0.2

rhinopotamus commented 2 years ago

I was curious about this too, and I went a-digging. Line 419 or so of calculate.R seems to be the operative part, and it looks like it just calls stats::chisq.test, which under the hood does produce a table of expected counts. However, on line 428, we're just pulling "statistic" out of the output of stats::chisq.test and, it looks like, throwing the rest of the output away.

Maybe if there was a way to keep the rest of the output of stats::chisq.test around, then a function called get_expected_counts() or get_residuals() could exist to just report the $expected or $residuals slots.
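
To make that concrete, here's a rough sketch of what such a helper could look like if it simply wrapped stats::chisq.test() (the function name and interface here are hypothetical, not existing infer API):

# hypothetical helper: rebuild the observed two-way table and report the
# expected-count slot from stats::chisq.test() ($residuals and $stdres are
# available the same way)
get_expected_counts <- function(data, formula) {
  vars <- all.vars(formula)
  tab <- table(data[[vars[1]]], data[[vars[2]]])
  stats::chisq.test(tab)$expected
}

get_expected_counts(gss, finrela ~ sex)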

(I am also going to have this same question about anova -- I'd love to be able to do post-hoc stuff with the infer pipeline.)

ooh -- simonpcouch wrote the comment above while I was writing this one -- but I think this comment is maybe still relevant.

VectorPosse commented 2 years ago

I'll also point out that there is tidyverse precedent for such a request. For example, tidymodels (parsnip, specifically) uses the extract_fit_engine() function to access the underlying object that can then be passed to other functions.
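
As a minimal sketch of that pattern (assuming parsnip's default "lm" engine for linear_reg(); this is the parsnip analogy, not infer code):

library(parsnip)

# fit a parsnip model spec, then pull out the underlying engine fit
lin_fit <- linear_reg() %>%
  fit(mpg ~ wt, data = mtcars)

# extract_fit_engine() returns the stats::lm object the engine produced,
# which can then be handed to any function that understands lm fits
lm_obj <- extract_fit_engine(lin_fit)
summary(lm_obj)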

Another important use case would be anyone wanting to use broom to tidy results. AFAIK, broom does not know what to do with output from infer, so some of this is about also playing nice with other packages within the tidyverse.

Thanks @simonpcouch for your consideration. Your sample code gives me some options to think about, but I would encourage y'all to continue the discussion to see if being able to extract the base R model is something you're willing to support, even if @rhinopotamus's suggestion about extracting residuals directly is too narrow (i.e. not generalizable to all infer workflows).

simonpcouch commented 2 years ago

However, on line 428, we're just pulling "statistic" out of the output of stats::chisq.test and, it looks like, throwing the rest of the output away.

That's correct.🙂 There have been some issues/PRs over the years on this repo about whether to use the stats backend or not in this case, and the discussion on those issues might be illuminating. https://github.com/tidymodels/infer/issues/108, https://github.com/tidymodels/infer/issues/248, etc.

Maybe if there was a way to keep the rest of the output of stats::chisq.test around, then a function called get_expected_counts() or get_residuals() could exist to just report the $expected or $residuals slots.

This is certainly possible! For the sake of internal consistency, though, and to avoid scope creep, I'd be hesitant to take on this addition.

For example, tidymodels (parsnip, specifically) uses the extract_fit_engine() function to access the underlying object that can then be passed to other functions.

parsnip, as a matter of principle, doesn't implement its own computational engines and instead relies on external engines to fit models. The analogy with chisq.test is quite loose here, as most other statistics in infer are implemented "in-house," and an engine isn't well-defined in those cases.

Another important use case would be anyone wanting to use broom to tidy results. AFAIK, broom does not know what to do with output from infer, so some of this is about also playing nice with other packages within the tidyverse.

infer outputs tidy data frames, by construction, so there's no work for broom to do here.


Will leave this issue open in case folks want to chime in. :)

VectorPosse commented 2 years ago

infer outputs tidy data frames, by construction, so there's no work for broom to do here.

Yes and no. broom isn't just about making tibbles out of messy output. It's also about formatting that tibble consistently, with all the metadata generated by the tests it supports.
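
For a small illustration of what that consistent format looks like with a base R chi-square test (a sketch, using the gss data from infer):

library(infer)  # for the gss data
library(broom)

# tidy() on an htest object returns a one-row tibble with standardized
# columns (statistic, p.value, parameter, method, ...)
tidy(chisq.test(table(gss$finrela, gss$sex)))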

rhinopotamus commented 2 years ago

@simonpcouch I appreciate the concern about scope creep!

Here's a potential argument for including interfaces for extracting residuals etc., which you should please feel free to adopt or ignore: vignette("infer") is a really lovely articulation of the philosophical approach this package takes. The vignette includes this comment: "infer also offers several utilities to extract the meaning out of summary statistics and distributions". I think residuals could neatly fall into this category of work: imo, with a chi-square test for independence, the meaning of the test is in the residuals.

So, as an example, I'm going to follow along with what's in vignette("chi_squared"): is there some kind of association between people's financial status and whether they have a college education? gss %>% specify(college ~ finrela) %>% hypothesize(null = "independence") %>% calculate(stat = "chisq") reports that the chi-squared statistic is 30.7, which is large. Accordingly, the pipeline gss %>% specify(college ~ finrela) %>% assume(distribution = "chisq") %>% get_p_value(obs_stat = 30.7, direction = "right") produces a p-value of 0.0000107. So we're going to reject the null hypothesis and conclude that there is an association between financial status and college education.

However, importantly, we don't know what that association is! It's reasonable to hypothesize that people with a college degree feel more financially secure, but unless we have some way to quantify the deviations from expected, we'd only be guessing.

(This example feels a little obvious, but for something more subtle like the smoking dataset from the openintro package, we can use the chi-squared test to see that there is an association between marital status and smoking status, yet infer pipelines give us no way to see who actually smokes more.)

github-actions[bot] commented 1 year ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.