larmarange / ggstats

Extension to ggplot2 for plotting stats
https://larmarange.github.io/ggstats/
GNU General Public License v3.0

Adding phi coefficients to stat_cross() function #6

Closed by larmarange 1 year ago

larmarange commented 1 year ago

Copied from https://github.com/ggobi/ggally/issues/437

The stat_cross() function is very useful! Would you consider adding phi coefficients as a measure of local association, alongside Pearson's residuals? Phi coefficients have the advantage of being bounded between -1 and 1, just like Pearson's correlation, so their values are easy to interpret. In practice, I think it would require very few changes to the code. This worked for me:

# compute cross statistics
panel <- broom::augment(chisq.test(xtabs(weight ~ y + x, data = data)))
panel$.phi <- with(data, GDAtools::phi.table(y, x, weight)) %>%
  as.data.frame() %>%
  dplyr::pull(Freq)

panel_names <- names(panel)
for (to_name in c(
  "observed",
  "prop",
  "row.prop",
  "col.prop",
  "expected",
  "resid",
  "std.resid",
  "phi"
)) {

larmarange commented 1 year ago

Hi @nicolas-robette

Truly sorry, I forgot to consider your issue from last April.

Do you have a valid reference for the phi measure? The reference given in the GDAtools documentation is no longer valid:

Rakotomalala R., 'Comprendre la taille d'effet (effect size)', http://eric.univ-lyon2.fr/~ricco/cours/slides/effect_size.pdf

Do you have a proper equation, so we can see whether it could easily be computed directly in the package? That would avoid extra dependencies: if I import GDAtools, the package also inherits everything GDAtools itself imports, and so on.
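On the question of an equation, here is one possible formulation. It assumes the local phi of cell (i, j) is the ordinary phi coefficient of the 2x2 table obtained by collapsing the cross-table to "row i vs. all other rows" and "column j vs. all other columns", which is how the psych::phi() check in this thread treats it. The symbols are introduced here for illustration: n_ij is the observed count, n_i+ and n_+j are the row and column totals, n is the grand total, and r^std_ij is the standardized residual of the chi-squared test.

```latex
\[
\phi_{ij}
  = \frac{n\,n_{ij} - n_{i+}\,n_{+j}}
         {\sqrt{n_{i+}\,(n - n_{i+})\;n_{+j}\,(n - n_{+j})}}
  = \frac{r^{\mathrm{std}}_{ij}}{\sqrt{n}},
\qquad
r^{\mathrm{std}}_{ij}
  = \frac{n_{ij} - e_{ij}}
         {\sqrt{e_{ij}\,\bigl(1 - \tfrac{n_{i+}}{n}\bigr)\bigl(1 - \tfrac{n_{+j}}{n}\bigr)}},
\qquad
e_{ij} = \frac{n_{i+}\,n_{+j}}{n}.
\]
```

Under that assumption, the coefficient needs nothing beyond the standardized residuals and the total count, both of which a chi-squared test already provides.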

larmarange commented 1 year ago

OK, I may have found a solution with a new function, .augment_and_add_phi(). Please see #7.

d <- as.data.frame(Titanic)
d$male <- factor(d$Sex == "Male") |> forcats::fct_relevel("TRUE")
d$first <- factor(d$Class == "1st") |> forcats::fct_relevel("TRUE")

tab <- xtabs(Freq ~ Sex + Class, data = d)

tab |> 
  chisq.test() |> 
  ggstats::.augment_and_add_phi() |> 
  dplyr::select(Sex, Class, .phi)
#> # A tibble: 8 × 3
#>   Sex    Class   .phi
#>   <fct>  <fct>  <dbl>
#> 1 Male   1st   -0.236
#> 2 Female 1st    0.236
#> 3 Male   2nd   -0.149
#> 4 Female 2nd    0.149
#> 5 Male   3rd   -0.107
#> 6 Female 3rd    0.107
#> 7 Male   Crew   0.375
#> 8 Female Crew  -0.375

GDAtools::phi.table(d$Sex, d$Class, weights = d$Freq, digits = 3)
#>           1st    2nd    3rd   Crew
#> Male   -0.236 -0.149 -0.107  0.375
#> Female  0.236  0.149  0.107 -0.375

xtabs(Freq ~ male + first, data = d) |> 
  psych::phi(digits = 3)
#> [1] -0.236

Created on 2022-10-08 with reprex v2.0.2
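
For anyone who wants to check these values without ggstats, GDAtools, or psych, here is a minimal base-R sketch. It rests on an assumption that is not stated in the thread: that the local phi of a cell equals its standardized residual divided by the square root of the grand total. The variable names (`phi`, `phi_direct`, `r_i`, `c_j`) are illustrative only, and this is not necessarily how .augment_and_add_phi() is implemented.

```r
# Titanic ships with base R; collapse it to a Sex x Class table.
tab <- xtabs(Freq ~ Sex + Class, data = as.data.frame(Titanic))

# chisq.test() stores the standardized residuals in $stdres.
test <- chisq.test(tab)

# Assumption: local phi = standardized residual / sqrt(grand total).
phi <- test$stdres / sqrt(sum(tab))
print(round(phi, 3))

# Cross-check one cell (Male, 1st) against the usual 2x2 phi formula,
# computed from the margins of the collapsed table.
n   <- sum(tab)
r_i <- sum(tab["Male", ])  # row total for Male
c_j <- sum(tab[, "1st"])   # column total for 1st class
phi_direct <- (n * tab["Male", "1st"] - r_i * c_j) /
  sqrt(r_i * (n - r_i) * c_j * (n - c_j))

# Compare the two computations for that cell.
print(all.equal(unname(phi["Male", "1st"]), unname(phi_direct)))
```

If the assumption holds, the printed table should agree with the GDAtools::phi.table() output shown above, and the only requirement is the stats package that ships with R.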