Closed nicolasfoss closed 3 months ago
Thanks for the issue! Hope you don't mind that I edited your issue description to adjust code formatting.
I'll first say that stat = "z"
is a "theorized" test statistic supported by infer that can apply those distributional assumptions to plot a curve for you:
library(infer)
library(tidyverse)
set.seed(123)
faux_data <- tibble(dead = sample(
x = factor(c("yes", "no"), levels = c("yes", "no")),
size = 100, replace = T),
gender = sample(
x = factor(c("male", "female"), levels = c("male", "female")),
size = 100,
replace = T
))
faux_summary <- faux_data %>%
summarize(N = n(),
prop_dead = mean(dead == "yes"),
count_dead = sum(dead == "yes"),
.by = gender
)
faux_diff <- faux_data %>%
specify(dead ~ gender, success = "yes") %>%
calculate(stat = "z", order = c("male", "female")) %>%
pull()
faux_test <- faux_data %>%
prop_test(dead ~ gender, order = c("male", "female"))
faux_model <- faux_data %>%
specify(dead ~ gender, success = "yes") %>%
hypothesise(null = "independence") %>%
generate(reps = 1000, type = "permute") %>%
calculate(stat = "z", order = c("male", "female"))
faux_model %>% visualize()
faux_model %>% visualize(method = "both")
#> Warning: Check to make sure the conditions have been met for the theoretical method.
#> infer currently does not check these for you.
Created on 2024-08-19 with reprex v2.1.1
In general, I think we feel that histograms best represent the empirical distributions that can be created with infer pipelines, and likely won't extend that functionality in future releases of the package. That said, the outputs of visualize()
are just ggplot objects, and you're able to add elements to them as you please! With your example and a call to geom_density()
:
library(infer)
library(tidyverse)
set.seed(123)
faux_data <- tibble(dead = sample(
x = factor(c("yes", "no"), levels = c("yes", "no")),
size = 100, replace = T),
gender = sample(
x = factor(c("male", "female"), levels = c("male", "female")),
size = 100,
replace = T
))
faux_summary <- faux_data %>%
summarize(N = n(),
prop_dead = mean(dead == "yes"),
count_dead = sum(dead == "yes"),
.by = gender
)
faux_diff <- faux_data %>%
specify(dead ~ gender, success = "yes") %>%
calculate(stat = "diff in props", order = c("male", "female")) %>%
pull()
faux_test <- faux_data %>%
prop_test(dead ~ gender, order = c("male", "female"))
faux_model <- faux_data %>%
specify(dead ~ gender, success = "yes") %>%
hypothesise(null = "independence") %>%
generate(reps = 1000, type = "permute") %>%
calculate(stat = "diff in props", order = c("male", "female"))
faux_model %>%
visualize() +
geom_density(aes(x = stat))
Created on 2024-08-19 with reprex v2.1.1
@simonpcouch Thanks so much for the guidance.
It looks like I can get what I was looking for already! I had not seen stat = "z" and I will make sure to dig deeper next time.
Have a great week, it was a pleasure learning with you at posit::conf.
You're very welcome! You as well.
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Intro
Hello! First, thank you for taking time to review this issue. This is my first time posting an issue on Github so I might be making some mistakes. Please let me know how to improve this so I can make the material more helpful or to save you time.
Issue
In cases where you may want to overlay a density curve over the histogram produced by visualize(method = "simulation"), you cannot run visualize(method = "both") or you will get an error. It might be a helpful option to see shading under a density curve instead of thr histogram shading. See below for a reprex and example of what I propose be available for the empirical distribution as well via method = "simulation". This would be quite helpful for cases when you are looking at categorical data, as the error you get (see below) mentions that.
I will admit...
I do realize there is documentation showing how to use visualize() with both the theoretical and empirical distributions, but my use case here did not apply to the documentation / vignette.
It would be great for visualize to produce the density curve for non-theoretical distributions like the one produced above here is a home grown example.
Thanks!
Thanks again for your time and I look forward to interacting with you.