Open sharlagelfand opened 3 years ago
at least three cases to think about:
1. summarize(mean_weight = mean(weight), mean_mpg = mean(mpg))
2. summarize(mean = mean(Salary), median = median(Salary))
3. summarize(mu = mean(salary), se = sd(salary) / sqrt(n()))
let's think about only two outputs from summarize for now because more than two makes our heads hurt.
let's think about case 1 first. @sharlagelfand will make some static snapshots of how this could look, with an initial scatter plot that shows all the points, which then collapse to the summarized points.
perhaps the way to deal with more than two is to do facets for each outcome?
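A rough sketch of the facet idea, assuming the summaries are first reshaped to long form with tidyr::pivot_longer() so that each outcome gets its own panel. The third summary (mean_flipper_length) and the names island_summaries, facet_plot, outcome and value are only illustrative, not something decided in this thread. Island goes back on the x axis here because each panel shows a single outcome:

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(palmerpenguins)

# Three summaries per island, then one row per island/outcome pair.
island_summaries <- penguins %>%
  group_by(island) %>%
  summarise(
    mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
    mean_bill_depth = mean(bill_depth_mm, na.rm = TRUE),
    mean_flipper_length = mean(flipper_length_mm, na.rm = TRUE)
  ) %>%
  pivot_longer(-island, names_to = "outcome", values_to = "value")

# One facet per outcome; free y scales because the outcomes differ in magnitude.
facet_plot <- ggplot(island_summaries, aes(x = island, y = value, color = island)) +
  geom_point() +
  facet_wrap(~ outcome, scales = "free_y")

facet_plot
```

With free y scales each panel keeps its own range, so bill depth (around 17 to 18 mm) is not flattened by flipper length (around 190 to 210 mm).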
Just switching back to penguins here, but here's how case 1 could look in terms of the scatter and summary frames:
From this code:
library(dplyr)
library(ggplot2)
library(palmerpenguins)

theme_set(theme_minimal())

# Scatter frame: every penguin, coloured by island
penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = island)) +
  geom_point() +
  coord_cartesian(xlim = c(30, 60), ylim = c(12, 23))

# Summary frame: one point per island, on the same axis limits
penguins %>%
  group_by(island) %>%
  summarise(
    mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
    mean_bill_depth = mean(bill_depth_mm, na.rm = TRUE)
  ) %>%
  ggplot(aes(x = mean_bill_length, y = mean_bill_depth, color = island)) +
  geom_point() +
  coord_cartesian(xlim = c(30, 60), ylim = c(12, 23))
For the infogrid, in this case we'd normally show the island on the X-axis, but we'd have to make the call here (likely from the fact that island appears only in color, not in x, in the ggplot2 code) to only represent it in color, like so:
great point about not using the x axis for island in a setting like this.
i think this approach makes sense, it would be great to prototype it in gemini @giorgi-ghviniashvili
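One possible way to make the "island appears only in color, not in x" call programmatically is to look at the plot-level aesthetic mapping of the ggplot object. This is a minimal sketch, not how the package does it: it assumes the mapping is set in ggplot() itself (layer-level mappings are not inspected), and the names mapped, grouping_var, aesthetics_used and only_in_color are made up for illustration. Note that aes() stores color under the name colour.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = island)) +
  geom_point()

# Named character vector: aesthetic name -> the expression mapped to it.
mapped <- vapply(p$mapping, rlang::as_label, character(1))

# Which aesthetics is the grouping variable mapped to?
grouping_var <- "island"
aesthetics_used <- names(mapped)[mapped == grouping_var]

# TRUE when the grouping variable shows up only as a colour-type aesthetic,
# i.e. it should not be placed on the x axis of the infogrid.
only_in_color <- length(aesthetics_used) > 0 &&
  all(aesthetics_used %in% c("colour", "fill"))
```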
Right now we only support one summarized value, e.g.
small_salary %>% group_by(Degree) %>% summarize(mean = mean(Salary))
Maybe in the future we could think about how multiple operations (or summarizing multiple variables) could work, e.g.
small_salary %>% group_by(Degree) %>% summarize(mean = mean(Salary), median = median(Salary))
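For reference, here is the shape of output the two-summary case produces, using a tiny made-up stand-in for small_salary (toy_salary and its values are invented for illustration only):

```r
library(dplyr)

# Hypothetical stand-in for small_salary; values are invented.
toy_salary <- tibble(
  Degree = c("Masters", "Masters", "Masters", "PhD", "PhD", "PhD"),
  Salary = c(80, 90, 130, 85, 95, 96)
)

# Two summaries of the same variable: one row per Degree, one column per summary.
two_summaries <- toy_salary %>%
  group_by(Degree) %>%
  summarize(mean = mean(Salary), median = median(Salary))

two_summaries
# Masters: mean 100, median 90
# PhD:     mean 92,  median 95
```

So each group ends up with two values on the same scale (Salary), which is a different situation from case 1, where the two summaries are of different variables.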