Closed (jhofman closed this issue 3 years ago)
Just rendered versions to refer to:
library(ggplot2)
library(dplyr)
library(datamations)
# Degree on the x, Work as facets
small_salary_data %>%
  group_by(Degree, Work) %>%
  summarize(mean_salary = mean(Salary, na.rm = TRUE)) %>%
  ggplot(aes(x = Degree, y = mean_salary)) +
  geom_point() +
  facet_wrap(~ Work)

# Degree on the x, Work and Degree as facets
small_salary_data %>%
  group_by(Degree, Work) %>%
  summarize(mean_salary = mean(Salary, na.rm = TRUE)) %>%
  ggplot(aes(x = Degree, y = mean_salary)) +
  geom_point() +
  facet_grid(Degree ~ Work)

# Degree on the x, Work as (dodged) color, no facet
small_salary_data %>%
  group_by(Degree, Work) %>%
  summarize(mean_salary = mean(Salary, na.rm = TRUE)) %>%
  ggplot(aes(x = Degree, y = mean_salary, color = Work)) +
  geom_point(position = position_dodge(width = 0.25))
Started looking into this, some notes:
library(datamations)
library(dplyr)
library(ggplot2)
library(rlang)
pipeline <- "small_salary %>%
  group_by(Degree, Work) %>%
  summarize(mean_salary = mean(Salary, na.rm = TRUE)) %>%
  ggplot(aes(x = Degree, y = mean_salary)) +
  geom_point() +
  facet_wrap(~ Work)"
Parse the pipeline. parse_pipeline() splits on %>%, so all of the plotting code ends up in the last element:
pipeline_steps <- pipeline %>%
  parse_pipeline()
pipeline_steps
#> [[1]]
#> small_salary
#>
#> [[2]]
#> group_by(Degree, Work)
#>
#> [[3]]
#> summarize(mean_salary = mean(Salary, na.rm = TRUE))
#>
#> [[4]]
#> ggplot(aes(x = Degree, y = mean_salary)) + geom_point() + facet_wrap(~Work)
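To make the splitting behaviour above concrete, here is a minimal base R sketch. It is not necessarily how parse_pipeline() is implemented; split_pipeline() is a hypothetical name, and the approach assumes that %>% never appears inside a string literal or a nested call.

```r
# Hypothetical helper (not part of datamations): split pipeline text on %>%
# and parse each piece into an unevaluated expression.
split_pipeline <- function(pipeline) {
  pieces <- strsplit(pipeline, "%>%", fixed = TRUE)[[1]]
  # Everything after the last %>% (including the `+` layers) stays in one piece.
  lapply(trimws(pieces), str2lang)
}

pipeline <- "small_salary %>%
  group_by(Degree, Work) %>%
  summarize(mean_salary = mean(Salary, na.rm = TRUE)) %>%
  ggplot(aes(x = Degree, y = mean_salary)) +
  geom_point() +
  facet_wrap(~ Work)"

steps <- split_pipeline(pipeline)
length(steps)
steps[[4]]
```

Because %>% binds more tightly than +, parsing the whole string at once yields a top-level call to `+`, so splitting the text first is one simple way to isolate the data steps from the plotting step.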
Evaluate each data state (all but the last step, which contains the plotting). We need one of these for each animated stage:
data_states <- pipeline_steps[1:3] %>%
  datamations:::snake(envir = global_env())
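As a rough illustration of what "evaluating each data state" means, the sketch below evaluates a list of steps cumulatively and keeps every intermediate result. This is an assumption about the general idea, not the implementation of the internal snake() function; eval_data_states() is a hypothetical name, and the example uses base R functions on mtcars so it runs without datamations or dplyr. It does not handle the magrittr `.` placeholder.

```r
# Hypothetical helper: evaluate pipeline steps one after another,
# keeping the result of every step.
eval_data_states <- function(steps, env = parent.frame()) {
  states <- vector("list", length(steps))
  # The first step is the data itself (e.g. a dataset name).
  states[[1]] <- eval(steps[[1]], env)
  for (i in seq_along(steps)[-1]) {
    # Insert the previous state as the first argument, as %>% would.
    call_i <- as.call(append(as.list(steps[[i]]), list(quote(.prev)), after = 1))
    states[[i]] <- eval(call_i, list(.prev = states[[i - 1]]), env)
  }
  states
}

steps <- list(quote(mtcars), quote(subset(cyl == 4)), quote(head(3)))
states <- eval_data_states(steps)
vapply(states, nrow, integer(1))
```

Each element of `states` corresponds to one stage that could be animated: the raw data, the data after the first verb, and so on.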
Now actually evaluate the entire pipeline (with the plotting) to get the plot object
p <- pipeline %>%
  parse_expr() %>%
  eval()
p
And we can get aspects of plotting from the plot object itself (rather than from the code used to generate it, since it can be written in so many different ways)
# Extracting x and y variables
p$mapping$x %>%
  rlang::quo_name()
#> [1] "Degree"

# Extracting facets
# Some combination of this:
p$facet$params$facets %>%
  names()
#> [1] "Work"

# Doesn't actually say whether this is a row or column facet, but we can combine information
bp <- ggplot_build(p)
bp$layout$layout
#>   PANEL ROW COL     Work SCALE_X SCALE_Y
#> 1     1   1   1 Academia       1       1
#> 2     2   1   2 Industry       1       1
# This shows that there's only 1 row (and 2 cols) so could figure out faceting info that way
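The lookups above could be gathered into one helper that returns a plain list of variable roles. This is only a sketch: extract_plot_roles() is a hypothetical name, the fields it reads (p$mapping, p$facet$params) are ggplot2 internals that may change between versions, and the usage example uses mtcars rather than the salary data so that it runs without datamations.

```r
library(ggplot2)

# Hypothetical helper: read variable roles from the plot object itself
# rather than from the code that produced it.
extract_plot_roles <- function(p) {
  label_or_null <- function(q) if (is.null(q)) NULL else rlang::as_label(q)
  facet_params <- p$facet$params
  list(
    x = label_or_null(p$mapping$x),
    y = label_or_null(p$mapping$y),
    # aes() standardises `color` to `colour`
    colour = label_or_null(p$mapping$colour),
    facet_wrap = names(facet_params$facets), # set by facet_wrap()
    facet_rows = names(facet_params$rows),   # set by facet_grid()
    facet_cols = names(facet_params$cols)    # set by facet_grid()
  )
}

p_example <- ggplot(mtcars, aes(x = cyl, y = mpg, colour = am)) +
  geom_point() +
  facet_grid(gear ~ vs)

roles <- extract_plot_roles(p_example)
str(roles)
```

For facet_wrap() the row/column arrangement is not stored in these parameters, which is why the built layout (ROW and COL above) is still needed in that case.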
The code in the ggreverse package will probably be super helpful for figuring these bits out.
Then we can create e.g. a list with the x variable, y variable, facets, colours, etc., and use those to create the specs, rather than basing them on the number of groups. If the user doesn't supply ggplot2 code, we can fall back to the defaults that exist now: col facet -> row facet -> colors.
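The fallback order (col facet -> row facet -> colors) can be sketched as a small function that maps grouping variables to roles by position. The function name and the role labels are hypothetical, chosen for illustration only; they are not taken from the datamations code.

```r
# Hypothetical helper: assign grouping variables to default plot roles,
# in the order column facet, then row facet, then colour.
assign_default_roles <- function(group_vars) {
  roles <- c("col_facet", "row_facet", "colour")
  n <- min(length(group_vars), length(roles))
  # Grouping variables beyond the third get no default role in this sketch.
  setNames(as.list(group_vars[seq_len(n)]), roles[seq_len(n)])
}

default_roles <- assign_default_roles(c("Degree", "Work"))
str(default_roles)
```

When ggplot2 code is supplied, a list of roles read from the plot object would replace this default, so the downstream spec generation only ever needs to consume one list shape.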
this is awesome @sharlagelfand!
to make next steps concrete, let's try to use this to add a ggplot command to get a version of the degree+work plot used in the paper, with workplace as facet and degree as the x variable
Looks like this example works pretty well right out of the box!
One thing that's off is the X axis labels - @giorgi-ghviniashvili, from the specs it looks like all of the X values, X breaks, and X labels are 1,2 - do you know why it's not lining up?
@sharlagelfand "Plot Salary within each group" does not have scale.domain. Please add it and you will get this:
Ah thanks @giorgi-ghviniashvili!
Quite happy to say I have these examples working!!
library(dplyr)
library(ggplot2)
library(datamations)
"small_salary %>%
  group_by(Degree, Work) %>%
  summarize(mean_salary = mean(Salary, na.rm = TRUE)) %>%
  ggplot(aes(x = Degree, y = mean_salary)) +
  geom_point() +
  facet_grid(~ Work)" %>%
  datamation_sanddance()

"small_salary %>%
  group_by(Degree, Work) %>%
  summarize(mean_salary = mean(Salary, na.rm = TRUE)) %>%
  ggplot(aes(x = Degree, y = mean_salary)) +
  geom_point() +
  facet_grid(Degree ~ Work)" %>%
  datamation_sanddance()
This one looks a bit odd; we may want to fine-tune the color and location of the infogrid a bit, but it does work!
We have a great start here. @sharlagelfand will create a new issue to make sure we explain the limitations of this functionality and/or pop up corresponding warnings or error messages.
Right now we're sort of implicitly assuming that grouping variables become faceting variables, which is reasonable and will generalize. But what if someone wants control over this? More generally, we want to "respect" the final plot that they generate and have the steps leading up to it reflect that.
To illustrate, imagine the same data analysis pipeline, but with three different plotting commands at the end. Right now we'd show the same datamation for each, but in theory they should end in different frames (and so should also contain different frames leading up to that).
Degree on the x, Work as facets
vs
Degree on the x, Work and Degree as facets
vs
Degree on the x, Work as (dodged) color, no facet
This will require a bunch of thinking and probably some hacking of ggproto objects, but let's do the thinking before the hacking.