microsoft / datamations

https://microsoft.github.io/datamations/

Assign the group_by variable to color and use two summarized variables on the xy-axes? #100

Open joelostblom opened 2 years ago

joelostblom commented 2 years ago

I am trying to replicate a plot like this from altair/vega:

import altair as alt
from vega_datasets import data

cars = data.cars()
alt.Chart(cars).mark_point().encode(
    x='mean(Weight_in_lbs)',
    y='mean(Miles_per_Gallon)',
    color='Origin')

image
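To make the target concrete, here is a minimal sketch of the aggregation behind that chart, in plain Python: one row per `Origin`, carrying the mean of both variables, which would then map to x, y, and color. The rows are made-up toy values rather than the real `cars` data, and `summarize_by` is a hypothetical helper written only for this illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows (NOT the real vega_datasets cars data).
rows = [
    {"Origin": "USA", "Weight_in_lbs": 3500, "Miles_per_Gallon": 18.0},
    {"Origin": "USA", "Weight_in_lbs": 3300, "Miles_per_Gallon": 22.0},
    {"Origin": "Europe", "Weight_in_lbs": 2400, "Miles_per_Gallon": 28.0},
    {"Origin": "Europe", "Weight_in_lbs": 2600, "Miles_per_Gallon": 26.0},
    {"Origin": "Japan", "Weight_in_lbs": 2200, "Miles_per_Gallon": 30.0},
]


def summarize_by(rows, group_key, value_keys):
    """Return {group: {value_key: mean}} with one entry per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return {
        group: {key: mean(r[key] for r in members) for key in value_keys}
        for group, members in groups.items()
    }


# One summarized point per Origin: x = mean weight, y = mean mpg, color = Origin.
summary = summarize_by(rows, "Origin", ["Weight_in_lbs", "Miles_per_Gallon"])
for origin, means in summary.items():
    print(origin, means)
```

Each key of `summary` corresponds to one colored point in the chart, and the two means are that point's x and y positions.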

I thought something like this might do it, but it keeps the grouped variable on the x-axis instead of assigning it to color, and it also drops one of the summarized variables. Is it possible to specify which variable should go where?

library(datamations)
library(dplyr)

"cars %>% 
  group_by(Origin) %>%
  summarize(Weight_in_lbs = mean(Weight_in_lbs),
            Miles_per_Gallon = mean(Miles_per_Gallon))" %>%
  datamation_sanddance()

image

sharlagelfand commented 2 years ago

Hi @joelostblom! In general, it is possible to specify which variable goes where by passing the spec as ggplot2 code, as in this example from the README:

"small_salary %>%
  group_by(Work, Degree) %>%
  summarize(mean_salary = mean(Salary)) %>%
  ggplot(aes(x = Work, y = mean_salary)) + 
  geom_point() + 
  facet_grid(rows = vars(Degree))" %>%
  datamation_sanddance()

README-mean_salary_group_by_degree_work_ggplot

As for your question about one of the summarized values being dropped: right now we only support a single summarized value. There is an open issue for this (#53), so it is definitely on our minds as a limitation!