microsoft / datamations

https://microsoft.github.io/datamations/

Pipeline cases that don't work #17

Closed sharlagelfand closed 3 years ago

sharlagelfand commented 3 years ago

Just keeping track of some examples of pipelines that don't work, to fix and test later:

more than 1 value summarised

small_salary %>% group_by(Degree) %>% summarize(mean = mean(Salary), median = median(Salary))

(the second summary is just ignored; the gif includes the mean only)

sharlagelfand commented 3 years ago

These cases are solved in the refactor-test branch (the original comment has been updated to include only the cases that still don't work). Some of these animations are a bit hectic, but the focus is on handling the logic properly and producing frames for each stage, rather than on the tweening itself.

summarise without grouping

library(datamations)
library(dplyr)

datamation_sanddance("small_salary %>% summarize(mean = mean(Salary))")

[Animation: summarize_without_group]

grouping variable contains _

library(dplyr)
library(datamations)

small_salary_eg <- small_salary %>%
  mutate(highest_degree = Degree)

datamation_sanddance(
  pipeline = "small_salary_eg %>% group_by(highest_degree)"
)

[Animation: underscore_in_variable]

grouping variable value contains _

library(dplyr)
library(datamations)

small_salary_eg <- small_salary %>%
  mutate(degreework = paste0(Degree, "_", Work))

datamation_sanddance(
  pipeline = "small_salary_eg %>% group_by(degreework)")

[Animation: underscore_in_value]

The "." case works too:

library(dplyr)
library(datamations)

small_salary_eg <- small_salary %>%
  mutate(degreework = paste0(Degree, ".", Work))

datamation_sanddance(
  pipeline = "small_salary_eg %>% group_by(degreework)")

[Animation: period_in_value]

no variation in a group's values (e.g. all the same, or only one value per group)

(mainly works because I've removed the error bars for now)

library(dplyr)
library(datamations)

datamation_sanddance("mtcars %>% group_by(carb, gear) %>% summarise(wt = mean(wt))")

[Animation: no_variation]
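To see which groups in this pipeline have no spread, here is a minimal base R sketch. It is not part of datamations, and the names n_per_group, sd_per_group, group_stats and no_spread are made up for illustration. It counts rows and computes the standard deviation of wt for each carb/gear group. A group with a single row has an undefined standard deviation (NA), which is presumably the kind of case where an error bar has nothing to draw.

```r
# Base R only; mtcars ships with R, so no packages are needed.
# Count the rows in each carb/gear group.
n_per_group <- aggregate(wt ~ carb + gear, data = mtcars, FUN = length)

# Standard deviation of wt in each group (NA when the group has one row).
sd_per_group <- aggregate(wt ~ carb + gear, data = mtcars, FUN = sd)

# Combine both summaries into one table with columns wt_n and wt_sd.
group_stats <- merge(
  n_per_group, sd_per_group,
  by = c("carb", "gear"),
  suffixes = c("_n", "_sd")
)

# Groups with no variation: a single row (sd is NA) or identical values (sd is 0).
no_spread <- group_stats[is.na(group_stats$wt_sd) | group_stats$wt_sd == 0, ]
print(no_spread)
```

Any row printed at the end is a group where a mean can be shown but a spread-based error bar cannot.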

sharlagelfand commented 3 years ago

More cases that work now:

data contained in first verb rather than piped in

This is now supported. The data is parsed out and moved to the first position. There are some tests for this as well.

library(datamations)

"small_salary %>% group_by(Degree) %>% summarize(mean = mean(Salary))" %>%
  parse_pipeline()
#> [[1]]
#> small_salary
#> 
#> [[2]]
#> group_by(Degree)
#> 
#> [[3]]
#> summarize(mean = mean(Salary))

"group_by(small_salary, Degree) %>% summarize(mean = mean(Salary))" %>%
  parse_pipeline()
#> [[1]]
#> small_salary
#> 
#> [[2]]
#> group_by(Degree)
#> 
#> [[3]]
#> summarize(mean = mean(Salary))

"group_by(palmerpenguins::penguins, Degree, Work) %>% summarize(mean = mean(bill_length_mm))" %>%
  parse_pipeline()
#> [[1]]
#> palmerpenguins::penguins
#> 
#> [[2]]
#> group_by(Degree, Work)
#> 
#> [[3]]
#> summarize(mean = mean(bill_length_mm))
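For readers curious how "parsed out and moved to the first position" could work, here is a rough base R sketch. It is not the package's parse_pipeline() implementation: split_pipeline() is a hypothetical helper, and it assumes that when the first element is a verb, the data is its first positional argument.

```r
# Hypothetical helper (not datamations' parse_pipeline()): split a pipeline
# string into steps, hoisting the data out of the first verb when needed.
split_pipeline <- function(pipeline) {
  expr <- parse(text = pipeline)[[1]]
  steps <- list()

  # `a %>% b %>% c` parses as `%>%`(`%>%`(a, b), c), so peel from the outside in.
  while (is.call(expr) && identical(expr[[1]], as.name("%>%"))) {
    steps <- c(list(expr[[3]]), steps)
    expr <- expr[[2]]
  }

  # `expr` is now the first element: the data itself, or a verb containing it.
  # `pkg::data` is also a call, so treat `::` as data rather than as a verb.
  is_verb <- is.call(expr) && !identical(expr[[1]], as.name("::"))
  if (is_verb) {
    data <- expr[[2]]   # assumption: the data is the verb's first argument
    verb <- expr[-2]    # the same call with the data argument dropped
    steps <- c(list(data, verb), steps)
  } else {
    steps <- c(list(expr), steps)
  }
  steps
}

piped  <- split_pipeline("small_salary %>% group_by(Degree) %>% summarize(mean = mean(Salary))")
nested <- split_pipeline("group_by(small_salary, Degree) %>% summarize(mean = mean(Salary))")

# Both forms should reduce to the same three steps.
print(vapply(piped, deparse, character(1)))
print(vapply(nested, deparse, character(1)))
```

The sketch only parses the string and never evaluates it, so neither the data nor dplyr needs to be available.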

more than 2 grouping variables

This is well supported now with the column facet -> row facet -> colour grouping flow. The output of datamation_sanddance() is just a list of vegalite pseudo-specs right now, but there are some tests demonstrating that this works.
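As an illustration of the column facet -> row facet -> colour order, here is a small base R sketch. assign_group_encodings() is a hypothetical helper, not a function from datamations. The channel names "column", "row" and "color" follow Vega-Lite's encoding channels, and stopping at three grouping variables is an assumption of the sketch.

```r
# Hypothetical helper: hand out encoding channels to grouping variables in the
# order column facet, then row facet, then colour.
assign_group_encodings <- function(group_vars) {
  channels <- c("column", "row", "color")
  if (length(group_vars) > length(channels)) {
    # Assumption of this sketch: only three grouping channels are available.
    stop("this sketch handles at most ", length(channels), " grouping variables")
  }
  setNames(as.list(group_vars), channels[seq_along(group_vars)])
}

# Three grouping variables use all three channels, in order.
str(assign_group_encodings(c("species", "island", "sex")))

# Two grouping variables use only the column and row facets.
str(assign_group_encodings(c("Degree", "Work")))
```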

sharlagelfand commented 3 years ago

The only thing left here is "summarizing more than 1 value" which seems more like an enhancement than a bug! So I'm going to open a new issue for that and close this off.