Open · sharlagelfand opened this issue 3 years ago
@sharlagelfand: There were too many observations in the bike data, so here's an artificial but hopefully still interesting one: take a few famous baseball players, compute their batting average for each year they played, noting the team they played for, and then look at their median batting average over the time with that team.
library(dplyr)
library(ggplot2)

plyr::baseball %>%
  filter(id %in% c("ruthba01", "cobbty01", "hornsro01")) %>%
  # one batting average per player, team, and year (pooling multiple stints)
  group_by(id, team, year) %>%
  summarize(ba = sum(h) / sum(ab)) %>%
  # median over the years each player spent with each team
  group_by(id, team) %>%
  summarize(median_ba = median(ba)) %>%
  ggplot(aes(x = id, y = median_ba, color = team)) +
  geom_point(position = position_dodge(width = 0.25)) +
  labs(x = "Player", y = "Median batting average over time with each team")
I don't love the styling of this plot, but perhaps it's enough to get started with?
Thanks @jhofman! This actually brings up another question: how to handle summary operations that combine multiple variables, e.g. ba = h / ab. Right now we don't have a way to show the distributions of two variables, or how the relationship between them derives a new variable. I'll create an issue for that, and see if we can come up with an example that just does multiple steps without running into the "derived from multiple variables" case for now.
Noting two things:
- Snoozing this until we make progress on #62 for multiple variable manipulations.
- We want to test whether it's possible to do group_by -> summarise -> group_by -> summarise (or e.g. group_by -> summarise -> summarise). @jhofman will provide an example.
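In the meantime, here is a minimal sketch of what the two chained shapes could look like in plain dplyr. It is not the example @jhofman will provide. The `seasons` data and its values are made up for illustration, `ba` is precomputed so the "derived from multiple variables" case doesn't come up, and it assumes dplyr >= 1.0 for the `.groups` argument.

```r
library(dplyr)

# Hypothetical toy data (invented values): one row per player-season,
# with the batting average already computed.
seasons <- tibble(
  id   = c("a", "a", "a", "a", "b", "b", "b"),
  team = c("X", "X", "Y", "Y", "X", "X", "X"),
  year = c(1, 2, 3, 4, 1, 2, 3),
  ba   = c(0.30, 0.32, 0.28, 0.26, 0.25, 0.27, 0.29)
)

# Step 1: group_by -> summarise. Dropping only the last grouping
# variable (team) leaves the result grouped by id.
per_team <- seasons %>%
  group_by(id, team) %>%
  summarise(median_ba = median(ba), .groups = "drop_last")

# Step 2a: group_by -> summarise -> summarise, relying on the
# grouping by id left over from step 1.
per_player <- per_team %>%
  summarise(best_median_ba = max(median_ba))

# Step 2b: group_by -> summarise -> group_by -> summarise, with the
# second grouping spelled out explicitly.
per_player_explicit <- per_team %>%
  group_by(id) %>%
  summarise(best_median_ba = max(median_ba))
```

Both variants should give the same per-player table; the explicit `group_by(id)` is redundant here but makes the second grouping visible as its own step.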