sharlagelfand closed this issue 2 years ago
One idea here is to simply treat count as "syntactic sugar" and always map it to a group_by + summarize.
For instance, `df %>% count(vaccination_status, outcome)` becomes `df %>% group_by(vaccination_status, outcome) %>% summarize(n = n())`.
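A minimal sketch of that rewrite, assuming the pipeline is available as a list of steps. The `{ verb, args }` shape and the `desugarCount` name are hypothetical and only illustrate the idea; they are not the app's actual representation.

```javascript
// Hypothetical step shape: { verb: string, args: string[] }.
// Rewrites every count(...) step into group_by(...) followed by summarize(n = n()).
function desugarCount(steps) {
  return steps.flatMap((step) => {
    if (step.verb !== "count") return [step];
    const summarize = { verb: "summarize", args: ["n = n()"] };
    // count() with no variables just counts all rows, so no group_by is needed.
    return step.args.length === 0
      ? [summarize]
      : [{ verb: "group_by", args: [...step.args] }, summarize];
  });
}

// Render a step list back into a dplyr-style pipeline string.
function render(source, steps) {
  return [source, ...steps.map((s) => `${s.verb}(${s.args.join(", ")})`)].join(" %>% ");
}

const steps = desugarCount([{ verb: "count", args: ["vaccination_status", "outcome"] }]);
console.log(render("df", steps));
// df %>% group_by(vaccination_status, outcome) %>% summarize(n = n())
```

With this approach the animation code would only ever see group_by and summarize steps, which is what raises the question about summarize operations that do not reference a variable.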
This raises the separate issue of having a summarize operation that doesn't use a variable explicitly, because the `n()` function is called without an argument.
In this case I think that raises the question of whether we want a "custom aggregation animation" for count, maybe? Could be a simpler version of what's here on page 4.
@giorgi-ghviniashvili, let's play with the custom aggregation animation above for count to see how it goes?
one thought is that for some functions we have custom animations and for others we default to some black box type of thing as per #25
I made a custom count animation using gemini2-editor:
Next phase is to implement them as keyframes, within a single datamation step.
Managed to play animation using animateSequence function.
https://user-images.githubusercontent.com/6615532/144413622-d9f59b3a-2e5e-4891-b1c8-cff9d42c6058.mov
@sharlagelfand please provide an example that involves a count operation. I would like to integrate this into the App and need a spec for that.
Here is an example of what a `count()` operation looks like:
library(dplyr)
library(datamations) # provides datamation_sanddance() and the small_salary data
"small_salary %>%
  count(Degree)" %>%
  datamation_sanddance()
If we want a custom animation on it, maybe we can add something to `meta` that describes what the operation is so the JS side knows?
and the raw specs:
@giorgi-ghviniashvili can you remind me what to add to meta please? `meta.custom_animation = "count"`?
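A rough sketch of how the JS side might branch on that field. The registry, the `framesFor` helper, and the assumption that `meta` sits on the target spec are all hypothetical; the placeholder intermediate frame stands in for whatever the real count animation would produce.

```javascript
// Hypothetical registry: custom animation name -> function that builds the frame list.
const customAnimations = new Map([
  [
    "count",
    (fromSpec, toSpec) => [
      fromSpec,
      // Placeholder intermediate frame; a real version would stack the points here.
      { ...toSpec, meta: { ...toSpec.meta, stage: "stacked" } },
      toSpec,
    ],
  ],
]);

// Returns the specs to animate through for one step of the pipeline.
// Falls back to a plain start -> end transition when no custom animation is requested.
function framesFor(fromSpec, toSpec) {
  const name = toSpec.meta ? toSpec.meta.custom_animation : undefined;
  const build = customAnimations.get(name);
  return build ? build(fromSpec, toSpec) : [fromSpec, toSpec];
}

const frames = framesFor({ mark: "point" }, { mark: "point", meta: { custom_animation: "count" } });
console.log(frames.length); // 3
```

Keeping the lookup in one place means an operation without a registered animation still gets the default transition.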
@sharlagelfand yes, please
I was testing recommendKeyframes and recommendWithPath, hoping to get the custom animation sub-specs for each aggregation type, but it does not seem to work properly: it just returns the original start and end specifications, with no intermediate specs.
Maybe we should open a ticket on the gemini GitHub and ask whether it supports custom animation generation? If it doesn't, we can implement that in JS. What do you think @jhofman?
I tried the graphscape recommender (which is used internally by gemini) and got the same response. Very interesting how it works..
Wrote a function that generates intermediate specs for count and integrated it in the App.
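For readers following along, here is a sketch of the kind of computation such a function needs, not the function that was actually written. It assigns each row a slot in a single column per group; `stackByGroup` and the sample rows are made up for illustration.

```javascript
// Assumption: each row is a plain object and `key` names the counted column.
// Gives every row an x slot (1-based group index, in first-seen order)
// and a y position (1-based rank within its group's stack).
function stackByGroup(rows, key) {
  const groups = [...new Set(rows.map((row) => row[key]))];
  const stackedSoFar = new Map(); // group value -> rows already placed
  return rows.map((row) => {
    const rank = (stackedSoFar.get(row[key]) || 0) + 1;
    stackedSoFar.set(row[key], rank);
    return { ...row, x: groups.indexOf(row[key]) + 1, y: rank };
  });
}

// Made-up sample rows.
const rows = [{ Degree: "Masters" }, { Degree: "PhD" }, { Degree: "Masters" }];
console.log(stackByGroup(rows, "Degree"));
// [ { Degree: 'Masters', x: 1, y: 1 },
//   { Degree: 'PhD', x: 2, y: 1 },
//   { Degree: 'Masters', x: 1, y: 2 } ]
```

The top of each stack then sits at y equal to the group's count, which is where a rule line would be drawn.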
https://user-images.githubusercontent.com/6615532/145388460-39f7e49a-a1a3-4cd3-b11d-c43ae831be97.mov
Some concerns:
If we have too many circles, then stacking them into a single column is not visually correct (the circles will overlap), so we need to have an actual grid.
But if we have a grid, then how should we place the rule lines? They are supposed to be placed at the top circle.
Maybe instead of line rule, we should think of a different visual element?
This looks good overall. We decided to add a key frame after stacking where everything "pulls up" into the top point, keeping the y axis in the same range, and after that things zoom so the y-axis is on the range of the resulting counts.
Let's show the y axis as soon as the stacking happens, and eliminate the shift on the x-axis.
Let's also zoom in on the y axis at the end so that the min and max on the axis are defined by the min and max counts.
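The sequence described above (stack, pull up, zoom) can be sketched as three keyframes. This is an illustration under stated assumptions: points are assumed to already carry a stacked position, and the `countKeyframes` name and the frame shape are hypothetical.

```javascript
// Assumption: each point has x (group slot) and y (rank within its stack).
function countKeyframes(stacked) {
  // The count of a group equals the height of its stack.
  const counts = new Map();
  for (const p of stacked) counts.set(p.x, Math.max(counts.get(p.x) || 0, p.y));
  const values = [...counts.values()];
  const minCount = Math.min(...values);
  const maxCount = Math.max(...values);
  // "Pull up": every point moves to the top of its own stack.
  const pulled = stacked.map((p) => ({ ...p, y: counts.get(p.x) }));
  return [
    { name: "stack", points: stacked, yDomain: [0, maxCount] }, // y axis shown from here on
    { name: "pull-up", points: pulled, yDomain: [0, maxCount] }, // same y range as the stack
    { name: "zoom", points: pulled, yDomain: [minCount, maxCount] }, // range of the counts
  ];
}

const frames = countKeyframes([
  { x: 1, y: 1 },
  { x: 1, y: 2 },
  { x: 2, y: 1 },
]);
console.log(frames.map((f) => `${f.name}: [${f.yDomain.join(", ")}]`));
// [ 'stack: [0, 2]', 'pull-up: [0, 2]', 'zoom: [1, 2]' ]
```

The sketch does not handle empty input, and when every group has the same count the zoomed domain collapses to a single value, so a real implementation would need to pad it.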
Hi @jhofman, the y axis is showing now, but the shift is still there, because the x axis labels for the stacked spec are a bit closer together than the grid x labels.
https://user-images.githubusercontent.com/6615532/146345964-8030e0f1-3129-4894-a1c9-fb00dc213e76.mov
@sharlagelfand, to zoom in on the y axis, please set the y axis domain properly. I set it to 28 manually (and also removed `title: []`):
this is a great update. is there a way to deal w/ the ghost-shift that happens on the x axis in the final two seconds?
@jhofman alright, I simply set `scale.domain` to [0, 3] instead of [0.5, 2.5]. That should be set from R or Python. I used [0, 3] because the count middle frames have this domain, and the domains should be consistent across frames.
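The two domains differ only in how much padding surrounds the category slots. A small sketch of the arithmetic, assuming categories sit at integer positions 1..n (consistent with the numbers above for two groups); the `xDomain` helper is hypothetical and written in JS only for illustration, since the real value would come from R or Python.

```javascript
// Assumption: categories are placed at integer x positions 1..nCategories.
// `padding` is the empty space left before the first and after the last slot.
function xDomain(nCategories, padding) {
  return [1 - padding, nCategories + padding];
}

console.log(xDomain(2, 0.5)); // [ 0.5, 2.5 ]
console.log(xDomain(2, 1)); // [ 0, 3 ]
```

Using the same padding for every frame in a step keeps the tick marks from moving between frames.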
https://user-images.githubusercontent.com/6615532/146963785-a876ffea-2d91-4403-abde-a85b3895ddd6.mov
@giorgi-ghviniashvili merged current main into the gemini2 branch (where these modifications exist)
@jhofman can check the gemini2 branch to see if there's a simple change to `generate_x_domain` that would align the tick marks for count but not disturb alignment for other visualizations.
domains are updated and merged, closing!
I think this is inevitable as part of #97, and especially this infographic:
Right now we can show a similar infogrid by just grouping by e.g. vaccinated and outcome (`df %>% group_by(vaccination_status, outcome)`), but the fact that this would work and show the info grid is more of a side effect / hidden feature and is not how someone would actually be summarizing the data - more likely to have e.g. `df %>% count(vaccination_status, outcome)`.
We will likely want to split each counted variable into a step as we do with data, e.g.
In dplyr, `df %>% count(vaccination_status, outcome)` is equivalent to `df %>% group_by(vaccination_status) %>% count(outcome)`, and I think it's important to make it clear in how we title things that the counts are combinations of vaccination status x outcome.
Will have to think a bit more about the hierarchy of how these are displayed too, and keeping it consistent with the grouping hierarchy, but also with the new rules we're creating as part of #98 - e.g. it would be nice to lean on colors / shapes here!