Support `count()` operation

sharlagelfand commented 2 years ago

I think this is inevitable as part of #97, and especially this infographic:

Relevant icon array!

https://twitter.com/xruiztru/status/1452180847088517131

Right now we can show a similar infogrid by just grouping by e.g. vaccinated and outcome (df %>% group_by(vaccination_status, outcome)), but the fact that this would work and show the info grid is more of a side effect / hidden feature and is not how someone would actually be summarizing the data - more likely to have e.g. df %>% count(vaccination_status, outcome).

We will likely want to split each counted variable into a step as we do with data, e.g.

Initial data
Count vaccination status
Count outcome within vaccination status
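
A minimal sketch of how those per-variable step titles could be generated, assuming a hypothetical helper `countStepTitles` (not an existing datamations function) that receives the counted column names in order:

```javascript
// Hypothetical helper: build one step title per counted variable.
// Column names become labels by replacing underscores with spaces.
function countStepTitles(columns) {
  const label = (name) => name.replace(/_/g, " ");
  const titles = ["Initial data"];
  columns.forEach((column, i) => {
    if (i === 0) {
      titles.push(`Count ${label(column)}`);
    } else {
      // Later variables are counted within all the variables before them.
      const outer = columns.slice(0, i).map(label).join(", ");
      titles.push(`Count ${label(column)} within ${outer}`);
    }
  });
  return titles;
}

console.log(countStepTitles(["vaccination_status", "outcome"]));
```

For the two-variable example this yields the three titles listed above.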

In dplyr, df %>% count(vaccination_status, outcome) is equivalent to df %>% group_by(vaccination_status) %>% count(outcome), and I think it's important to make it clear in how we title things that the counts are combinations of vaccination status x outcome.

Will have to think a bit more of the hierarchy of how these are displayed too and keeping it consistent with the grouping hierarchy, but also with the new rules we're creating as part of #98 - e.g. it would be nice to lean on colors / shapes here!

jhofman commented 2 years ago

One idea here is to simply treat count as "syntactic sugar" and always map it to a group_by + summarize.

For instance, df %>% count(vaccination_status, outcome) becomes df %>% group_by(vaccination_status, outcome) %>% summarize(n = n()).
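
To make the mapping concrete, here is a rough JavaScript sketch of what group_by + summarize(n = n()) computes. `countBy` and the toy rows are made up for illustration; unlike dplyr, this version keeps groups in first-seen order instead of sorting them.

```javascript
// Hypothetical stand-in for group_by(...keys) %>% summarize(n = n()):
// returns one row per observed key combination with its row count `n`.
function countBy(rows, keys) {
  const groups = new Map();
  for (const row of rows) {
    // Rows with the same values for every key share one group id.
    const id = JSON.stringify(keys.map((key) => row[key]));
    if (!groups.has(id)) {
      const group = {};
      for (const key of keys) group[key] = row[key];
      group.n = 0;
      groups.set(id, group);
    }
    groups.get(id).n += 1;
  }
  return [...groups.values()];
}

// Toy rows, invented for this example only.
const rows = [
  { vaccination_status: "vaccinated", outcome: "recovered" },
  { vaccination_status: "vaccinated", outcome: "recovered" },
  { vaccination_status: "unvaccinated", outcome: "recovered" },
  { vaccination_status: "unvaccinated", outcome: "hospitalized" },
];
console.log(countBy(rows, ["vaccination_status", "outcome"]));
```

Each output row is one vaccination status x outcome combination, which is why the titles should describe the counts as combinations.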

This raises the separate issue of having a summarize operation that doesn't use a variable explicitly because the n() function is called without an argument.

In this case I think that raises the question of whether we want a "custom aggregation animation" for count, maybe? Could be a simpler version of what's here on page 4.

jhofman commented 2 years ago

@giorgi-ghviniashvili, let's play with the custom aggregation animation above for count to see how it goes?

one thought is that for some functions we have custom animations and for others we default to some black box type of thing as per #25

giorgi-ghviniashvili commented 2 years ago

I made a custom count animation using gemini2-editor:

c0a7e79d-cb59-4805-bd4e-888bc1d85b42

Next phase is to implement them as keyframes, within a single datamation step.

giorgi-ghviniashvili commented 2 years ago

Managed to play animation using animateSequence function.

https://user-images.githubusercontent.com/6615532/144413622-d9f59b3a-2e5e-4891-b1c8-cff9d42c6058.mov

@sharlagelfand please provide an example that involves a count operation. I would like to integrate this into the App and need a spec for that.

sharlagelfand commented 2 years ago

Here is an example of what a count() operation looks like:

library(datamations)
library(dplyr)

"small_salary %>%
  count(Degree)" %>%
  datamation_sanddance()

https://user-images.githubusercontent.com/15895337/144440663-50c86b17-4fd4-4e65-b46d-ed87f6036584.mov

If we want a custom animation on it, maybe we can add something to meta that describes what the operation is so the JS side knows?

and the raw specs:

```json [ { "height": 300, "width": 300, "$schema": "https://vega.github.io/schema/vega-lite/v4.json", "meta": { "parse": "grid", "description": "Initial data" }, "data": { "values": [ { "n": 100, "gemini_ids": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100] } ] }, "mark": { "type": "point", "filled": true, "strokeWidth": 1 }, "encoding": { "x": { "field": "datamations_x", "type": "quantitative", "axis": null }, "y": { "field": "datamations_y", "type": "quantitative", "axis": null } } }, { "height": 300, "width": 300, "$schema": "https://vega.github.io/schema/vega-lite/v4.json", "meta": { "parse": "grid", "description": "Group by Degree", "splitField": "Degree", "axes": false }, "data": { "values": [ { "Degree": "Masters", "n": 72, "gemini_ids": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72] }, { "Degree": "PhD", "n": 28, "gemini_ids": [73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100] } ] }, "mark": { "type": "point", "filled": true, "strokeWidth": 1 }, "encoding": { "x": { "field": "datamations_x", "type": "quantitative", "axis": null }, "y": { "field": "datamations_y", "type": "quantitative", "axis": null }, "color": { "field": null, "type": "nominal" }, "tooltip": [ { "field": "Degree", "type": "nominal" } ] } }, { "height": 300, "width": 300, "$schema": "https://vega.github.io/schema/vega-lite/v4.json", "meta": { "axes": false, 
"description": "Plot count of each group" }, "data": { "values": [ { "gemini_id": 1, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 2, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 3, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 4, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 5, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 6, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 7, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 8, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 9, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 10, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 11, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 12, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 13, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 14, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 15, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 16, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 17, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 18, "Degree": 
"Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 19, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 20, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 21, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 22, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 23, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 24, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 25, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 26, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 27, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 28, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 29, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 30, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 31, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 32, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 33, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 34, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 35, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { 
"gemini_id": 36, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 37, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 38, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 39, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 40, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 41, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 42, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 43, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 44, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 45, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 46, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 47, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 48, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 49, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 50, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 51, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 52, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 53, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, 
"datamations_y_tooltip": 72 }, { "gemini_id": 54, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 55, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 56, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 57, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 58, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 59, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 60, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 61, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 62, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 63, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 64, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 65, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 66, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 67, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 68, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 69, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 70, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 71, "Degree": "Masters", "datamations_x": 1, 
"datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 72, "Degree": "Masters", "datamations_x": 1, "datamations_y": 72, "datamations_y_tooltip": 72 }, { "gemini_id": 73, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 74, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 75, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 76, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 77, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 78, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 79, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 80, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 81, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 82, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 83, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 84, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 85, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 86, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 87, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 88, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 89, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, 
{ "gemini_id": 90, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 91, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 92, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 93, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 94, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 95, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 96, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 97, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 98, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 99, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 }, { "gemini_id": 100, "Degree": "PhD", "datamations_x": 2, "datamations_y": 28, "datamations_y_tooltip": 28 } ] }, "mark": { "type": "point", "filled": true, "strokeWidth": 1 }, "encoding": { "x": { "field": "datamations_x", "type": "quantitative", "axis": { "values": [1, 2], "labelExpr": "round(datum.label) == 1 ? 'Masters' : 'PhD'", "labelAngle": -90 }, "title": "Degree", "scale": { "domain": [0.5, 2.5] } }, "y": { "field": "datamations_y", "type": "quantitative", "title": [], "scale": { "domain": [28, 72] } }, "tooltip": [ { "field": "datamations_y_tooltip", "type": "quantitative", "title": [] }, { "field": "Degree", "type": "nominal" } ] } } ] ```

sharlagelfand commented 2 years ago

@giorgi-ghviniashvili can you remind me what to add to meta please? meta.custom_animation = "count"?

giorgi-ghviniashvili commented 2 years ago

@sharlagelfand yes, please
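
A possible shape for the JS side, sketched under the assumption that custom animations live in a registry keyed by `meta.custom_animation`. `pickAnimation` and the registry are hypothetical names, not existing datamations code; specs without a known key would fall back to the default transition.

```javascript
// Hypothetical lookup: return the custom animation registered for a spec,
// or null when the spec asks for none (or for one we do not know).
function pickAnimation(spec, registry) {
  const name = spec.meta && spec.meta.custom_animation;
  const known = Boolean(name) && Object.prototype.hasOwnProperty.call(registry, name);
  return known ? registry[name] : null;
}

// Placeholder registry; a real entry would build the intermediate frames.
const registry = { count: () => "custom count animation" };

const countSpec = {
  meta: { axes: false, description: "Plot count of each group", custom_animation: "count" },
};
const gridSpec = { meta: { parse: "grid", description: "Initial data" } };

console.log(pickAnimation(countSpec, registry) !== null); // true
console.log(pickAnimation(gridSpec, registry)); // null
```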

giorgi-ghviniashvili commented 2 years ago

I tested recommendKeyframes and recommendWithPath, hoping to get the custom animation sub-specs for each aggregation type, but they do not seem to work properly: they just return the original start and end specifications, with no intermediate specs.

Maybe we should open a ticket on the gemini GitHub repo and ask whether it supports custom animation generation? If it doesn't, we can implement that in JS. What do you think, @jhofman?

giorgi-ghviniashvili commented 2 years ago

I tried the graphscape recommender (which gemini uses internally) and got the same response. Very interesting how it works.

giorgi-ghviniashvili commented 2 years ago

Wrote a function that generates intermediate specs for count and integrated it into the App.

https://user-images.githubusercontent.com/6615532/145388460-39f7e49a-a1a3-4cd3-b11d-c43ae831be97.mov
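
For readers without the video, a rough sketch of one such intermediate frame: the points of each group are stacked in a single column at heights 1..n. This is an illustration only, not the function integrated in the App; `stackValues` is a hypothetical name, while the field names follow the raw specs earlier in the thread.

```javascript
// Sketch: derive a "stacked" frame from the final count values, placing the
// points of each group in one column at heights 1, 2, ..., n.
function stackValues(values) {
  const placed = new Map(); // datamations_x -> points placed so far
  return values.map((point) => {
    const height = (placed.get(point.datamations_x) || 0) + 1;
    placed.set(point.datamations_x, height);
    return { ...point, datamations_y: height };
  });
}

// Shortened stand-in for the spec data: 3 Masters and 2 PhD points.
const finalValues = [
  { gemini_id: 1, Degree: "Masters", datamations_x: 1, datamations_y: 3 },
  { gemini_id: 2, Degree: "Masters", datamations_x: 1, datamations_y: 3 },
  { gemini_id: 3, Degree: "Masters", datamations_x: 1, datamations_y: 3 },
  { gemini_id: 4, Degree: "PhD", datamations_x: 2, datamations_y: 2 },
  { gemini_id: 5, Degree: "PhD", datamations_x: 2, datamations_y: 2 },
];
console.log(stackValues(finalValues).map((p) => p.datamations_y)); // [1, 2, 3, 1, 2]
```

The top point of each column ends up at the group's count, which is where a rule line would sit.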

Some concerns:

If we have too many circles, stacking them into a single column is not visually correct (the circles will overlap), so we need an actual grid.
But if we have a grid, then how should we place the rule lines? They are supposed to be placed at the top circle.

Maybe instead of a rule line, we should think of a different visual element?

jhofman commented 2 years ago

This looks good overall. We decided to add a key frame after stacking where everything "pulls up" into the top point, keeping the y axis in the same range, and after that things zoom so the y-axis is on the range of the resulting counts.
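
A sketch of how the "pull up" key frame could be derived from the stacked frame, assuming stacked points carry their column in `datamations_x` and their height in `datamations_y` (as in the specs above). `pullUp` is a hypothetical name; the y-axis domain would simply be left unchanged for this frame.

```javascript
// Sketch of the "pull up" key frame: every point moves to the height of the
// top point in its column.
function pullUp(stackedValues) {
  const top = new Map(); // datamations_x -> highest y in that column
  for (const point of stackedValues) {
    const current = top.get(point.datamations_x) || 0;
    top.set(point.datamations_x, Math.max(current, point.datamations_y));
  }
  return stackedValues.map((point) => ({
    ...point,
    datamations_y: top.get(point.datamations_x),
  }));
}

const stacked = [
  { gemini_id: 1, datamations_x: 1, datamations_y: 1 },
  { gemini_id: 2, datamations_x: 1, datamations_y: 2 },
  { gemini_id: 3, datamations_x: 1, datamations_y: 3 },
  { gemini_id: 4, datamations_x: 2, datamations_y: 1 },
  { gemini_id: 5, datamations_x: 2, datamations_y: 2 },
];
console.log(pullUp(stacked).map((p) => p.datamations_y)); // [3, 3, 3, 2, 2]
```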

giorgi-ghviniashvili commented 2 years ago

"Pull up":

https://user-images.githubusercontent.com/6615532/145822704-d128967b-8ed2-43d4-9ed9-0c0f4d20cf64.mov

jhofman commented 2 years ago

Let's show the y axis as soon as the stacking happens, and eliminate the shift on the x-axis.

Let's also zoom in on the y axis at the end so that the min and max on the axis are defined by the min and max counts.
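
A small sketch of that final zoom, assuming the final frame stores each group's count in `datamations_y`. `countDomain` is a hypothetical helper name.

```javascript
// Sketch: y-axis domain for the final frame, spanning the smallest and
// largest counts.
function countDomain(values) {
  const counts = values.map((value) => value.datamations_y);
  return [Math.min(...counts), Math.max(...counts)];
}

console.log(
  countDomain([
    { Degree: "Masters", datamations_y: 72 },
    { Degree: "PhD", datamations_y: 28 },
  ])
); // [28, 72]
```

For the Degree example this matches the `"domain": [28, 72]` in the raw spec above.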

giorgi-ghviniashvili commented 2 years ago

Hi @jhofman, the y axis is now shown, but the shift is still there because the x axis labels in the stacked spec are a bit closer together than the grid x labels.

https://user-images.githubusercontent.com/6615532/146345964-8030e0f1-3129-4894-a1c9-fb00dc213e76.mov

@sharlagelfand, to zoom in on the y axis, please set the y axis domain properly. I set it to 28 manually (and also removed title: []):

jhofman commented 2 years ago

this is a great update. is there a way to deal w/ the ghost-shift that happens on the x axis in the final two seconds?

giorgi-ghviniashvili commented 2 years ago

@jhofman alright, I simply set scale.domain to [0, 3] instead of [0.5, 2.5]. That should be set from R or Python. I use [0, 3] because the intermediate count frames use this domain, and the two should be consistent.

https://user-images.githubusercontent.com/6615532/146963785-a876ffea-2d91-4403-abde-a85b3895ddd6.mov
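
One way to read the change above, sketched under the assumption that groups sit at x = 1..n and that the domain pads both ends by a fixed amount. The generalization beyond two groups and the name `xDomain` are assumptions, not something confirmed in this thread.

```javascript
// Sketch: x-axis domain for n groups placed at x = 1..n, padded on both ends.
// padding = 1 gives the domain used by the intermediate count frames.
function xDomain(groupCount, padding = 1) {
  return [1 - padding, groupCount + padding];
}

console.log(xDomain(2)); // [0, 3]
console.log(xDomain(2, 0.5)); // [0.5, 2.5]
```

Using the same padding in every frame of the count animation is what keeps the tick marks from shifting.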

jhofman commented 2 years ago

@giorgi-ghviniashvili merged current main into the gemini2 branch (where these modifications exist)

@jhofman can check the gemini2 branch to see if there's a simple change to the generate_x_domain that would align the tick marks for count but not disturb alignment for other visualizations.

jhofman commented 2 years ago

domains are updated and merged, closing!

microsoft / datamations

Support `count()` operation #109