microsoft / datamations

https://microsoft.github.io/datamations/

What should multi-variable grouping look like in the general case? #25

Closed jhofman closed 3 years ago

jhofman commented 3 years ago

We currently have something that looks good for degree and work in the salary example.

jhofman commented 3 years ago

Maybe we can use ggplot2's faceting + other encodings to prototype this cheaply and figure out the limits of what we would(n't) want to show people.

For instance, would the following mapping work for the grouped grid frame: Grouping variable 1 = row, Grouping variable 2 = column, Grouping variable 3 = color+symbol ?
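A minimal sketch of that proposed mapping, in plain Python. The helper name `assign_channels` and the example variable names are hypothetical and are not taken from the datamations code; the sketch only shows the row / column / color+symbol ordering and where it runs out of channels.

```python
# Channels in the order proposed above: first grouping variable -> row, etc.
CHANNELS = ["row", "column", "color+symbol"]


def assign_channels(group_vars):
    """Pair each grouping variable with a display channel, in order.

    Returns (mapping, leftover): `leftover` holds any grouping variables
    beyond the three channels, which would need another treatment.
    """
    mapping = dict(zip(CHANNELS, group_vars))
    leftover = list(group_vars[len(CHANNELS):])
    return mapping, leftover


# Hypothetical grouping variables, purely for illustration.
mapping, leftover = assign_channels(["degree", "work", "region", "year"])
print(mapping)
print(leftover)
```

With four grouping variables, the fourth ends up in `leftover`, which is the case the thread is asking about.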

jhofman commented 3 years ago

Alternatively, as per @dggoldst's suggestion, when there are more grouping variables than we can handle, maybe animate only a subset of the resulting plot as an exemplar of the overall results (e.g., animate only one panel of a huge faceted plot)?

sharlagelfand commented 3 years ago

@jhofman @dggoldst I have a brain dump of notes here, exploring how many groups it's possible to handle in ggplot2 versus how many we might actually be able to differentiate (and how to do that), the hierarchy of grouping, etc.

jhofman commented 3 years ago

this is great, looking forward to discussing tomorrow.

dggoldst commented 3 years ago

Looks interesting! Can't wait to hear more.

jhofman commented 3 years ago

we decided we'd try to adjust the group-by representation to be in "clumps" that mirror facets (first variable is row, second is column, third is possibly "nested" as in this figure).

[image: IMG_2871]

sharlagelfand commented 3 years ago

It would seem that the "faceted and grouped dot plot" infrastructure is pretty underdeveloped in R! Took a bit, but here are a couple of proofs of concept of the grouping, with much better spacing and clarity I think! We can discuss tomorrow :)

  1. Grouping in facets and subplots (like the example Jake posted above)
  2. Grouping in facets and colours within

cc @jhofman @dggoldst

jhofman commented 3 years ago

Next step is to see how creation and exporting of these new group by plots translate to vegalite specs.

As mentioned in #28, vegalite supports faceting, so hopefully vegawidget does as well.

Let's try two different ways to export?

  1. Just facets: the equivalent of facet_grid(island ~ species) for the penguins data
  2. Patchwork subplots: island1 + island2 + island3

Does the latter have to be an array of vegalite json specs?
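A rough sketch of the two export shapes, written as Python dicts that serialize to Vega-Lite JSON. In Vega-Lite, a concatenated view is a single spec whose `hconcat` property holds an array of sub-specs, so the "patchwork" option is one spec rather than several separate files. The data rows, the `x`/`y` field names, and the helper names are made up for illustration; only `facet`, `spec`, `hconcat`, `transform`, and `filter` are Vega-Lite properties.

```python
import json

# Made-up rows standing in for the penguins data; x/y are hypothetical
# dot positions inside each group.
rows = [
    {"island": "Biscoe", "species": "Adelie", "x": 1, "y": 1},
    {"island": "Biscoe", "species": "Gentoo", "x": 2, "y": 1},
    {"island": "Dream", "species": "Adelie", "x": 1, "y": 1},
    {"island": "Torgersen", "species": "Adelie", "x": 1, "y": 1},
]


def point_view():
    """The inner dot plot shared by both layouts."""
    return {
        "mark": "point",
        "encoding": {
            "x": {"field": "x", "type": "quantitative"},
            "y": {"field": "y", "type": "quantitative"},
        },
    }


def facet_spec(values, row, column):
    """Option 1: the rough equivalent of facet_grid(row ~ column)."""
    return {
        "data": {"values": values},
        "facet": {
            "row": {"field": row, "type": "nominal"},
            "column": {"field": column, "type": "nominal"},
        },
        "spec": point_view(),
    }


def concat_spec(values, split_by):
    """Option 2: one sub-view per level, placed side by side."""
    levels = sorted({r[split_by] for r in values})
    return {
        "data": {"values": values},
        "hconcat": [
            {
                "title": level,
                "transform": [{"filter": {"field": split_by, "equal": level}}],
                **point_view(),
            }
            for level in levels
        ],
    }


faceted = facet_spec(rows, row="island", column="species")
patchwork = concat_spec(rows, split_by="island")
print(json.dumps(patchwork, indent=2))
```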

sharlagelfand commented 3 years ago

Here are the specs for 1 (just facets): https://github.com/jhofman/datamations/tree/groups/sandbox/grouping/facet_color_specs

Working on the subplots specs - looks like repeating views is a good start for this, but I'm not sure whether we can combine repeated views with facets - will continue to dig into it.

sharlagelfand commented 3 years ago

Good idea on using the underlying facet data to "fake" the facets, @jhofman! Much easier than actually trying to offset ourselves, I think. There's a rendered example of what grouping looks like with real facets, "fake" facets, and the fake facets translated into vegalite here

I haven't quite figured out the custom axes in vegalite (e.g. having the islands on the x-axis and the species on y), but I'd say the grouping is pretty convincing otherwise! I've made the size of each "fake facet" equal (like it is in ggplot2), but we can definitely remove that if it seems weird

vegalite specs are here cc @giorgi-ghviniashvili

jhofman commented 3 years ago

wow, that looks great @sharlagelfand, and glad it's easier than the do-it-yourself solution.

how hard do you think it would be to add facet labels in the faked version on rows and columns, to make it easier to see what the groups are?

sharlagelfand commented 3 years ago

thanks @jhofman!

Do you mean direct labelling on each facet, like this?

[image: Datamations-19]

Or just adding to the vegalite version like this (i.e., copying over what's in the ggplot2 faked version)?

[image: Datamations-20]

jhofman commented 3 years ago

the second: copying over what's on the ggplot2 version to vegalite.

sharlagelfand commented 3 years ago

oooh yes, that's what I meant by "i haven't quite figured out the custom axes in vegalite" - let me dig into it! just wanted to make sure this was a good direction to head first.

for context, what's happening in the ggplot2 case is that e.g. for the x-axis, the values are still e.g. 1, 2, ..., 59 (however many points there are), but the labels are "Biscoe", "Dream", and "Torgersen", strategically placed at the correct breaks (the midpoint of each fake facet). So I'll look into doing the same with vegalite
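The midpoint placement described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code used in the repo: the function name `fake_facet_breaks` is hypothetical and the per-island counts are made up.

```python
def fake_facet_breaks(counts):
    """Return the axis position of each fake facet's label.

    `counts` maps facet label -> number of positions it occupies, in
    display order. Positions run 1, 2, ..., total along the axis, and
    each label sits at the midpoint of its facet's range.
    """
    breaks = {}
    start = 1
    for label, n in counts.items():
        end = start + n - 1              # last position used by this facet
        breaks[label] = (start + end) / 2
        start = end + 1                  # next facet begins right after
    return breaks


# Made-up counts, purely for illustration.
breaks = fake_facet_breaks({"Biscoe": 3, "Dream": 5, "Torgersen": 2})
print(breaks)  # {'Biscoe': 2.0, 'Dream': 6.0, 'Torgersen': 9.5}
```

Here Biscoe occupies positions 1 to 3, Dream 4 to 8, and Torgersen 9 to 10, so the labels land at 2, 6, and 9.5.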

jhofman commented 3 years ago

great, thanks and sorry to have missed the comment about "custom axes in vegalite".

sharlagelfand commented 3 years ago

Figured out the axes in vegalite, rendered here now, and an example:

[image: visualization]

One thing is that the axes are now occupied... so when we want to render axes of the actual values (i.e. once we move onto the scatterplot / summarised view) we might have to use annotations for those? Or maybe move these facet labels to annotations, if they can exist outside of the actual plotting area.

jhofman commented 3 years ago

great! related to #32, so ccing @giorgi-ghviniashvili

giorgi-ghviniashvili commented 3 years ago

@sharlagelfand could you please point me to the vegalite docs of annotations?

sharlagelfand commented 3 years ago

@giorgi-ghviniashvili I haven't seen much in terms of actual documentation, but maybe these examples of layered plots with labels/annotations will be a good place to start?
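For reference, a layered plot with labels might look roughly like the sketch below: a Vega-Lite `layer` spec with a `point` layer for the data and a `text` layer for the labels, built here as a Python dict. This is an assumption about what such an annotation layer could look like, not the spec used in the repo; the data values and field names are made up.

```python
import json

# Made-up dot positions and label positions, for illustration only.
points = [{"x": 1, "y": 1}, {"x": 2, "y": 1}, {"x": 4, "y": 1}]
labels = [
    {"x": 1.5, "y": 2, "label": "Biscoe"},
    {"x": 4.0, "y": 2, "label": "Dream"},
]

layered = {
    "layer": [
        {
            # Layer 1: the dots themselves.
            "data": {"values": points},
            "mark": "point",
            "encoding": {
                "x": {"field": "x", "type": "quantitative"},
                "y": {"field": "y", "type": "quantitative"},
            },
        },
        {
            # Layer 2: text marks acting as annotations, with their own data.
            "data": {"values": labels},
            "mark": {"type": "text", "fontWeight": "bold"},
            "encoding": {
                "x": {"field": "x", "type": "quantitative"},
                "y": {"field": "y", "type": "quantitative"},
                "text": {"field": "label", "type": "nominal"},
            },
        },
    ]
}

print(json.dumps(layered, indent=2))
```

Because the labels are ordinary marks in their own layer, the x and y axes stay free for the actual data values.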

If you're curious about the custom axis labels, I did something like this - the label positions are the midpoints of each "facet"
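One way such custom axis labels can be expressed in Vega-Lite is with the axis `values` property (where ticks go) plus `labelExpr` (what text each tick shows). The sketch below builds that axis definition in Python; it is an assumption about the approach rather than the linked spec, and the helper name and midpoint values are made up.

```python
import json


def labelled_axis(breaks):
    """Build a Vega-Lite axis dict that shows facet names at given positions.

    `breaks` maps label -> axis position (e.g. the midpoint of each fake
    facet). The labelExpr is a chained conditional over datum.value.
    """
    expr = "".join(
        f"datum.value == {pos} ? {json.dumps(label)} : "
        for label, pos in breaks.items()
    ) + "''"
    return {"values": list(breaks.values()), "labelExpr": expr}


# Made-up midpoints, for illustration only.
axis = labelled_axis({"Biscoe": 2.0, "Dream": 6.0})

# The axis dict slots into a positional encoding channel.
x_encoding = {"field": "x", "type": "quantitative", "axis": axis}
print(json.dumps(x_encoding, indent=2))
```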

sharlagelfand commented 3 years ago

Going to close this! We have a general case figured out and #40 covers the idea of IDing customized multi-grouping