microsoft / datamations

https://microsoft.github.io/datamations/

Decide on convention for mapping grouping variables to axes and facets #68

Closed jhofman closed 3 years ago

jhofman commented 3 years ago

Related to #61, we got into some edge cases about how to make strange plots (that use, for instance, the same variable in a facet and on an axis) look good when datamated.

This brought up the question of what convention we should have for handling the mapping of grouping variables to axes and facets. We should probably decide on this before trying to handle these edge cases.

Here's a proposal to seed the conversation; let's iterate on it together:

  1. With no ggplot command specified

     a. If only one grouping variable, then this variable gets mapped to the x axis variable

     b. If two grouping variables, the first gets mapped to a column facet and the second gets mapped to a row facet. (And maybe we color the second by default? Or do we NOT color by default?)

     c. If three grouping variables, the first gets mapped to a column facet, the second gets mapped to a row facet, and the third gets mapped to the x axis. (Same question with color: maybe we color the third by default? Or not?)

  2. With a ggplot command specified, we instead try to inherit whatever aesthetics come from the ggplot command, falling back on the above conventions when certain things aren't specified? (This is half-baked, not sure if it makes sense.)

Okay, let's discuss :)

sharlagelfand commented 3 years ago

Thanks for pulling us back out of the weeds on this! Good to settle conventions before edge cases :)

Here are some sketches for the "no ggplot2 command specified" case. I mainly agree with your assessment of grouping variable order, but I think that if there are only two grouping variables, it should be first to column, then second to x-axis - maybe only introduce row faceting if there are three grouping variables - this will hopefully get us away from the "column and row facets with only a single point in each" situation.

This is really really close to what we have already, so I think we're almost there - it's just leaning on the x-axis before facets that's different, and labelling those axes in the infogrid stage

one grouping variable

e.g. group_by(species)

just one "group by" frame, with the grouping variable on the x-axis (and labelled):

group_by_species

two grouping variables

group_by(species, sex)

first grouping variable is in the column facets:

group_by_species_first

second grouping variable is in the x-axis (labelled) - and I think coloured is helpful too

group_by_species_sex

three grouping variables

group_by(island, species, sex)

first grouping variable in the column facets

three_groups_1

second in the row facets

Datamations-40

third on the x-axis (labelled), with colours

Datamations-41

Re: ggplot commands, I think it's easy enough to settle the basic cases - e.g. a variable in the x-axis, a variable in the column facets, one in the row facets, etc, and the order of animation is determined by column -> row -> x-axis, rather than what's specified in group_by(). I think we should be very clear about what level we support right now and build on that, rather than trying to handle super specific things (even like dodged points, last plot in this comment) from the get go.
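To make the ordering rule above concrete, here is a minimal Python sketch. It is only an illustration: `ANIMATION_ORDER` and `animation_sequence` are hypothetical names, not part of the datamations codebase, and the mapping is assumed to be a plain dict from plot role to variable name.

```python
# Hypothetical sketch: when a ggplot command is specified, animate in the
# fixed order column -> row -> x-axis, regardless of the group_by() order.
ANIMATION_ORDER = ("column", "row", "x")


def animation_sequence(mapping):
    """Return the variables in the order they would be animated.

    `mapping` is an assumed dict such as {"x": "sex", "column": "island"}.
    Roles missing from the mapping are simply skipped.
    """
    return [mapping[role] for role in ANIMATION_ORDER if role in mapping]


# The dict order here is deliberately "wrong" to show it does not matter.
steps = animation_sequence({"x": "sex", "row": "species", "column": "island"})
print(steps)
```

With the mapping in the example, the column variable comes first, then the row variable, then the x-axis variable.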

sharlagelfand commented 3 years ago

I made some adjustments to the R code, and here's how this looks now. There are a few missing components / bugs, but this should give an idea. I've made some notes on potential aesthetic improvements, especially around consistent widths and matching grids to their axes:

one group

Datamations-42

two groups

Datamations-43

three groups

Datamations-44

giorgi-ghviniashvili commented 3 years ago

@sharlagelfand that's great, I agree with this.

Could you please create example specs for these cases so I can fix it? I will need to center-align the grids and give them equal widths.

giorgi-ghviniashvili commented 3 years ago

I center-aligned the grids:

image

image

image

Screen Shot 2021-06-03 at 15 39 38

jhofman commented 3 years ago

great, i agree too. (i actually meant to write column facets but was rushing and wrote row facets, so we're all aligned.)

@giorgi-ghviniashvili, thanks for center-aligning things. @sharlagelfand i agree with the annotations you made on updates to things (like consistent width for each group and fixing the overlaps).

giorgi-ghviniashvili commented 3 years ago

@jhofman when using facets, the group widths are the same, but without facets they are not. Once I have a spec to test with, I will be able to fix that as well.

sharlagelfand commented 3 years ago

@giorgi-ghviniashvili Here are specs for each of those cases: https://github.com/microsoft/datamations/tree/parse-ggplot2/sandbox/generalized_specs

giorgi-ghviniashvili commented 3 years ago

hi @sharlagelfand and @jhofman ,

I modified the info grid code and tested it on these three specs:

One group:

https://user-images.githubusercontent.com/6615532/120698663-660fab80-c4c0-11eb-9b63-e13aa53af385.mov

Two groups:

https://user-images.githubusercontent.com/6615532/120698728-7aec3f00-c4c0-11eb-8b49-1b3ffd9674f8.mov

Three groups:

https://user-images.githubusercontent.com/6615532/120698824-9fe0b200-c4c0-11eb-8d30-1e46bcd9ea11.mov

sharlagelfand commented 3 years ago

these look awesome @giorgi-ghviniashvili, thank you!!

In the first one the x-axis title is x in the infogrid - what's the easiest way for me to pass the "proper" title (species)? in the spec.x.title (or whatever the proper format is), or somewhere in meta?

The second two look great as well - is it possible to have the x-axis appear for those in the info grid as well? They should appear at the same time the colour legends do

giorgi-ghviniashvili commented 3 years ago

> In the first one the x-axis title is x in the infogrid - what's the easiest way for me to pass the "proper" title (species)? in the spec.x.title (or whatever the proper format is), or somewhere in meta?

Good point, I can just use splitField as title.

> The second two look great as well - is it possible to have the x-axis appear for those in the info grid as well? They should appear at the same time the colour legends do

Yes, possible, will add them too.

giorgi-ghviniashvili commented 3 years ago

@sharlagelfand

Added title to first case:

image

Unfortunately, I ran into some difficulties adding an x-axis to the second and third cases because of the facet hack.

To fake the facets, I use the real x-axis, positioned at the top, for the facet labels. The only way to add an additional x-axis at the bottom is to set meta.axes: true, which does not work for several reasons:

image

But in the grid case, the points do not follow this pattern: the inner groups are just placed side by side, so an axis like that would not match their locations.

image

Let's discuss how important the x-axis is and how to achieve it.

giorgi-ghviniashvili commented 3 years ago

I think we don't need center alignment for the two- and three-group cases. We just need to match the locations on the x-axis.

So with an x-axis like this:

| Adelie | Gentoo | Chinstrap |

The inner grids should follow this sequence rather than sit side by side. As a result, Adelie + Torgersen will be at the far left, because the jitter frame looks like this:

image

In Dream, we should leave a gap for Gentoo.

Let me know what you think.

sharlagelfand commented 3 years ago

Yeah, I think your assessment is spot on @giorgi-ghviniashvili - the grids should match the sequence of the axes, with gaps etc, rather than being centered

giorgi-ghviniashvili commented 3 years ago

So the inner groups should have the same widths? That means each group will get facet width / 3.

In that case, Gentoo will have little space and its points will overlap:

image
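The layout being discussed (equal-width slots that follow the x-axis sequence, with gaps for categories absent from a facet) can be sketched roughly as follows. This is a hedged illustration in Python, not the actual info grid code: `inner_group_slots` and its parameters are hypothetical, and pixel widths are made up.

```python
def inner_group_slots(facet_width, x_domain, present):
    """Assign each x-axis category an equal-width slot inside one facet.

    facet_width: width of the facet in pixels (assumed value).
    x_domain:    full ordered list of x-axis categories.
    present:     categories that actually have points in this facet.

    Returns {category: (left, right)} for the present categories only, so
    absent categories leave a gap instead of shifting the others over.
    """
    slot = facet_width / len(x_domain)
    return {
        category: (index * slot, (index + 1) * slot)
        for index, category in enumerate(x_domain)
        if category in present
    }


# Example: a facet with no Gentoo points keeps Gentoo's slot empty.
x_domain = ["Adelie", "Gentoo", "Chinstrap"]
slots = inner_group_slots(300, x_domain, present={"Adelie", "Chinstrap"})
print(slots)
```

Because every slot is `facet_width / len(x_domain)` wide, a densely populated category has no more room than a sparse one, which is where the overlap concern above comes from.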

giorgi-ghviniashvili commented 3 years ago

This is what I meant:

image

-- Video:

https://user-images.githubusercontent.com/6615532/120886692-34056300-c600-11eb-8e60-fc495b294405.mov

giorgi-ghviniashvili commented 3 years ago

We can increase the width to fix it:

image

giorgi-ghviniashvili commented 3 years ago

Ok, @sharlagelfand and @jhofman ,

here is the updated version, with grid alignment and an x-axis when there is a column facet + splitField!

(We need to include meta.axes = true wherever we need an axis with a column facet + splitField; @sharlagelfand, please include it.)
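As a rough illustration of that request, the flag could be set with a small helper like the one below. This is a sketch under assumptions: the thread does not show the spec layout, so treating the spec as a dict with `facet.column` and `meta.splitField` keys is a guess, and `with_axes_flag` is a hypothetical name. Only `meta.axes` and `splitField` come from the discussion itself.

```python
def with_axes_flag(spec):
    """Return a copy of `spec` with meta.axes set to True when the spec has
    both a column facet and a splitField.

    The key paths ("facet" -> "column", "meta" -> "splitField") are assumed
    for illustration; the input spec is not modified.
    """
    meta = dict(spec.get("meta", {}))
    has_column_facet = "column" in spec.get("facet", {})
    if has_column_facet and "splitField" in meta:
        meta["axes"] = True
    return {**spec, "meta": meta}


# Example with an assumed, heavily simplified spec.
spec = {
    "facet": {"column": {"field": "species"}},
    "meta": {"splitField": "sex"},
}
print(with_axes_flag(spec)["meta"])
```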

One group:

https://user-images.githubusercontent.com/6615532/120887395-9ca20f00-c603-11eb-8838-916d2e783ed2.mov

Two groups:

https://user-images.githubusercontent.com/6615532/120887422-d2df8e80-c603-11eb-8c08-5e5b830c80f4.mov

Three groups:

https://user-images.githubusercontent.com/6615532/120887628-125aaa80-c605-11eb-8e3f-a86cdd4763a7.mov

sharlagelfand commented 3 years ago

Thanks @giorgi-ghviniashvili, these look great! Latest version of the app is deployed with this (cc @jhofman if you want to play around with it)

jhofman commented 3 years ago

awesome, this is great progress!

is it me or does the upper right panel still look a bit misaligned on the x axis during jitter?

see 7 seconds in:

Screen Shot 2021-06-07 at 1 14 43 PM

giorgi-ghviniashvili commented 3 years ago

Good point! Fixed it by adjusting external forces and collision detection.

image

sharlagelfand commented 3 years ago

These conventions are set and described in the README:

datamations has some defaults in terms of how groups are represented. As seen in the above two examples, when there is one grouping variable, it’s shown on the x-axis. When there are two grouping variables, the first (by what comes first in group_by()) is shown in column facets, and the second is shown on the x-axis as well as colored. If there are three grouping variables, the first is in column facets, the second in row facets, and the third on the x-axis and colored.
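The defaults in that README paragraph can be summarized in a short Python sketch. It is illustrative only: `default_mapping` is a hypothetical function, not the package's R implementation, and the behaviour for zero or more than three grouping variables is an assumption, since the README text does not cover those cases.

```python
def default_mapping(group_vars):
    """Map grouping variables (in group_by() order) to plot roles.

    One variable:    x-axis.
    Two variables:   column facet, then x-axis (also colored).
    Three variables: column facet, row facet, then x-axis (also colored).
    """
    n = len(group_vars)
    if n == 0:
        return {}  # assumption: nothing to map without grouping variables
    if n == 1:
        return {"x": group_vars[0]}
    if n == 2:
        column, x = group_vars
        return {"column": column, "x": x, "color": x}
    if n == 3:
        column, row, x = group_vars
        return {"column": column, "row": row, "x": x, "color": x}
    # Assumption: the convention above stops at three grouping variables.
    raise ValueError("convention only covers up to three grouping variables")


print(default_mapping(["species"]))
print(default_mapping(["species", "sex"]))
print(default_mapping(["island", "species", "sex"]))
```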

Closing this!