microsoft / datamations

https://microsoft.github.io/datamations/

Update README, ensure table code is working #37

Closed sharlagelfand closed 3 years ago

sharlagelfand commented 3 years ago

Once #23 is all closed out and we have a functional widget, we will need to update the README to illustrate how it works! I don't think you can embed an htmlwidget into a static README, so this may have to be in the form of htmlwidget -> movie -> GIF, but I will double check on that.

Since we haven't done anything with the table code yet, will take a pass through and ensure it all still works and can be left in.

sharlagelfand commented 3 years ago

Need to update the README with the widget for plotting, but confirming that the table code works and can be left in 👍

sharlagelfand commented 3 years ago

👍 plot examples in README work, all good:

"small_salary %>% group_by(Degree) %>% summarize(mean = mean(Salary))" %>%
  datamation_sanddance()

https://user-images.githubusercontent.com/15895337/117343379-3abe8000-ae72-11eb-9ed9-99693850d85d.mov

"small_salary %>% group_by(Degree, Work) %>% summarize(mean = mean(Salary))" %>%
  datamation_sanddance()

https://user-images.githubusercontent.com/15895337/117343541-65a8d400-ae72-11eb-8be0-54d113b6e8a2.mov

Some notes on discrepancies that remain between the originals (in the main branch README) and these versions, plus notes on the case of 3 grouping variables (e.g. penguins) and some general thoughts on what else we might want to fix. Making a checklist so I can mark items off as we add these in:

might have missed some things but will add to this as I find em! cc @jhofman @giorgi-ghviniashvili

jhofman commented 3 years ago

awesome, this looks great and thanks for summarizing the needed updates in an easy-to-follow checklist!

giorgi-ghviniashvili commented 3 years ago

> Mostly just curious on this end... what's going on with the size of the facet labels changing? Goes smaller -> bigger -> smaller a few times

@sharlagelfand please update gemini.web.js with this to include latest tickPadding fix.

giorgi-ghviniashvili commented 3 years ago

> Filled points instead of transparent? Seems pretty quick to change in vega lite and much more like the scatter plots I'm used to :) I can update in the specs I'm passing but likely needs to be updated on the JS side as well

yes, try to use filled: true.

giorgi-ghviniashvili commented 3 years ago

> Maybe we should add facet titles? so "Degree" as the title on the column facets and "Work" as the title on the row ones

You can add titles now, currently facet.row.title and facet.column.title are null.

giorgi-ghviniashvili commented 3 years ago

> x-axis values are numeric, not actual group labels in the case of 3+ variables (discussed in #33) - wondering if it'll work to have numeric values as the breaks, but the values (e.g. undefined, male, female) as the labels. similar to what i discussed/figured out here + beyond in that thread, but with sex instead of the islands on the x-axes.

@sharlagelfand Since this is encoding.x.axis, can you try to set it yourself and just send it via the spec? Use tickExpr.

giorgi-ghviniashvili commented 3 years ago

> values not centred on the x-axis, e.g. some appear on the edge of their facets instead of centred (discussed in #32) - maybe just domain needs to be increased to show

Please try to add 0.5 for padding.

sharlagelfand commented 3 years ago

Sorry, just finally gone through all these now!

> Mostly just curious on this end... what's going on with the size of the facet labels changing? Goes smaller -> bigger -> smaller a few times

> @sharlagelfand please update gemini.web.js with this to include latest tickPadding fix.

Will try that - you don't have the latest version here, right? The sizes are still changing there. I've updated my version of gemini but am still running into this issue.

> Filled points instead of transparent? Seems pretty quick to change in vega lite and much more like the scatter plots I'm used to :) I can update in the specs I'm passing but likely needs to be updated on the JS side as well

> yes, try to use filled: true.

The specs here now all use filled: true, but it doesn't propagate through the datamation here. Maybe it needs to be updated in the JS too?

> Maybe we should add facet titles? so "Degree" as the title on the column facets and "Work" as the title on the row ones

> You can add titles now, currently facet.row.title and facet.column.title are null.

I have updated the titles in the specs here, but it doesn't look like they're coming up.

> x-axis values are numeric, not actual group labels in the case of 3+ variables (discussed in #33) - wondering if it'll work to have numeric values as the breaks, but the values (e.g. undefined, male, female) as the labels. similar to what i discussed/figured out here + beyond in that thread, but with sex instead of the islands on the x-axes.

> @sharlagelfand Since this is encoding.x.axis, can you try to set it yourself and just send it via the spec? Use tickExpr.

Will try this on my end - I figured it out before so shouldn't be too bad to do again. Got this working!

> values not centred on the x-axis, e.g. some appear on the edge of their facets instead of centred (discussed in #32) - maybe just domain needs to be increased to show

> Please try to add 0.5 for padding.

Where should I add this? To the actual X values? Or padding the domain on the plot?

jhofman commented 3 years ago

@sharlagelfand, @giorgi-ghviniashvili: just checking in on this. is it realistic to try to update the README w/ the new examples today (before tonight's talk)?

it's okay if not, but just want to get a sense.

sharlagelfand commented 3 years ago

@jhofman definitely realistic, and that's my plan! If @giorgi-ghviniashvili has time to answer some of the questions in my comment above, we can make some progress on the more visual aspects of the datamations; if not, I will definitely update the README with how it's looking now. The app is updated with the latest versions of everything.

Would you be able to merge the two existing PRs? Then I can rebase on them, update the README, and PR my refactor-test branch. If we are able to make additional changes before tonight, I can do that on a new, smaller branch so we don't have to worry about this mammoth branch too late in the day 😁

jhofman commented 3 years ago

great!

btw, just tried out the app and saw some funniness in placement of the points on the last frame:

[Image: Screen Shot 2021-05-10 at 12 30 40 PM]

sharlagelfand commented 3 years ago

Not sure what's going on there, the last spec on its own looks fine so it might be something on the JS / axes faking side.

sharlagelfand commented 3 years ago

+ bonus ahh, that's not good because the values are actually < 100 so that final plot is incorrect (not just e.g. axes don't go far enough), here's how it should look:

[Image: Screen Shot 2021-05-10 at 12 35 59 PM]

giorgi-ghviniashvili commented 3 years ago

> Where should I add this? To the actual X values? Or padding the domain on the plot?

axis.scale.domain

....
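
A minimal R sketch of the domain padding suggested above, for illustration only. It assumes the x positions are the numeric group indices discussed earlier in the thread; `pad_domain()` and the field name `"x"` are hypothetical and not part of datamations. In a Vega-Lite spec, the scale domain of the x channel sits under `encoding.x.scale.domain`.

```r
# Hypothetical helper (not part of datamations): widen a numeric domain by
# `pad` on each side so points at the extremes are not drawn on the facet edge.
pad_domain <- function(x, pad = 0.5) {
  r <- range(x, na.rm = TRUE)
  c(r[1] - pad, r[2] + pad)
}

# Assumed example: three groups placed at x = 1, 2, 3.
x_positions <- c(1, 2, 3)

# Fragment of a Vega-Lite x encoding written as a nested R list.
# The field name "x" is an assumption made for this sketch.
x_encoding <- list(
  field = "x",
  type  = "quantitative",
  scale = list(domain = pad_domain(x_positions))
)

str(x_encoding)
```

Padding the domain rather than shifting the x values keeps the data untouched; only the visible range of the axis changes.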

giorgi-ghviniashvili commented 3 years ago

@jhofman Ok, I fixed it. The problem was that the distance between the facet title and the axis differs from the distance between a regular axis title and the axis, which made the top positions different. In general, this kind of thing is a downside of "hacking facets": we need some hardcoded padding / spacing adjustments.

sharlagelfand commented 3 years ago

Thanks @giorgi-ghviniashvili! Filled points and facet title are working now.
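
For reference, a rough sketch of what the two settings discussed above look like in a spec, written as a nested R list. The overall layout (a Vega-Lite facet spec with `facet` and `spec` entries) and the field types are assumptions for illustration; the specs that datamations actually generates may be structured differently.

```r
# Sketch of the relevant parts of a faceted Vega-Lite spec as an R list.
# Layout and field types are assumptions, not the output of datamations.
spec <- list(
  facet = list(
    # Facet titles, following the facet.column.title / facet.row.title paths
    column = list(field = "Degree", type = "nominal", title = "Degree"),
    row    = list(field = "Work",   type = "nominal", title = "Work")
  ),
  spec = list(
    # filled = TRUE draws solid points instead of hollow ones
    mark = list(type = "point", filled = TRUE)
  )
)

str(spec)
```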

sharlagelfand commented 3 years ago

Unfortunately doesn't look like I can do the axis.scale.domain stuff on my end (due to the fake facets, the specs fly off the screen then fly back on... not what we want!), was going to try to hack it today before the talk but will have to wait to be done properly in JS!

Working on updating the README now with updated GIFs, then I will PR @jhofman

jhofman commented 3 years ago

sounds good, thanks for the update.

sharlagelfand commented 3 years ago

Looks like something is going on here with the scaling of the y-axis - it's scaled just to the domain for the jitter view (which has no axes, as shown in this comment) but then scaled 0 -> full domain for the summary view (which has axes). The consequence is that briefly it shows the axis for the jitter view but with wrong values, so it looks like somehow values between 0 and 40 (Masters in Academia) have a mean of ~85, which of course isn't what's actually happening in the data

So the animation has these frames:

jitter view, no axes

[Image: Screen Shot 2021-05-10 at 3 48 17 PM]

jitter view, incorrect axes

[Image: Screen Shot 2021-05-10 at 3 49 00 PM]

summary view, correct axes (but domain is too large)

[Image: Screen Shot 2021-05-10 at 3 49 07 PM]

giorgi-ghviniashvili commented 3 years ago

@sharlagelfand good catch! That's because we need to match domain of real faceted view for axis to the hacked facet domain. I think I fixed it.
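
The idea of matching domains can be sketched in R as follows. This is not the actual fix, which was made on the JS side; it only illustrates computing one y domain and reusing it in both views. The values are the first ten `Salary` values printed for `small_salary` later in this thread, and the zero-based domain is an assumption.

```r
# First ten Salary values printed for small_salary later in this thread.
salary <- c(81.9, 84.5, 82.9, 83.8, 83.8, 85.3, 91.4, 85.3, 83.3, 92.3)

# One shared domain, computed once. Starting at 0 is an assumption.
shared_domain <- c(0, ceiling(max(salary)))
y_scale <- list(domain = shared_domain)

# Both views reuse the same scale, so the axis drawn for one view
# also describes the point positions in the other.
jitter_y  <- list(field = "Salary", type = "quantitative", scale = y_scale)
summary_y <- list(field = "mean",   type = "quantitative", scale = y_scale)

stopifnot(identical(jitter_y$scale$domain, summary_y$scale$domain))
```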

sharlagelfand commented 3 years ago

I've updated the README with how things are looking now, the main thing that could be updated before we merge is that the axes should show up one frame earlier (as soon as animation of infogrid -> jitter starts).

A couple other questions/things to keep an eye on:

https://user-images.githubusercontent.com/15895337/117847350-4209d300-b250-11eb-81c6-b1de80855b12.mov

(Happy to move these ^^ notes to a new issue, just somewhere to collect my thoughts)

cc @jhofman @giorgi-ghviniashvili

sharlagelfand commented 3 years ago

Just to update where things are at, the app here has the latest of everything!

You can change the size now. There are some defaults set; it would be nice to have it auto-size a bit based on the number of row/column facets, but at least some control is nice! Still having some issues with values moving across facets, but we can dig in more tomorrow.

sharlagelfand commented 3 years ago

(Will also close this issue and move outstanding stuff to new issues tomorrow, since the README is updated!)

jhofman commented 3 years ago

thanks @sharlagelfand

@giorgi-ghviniashvili is it possible that there's still a bug in the axis positioning or labeling? seems like the averages in the last frame are in the 90k region, but i remember them being in the high 80s.

giorgi-ghviniashvili commented 3 years ago

@jhofman no, the y values are in the 90s: [image]

sharlagelfand commented 3 years ago

Looks like these are right:

[Image: Screen Shot 2021-05-14 at 11 46 50 AM]

  Degree   mean
1 Masters  90.2
2 PhD      88.2

so going to close this now!

jhofman commented 3 years ago

Looks like the shiny app still shows two values about 90. is that previous screenshot from the app or somewhere else?

[Image: Screen Shot 2021-05-17 at 10 31 08 AM]

sharlagelfand commented 3 years ago

Huh, you're right! And the data shown in the app is wrong too (but seems to match the values on the plot)

[Image: Screen Shot 2021-05-17 at 10 36 32 AM]

Not sure what's going on here - I'll dig into it.

sharlagelfand commented 3 years ago

Oooh - there are two different "small salary" data sets - small_salary and small_salary_data:

library(dplyr)
library(datamations)

small_salary
#> # A tibble: 100 x 6
#>       ID Degree  Work     Salary i     order
#>    <int> <fct>   <fct>     <dbl> <chr> <int>
#>  1    22 Masters Academia   81.9 id        1
#>  2    96 PhD     Academia   84.5 id        2
#>  3    10 Masters Academia   82.9 id        3
#>  4    42 PhD     Academia   83.8 id        4
#>  5    55 PhD     Academia   83.8 id        5
#>  6    14 PhD     Academia   85.3 id        6
#>  7    33 PhD     Industry   91.4 id        7
#>  8   100 PhD     Academia   85.3 id        8
#>  9    57 Masters Academia   83.3 id        9
#> 10     2 PhD     Industry   92.3 id       10
#> # … with 90 more rows

small_salary %>% 
  group_by(Degree) %>%
  summarise(mean = mean(Salary))
#> # A tibble: 2 x 2
#>   Degree   mean
#>   <fct>   <dbl>
#> 1 Masters  90.2
#> 2 PhD      88.2

small_salary_data
#> # A tibble: 30 x 3
#>    Degree  Work     Salary
#>    <chr>   <chr>     <dbl>
#>  1 Masters Industry     86
#>  2 Masters Academia     71
#>  3 PhD     Industry    104
#>  4 Masters Industry     94
#>  5 Masters Academia     93
#>  6 Masters Academia     96
#>  7 PhD     Academia    100
#>  8 Masters Industry     86
#>  9 PhD     Academia     80
#> 10 Masters Industry     85
#> # … with 20 more rows

small_salary_data %>%
  group_by(Degree) %>% 
  summarise(mean = mean(Salary))
#> # A tibble: 2 x 2
#>   Degree   mean
#>   <chr>   <dbl>
#> 1 Masters  90.6
#> 2 PhD      92.1

Will open a new issue for this.
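
A quick way to put the two sets of group means side by side, sketched in base R so it runs without either package loaded. The data frames below contain only the first ten rows printed above for each data set, so the resulting means will not match the full-data means; the `head_*` names and `group_means()` are hypothetical.

```r
# First ten printed rows of small_salary (Degree and Salary only).
head_small_salary <- data.frame(
  Degree = c("Masters", "PhD", "Masters", "PhD", "PhD",
             "PhD", "PhD", "PhD", "Masters", "PhD"),
  Salary = c(81.9, 84.5, 82.9, 83.8, 83.8, 85.3, 91.4, 85.3, 83.3, 92.3)
)

# First ten printed rows of small_salary_data (Degree and Salary only).
head_small_salary_data <- data.frame(
  Degree = c("Masters", "Masters", "PhD", "Masters", "Masters",
             "Masters", "PhD", "Masters", "PhD", "Masters"),
  Salary = c(86, 71, 104, 94, 93, 96, 100, 86, 80, 85)
)

# Hypothetical helper: mean Salary per Degree.
group_means <- function(df) aggregate(Salary ~ Degree, data = df, FUN = mean)

# One row per Degree, one mean column per data set.
comparison <- merge(
  group_means(head_small_salary),
  group_means(head_small_salary_data),
  by = "Degree",
  suffixes = c("_small_salary", "_small_salary_data")
)

print(comparison)
```

With the full package data, the same comparison could be run directly on `small_salary` and `small_salary_data`, and a check such as `dim()` on each would show straight away that they are different tables (100 x 6 versus 30 x 3 in the output above).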