rstudio / crosstalk

Inter-htmlwidget communication for R (with and without Shiny)
http://rstudio.github.io/crosstalk

filter_select() and friends should use sensible group default #33

Open · cpsievert opened this issue 7 years ago

cpsievert commented 7 years ago

If group is not provided, unique(sharedData$key()) seems like a reasonable default to assume.
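A minimal base-R sketch of what this default would mean. It assumes a filter widget works by mapping each choice to the set of key values it keeps visible. `group_options()` is a hypothetical helper for illustration only and is not part of crosstalk's API. When `group` defaults to the key, the filter choices are exactly `unique(key)`.

```r
# Hypothetical helper (not crosstalk's API): map each filter choice to the
# distinct key values it should keep visible.
# `group` defaults to the key itself, which is the default proposed here.
group_options <- function(keys, group = keys) {
  stopifnot(length(group) == length(keys))
  # one list entry per distinct group value, holding that group's distinct keys
  lapply(split(keys, factor(group)), unique)
}

# Non-unique keys: several rows per city
keys <- c("Austin", "Austin", "Dallas", "Dallas", "Houston")

# Default group = key: one choice per unique key (Austin, Dallas, Houston)
opts <- group_options(keys)
names(opts)

# Explicit group: choices come from another column instead
region <- c("central", "central", "north", "north", "south")
group_options(keys, group = region)
```

With unique row keys, the same default would produce one choice per row. That is the "filter by primary key" case discussed in the replies below.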

jcheng5 commented 7 years ago

unique? Hopefully you're not using keys that aren't unique? :)

Do you need this because it's tricky to specify the key as a group, and that's something you want to do? Or do you want a group default and are reaching for key because it's there? I'm less sympathetic to the latter than the former; I don't think it's unreasonable to ask you to specify how you want these things grouped.

cpsievert commented 7 years ago

> Hopefully you're not using keys that aren't unique?

There are certainly useful examples with non-unique keys: https://gist.github.com/cpsievert/b873740854cbfcc7a6c7ce891447c3fb

I want it as the default because that's what I would want 80% of the time. And yes, it's not immediately obvious that group should be a function of sharedData$key().

jcheng5 commented 7 years ago

That non-unique key example makes no sense to me. We use the key to uniquely identify rows. How does linked brushing work if you can't uniquely identify rows?? I think we have fundamentally different understandings of what a key is. In my understanding, keys not only need to be unique, but you'd want them to be the filter group far less than 80% of the time. That's analogous to saying you usually want a UI widget to filter a database table by a primary key.

jcheng5 commented 7 years ago

I'm getting really buggy behavior from that example so maybe I have something installed wrong (I'll leave a comment on the gist).

But what I can see is that making a lasso selection on the plot on the right highlights some of the values on the left. If I don't have anything selected in the filter_select, the wrong points (or the wrong number of points) are clearly being selected. However, once I change SharedData$new(m, ~variable) to SharedData$new(m), it selects the right points.
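A minimal base-R sketch of why the key choice changes which points get highlighted. It assumes that linked widgets exchange a selection as a set of key values rather than row indices, and that every row whose key is in that set is highlighted. `propagate_selection()` is a hypothetical helper, not crosstalk code.

```r
# Hypothetical sketch: brushed rows are translated to their key values,
# and every row carrying one of those keys is highlighted.
propagate_selection <- function(keys, brushed_rows) {
  selected_keys <- unique(keys[brushed_rows])
  keys %in% selected_keys
}

# Unique row keys (like SharedData$new(m)): exactly the brushed rows light up
row_keys <- c("r1", "r2", "r3", "r4")
propagate_selection(row_keys, c(2, 3))
#> [1] FALSE  TRUE  TRUE FALSE

# Non-unique keys (like SharedData$new(m, ~variable)): brushing one row
# highlights every row that shares its key
var_keys <- c("a", "b", "a", "b")
propagate_selection(var_keys, 1)
#> [1]  TRUE FALSE  TRUE FALSE
```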

cpsievert commented 7 years ago

Sorry, I made that example to address this question, not to demonstrate linked brushing (please see my response in the gist).

> We use the key to uniquely identify rows. How does linked brushing work if you can't uniquely identify rows?

I think that when you say linked brushing, you mean 1-to-1 linking in a scatterplot matrix. I think of it much more generally. Here is another example (of 1-to-n linking):

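```r
library(plotly)    # also attaches ggplot2, which provides txhousing
library(crosstalk)

# key = ~city: every row for a given city shares one key (1-to-n linking)
tx <- SharedData$new(txhousing, ~city)
p1 <- ggplot(tx, aes(date, median, group = city)) + geom_line()
p2 <- plot_ly(tx, x = ~median, color = I("black")) %>% 
  add_histogram(histnorm = "probability density")

# Clicking a line selects all rows for that city in both plots;
# dynamic = TRUE adds a color picker, persistent = TRUE accumulates selections
subplot(p1, p2) %>% 
  layout(barmode = "overlay") %>%
  highlight(
    "plotly_click", dynamic = TRUE, persistent = TRUE, 
    selected = attrs_selected(opacity = 0.3)
  )
```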

This is the sort of example where I'd want filter_select() to inherit its group definition from sharedData$key().

> That's analogous to saying you usually want a UI widget to filter a database table by a primary key.

Exactly. This is how many of my mentors would think of a linked brushing framework. From Cook et al. (1991):

[Image: screenshot of an excerpt from Cook et al. (1991)]

jcheng5 commented 7 years ago

I think we are talking past each other because of a terminology problem. What do "key" and/or "primary key" mean to you? You seem to have defined it as "default grouping", and I'm defining it as "individual row locator". Is that accurate? If so, I'm not arguing right now about whether the concept of a default group is useful; I am arguing that the property I intended as an individual row locator is most certainly not the right place to express it.

And to be clear, I'm also not arguing that 1-to-1 linking is the only useful form of linked brushing. However, Crosstalk as it exists today is designed for 1-to-1 linking. That said, I spent a lot of time during Crosstalk's development thinking about how to support more general n-to-n linking, and we should probably spend some time talking about it. To start with, I strongly believe you still fundamentally need unique keys; you just need to have potentially multiple keys assigned to each visual object, and depending on the visualization, you might also need to know how to compute partial selection (e.g. your visual object represents keys [A, B, C], and only [A, C] are selected).
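A minimal base-R sketch of the partial-selection idea, under the model described above: keys stay unique, each visual object (a bar, a polygon, ...) carries several of them, and an object's degree of selection is the fraction of its keys that are selected. `partial_selection()` and the object names are hypothetical and are not part of Crosstalk.

```r
# Hypothetical sketch: each visual object maps to several unique keys;
# its degree of selection is the share of those keys currently selected.
partial_selection <- function(object_keys, selected_keys) {
  vapply(object_keys, function(k) mean(k %in% selected_keys), numeric(1))
}

objects <- list(bar1 = c("A", "B", "C"), bar2 = c("D", "E"))

# The example from the comment: bar1 represents [A, B, C], only [A, C] selected
partial_selection(objects, selected_keys = c("A", "C"))
# bar1 is two-thirds selected; bar2 is not selected at all
```

A renderer could then map that fraction to, say, the height of a highlighted segment within each bar. That choice depends on the visualization, as the comment notes.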

cpsievert commented 7 years ago

Let's call "key" the "unit of interaction": the thing defined when you initialize a SharedData object (i.e., the key argument). I had always assumed that the key does not have to be unique, and now I'm surprised you don't provide a check for that. Please don't decide to throw an error in that case!

> I strongly believe you still fundamentally need unique keys

Why? How would that extend to a situation where you have x/y data for a polygon representing one "unit of interaction"?

> you just need to have potentially multiple keys assigned to each visual object

I already have a notion of this built on top of non-unique keys. Please see this slide deck: http://cpsievert.github.io/talks/20161212b/#20