Closed jonmmease closed 4 years ago
This all sounds fabulous! My own preference/suggestion is to support selecting isolated chunks of Curve/Area/Spread and to support selecting atomic Path/Path3D/Polygon items (not chunks of them), but I think you should be the one to make the call for how continuous elements are handled, once we've weighed in.
It would also be great to get support for Plotly Parallel Coordinate and Categories plots in HoloViews, with or without a corresponding (less capable) Bokeh or Matplotlib version.
Ok, I added some more detail about the elements that are not in the "initially supported" list.
Thanks!
Can we find a useful way to color subsets so that you can perceive continuous intensity and categorical color independently? If we turn each selection color into a light to dark colorscale then this would probably make sense.
Often this is just alpha; Bokeh offers muted_alpha for this purpose. I'm not sure if that's compatible with the approach here; with alpha you need the original plot to be changed, not the selected one, because the original plot is normally at full opacity already. Obviously there are plenty of cases where alpha is already being used or would give results that are ambiguous given the page background, but it seems like in general alpha could show a selection on an image in enough cases that it could be a default behavior.
Thanks for all your work on this. I believe this vision has now been realized. Any additional fixes, ideas and suggestions should be in new issues.
Overview
This is a high-level issue describing the proposed design for an approach to automating the process of creating cross-dataset linked brushing dashboards using HoloViews. As demonstrated in the glaciers demo, it is already possible to create these dashboards, but it requires a non-trivial amount of manual logic to wire up the selection streams, gather and combine selections, and update the displayed elements. And even then, the resulting dashboard is not very modular, as it's not possible to add additional linked views without modifying the existing structure.
The goal here is to take advantage of the fact that HoloViews visualization elements have information about the underlying dimensions of the datasets used to construct them. Similar to the way that HoloViews can automatically link the axes of matching dimensions across views, we would like to make use of this same information to enable linked selections.
Basic user workflow
The general workflow for building a linked dashboard with this approach is for the user to first construct a holoviews.Dataset object that includes all of the relevant dimensions across all of the elements that will be created.
The visualization elements (Scatter, Bars, etc.) are created from the Dataset using the Dataset.to method. These elements are then combined into a layout. This layout is then transformed into an interactive selection-linked version using a new link_selections operation (final name still TBD).
Rather than constructing a holoviews.Dataset object and using the .to method, the higher-level hvplot library may also be used to create visualization elements directly from DataFrames.
Visualizations across multiple notebook output cells can be linked together by constructing an instance of a new SelectionManager class, and providing that instance to each call to link_selections.
Supported Elements
This approach would support all elements based on tabular data sets that don't require the use of a colormap to be useful.
Anticipated supported elements:
Bars
Scatter
Bivariate
BoxWhisker
Distribution
ErrorBars
Histogram
Labels
Points
Polygons
Spikes
Violin
VectorField
Scatter3D (Plotly)
These are continuous elements that would require a different .select behavior to be supported. See below for details.
Curve
Path
Area
Spread
Path3D (Plotly)
Annotations. Some of these could be supported, but not sure if they should be
Arrow
Box
Ellipse
HLine
Spline
Unknown
Chord, Graph: Are there streams that could be used to get the selected node? If so, and the nodes are drawn from some stable set of categories then this might make sense. But even if not, it might make sense to display selection overlays even if you can't make selections on the element.
Table: Can rows be colored by data and are there selection streams available for highlighting rows?
HexTiles, Raster, Image: Can we find a useful way to color subsets so that you can perceive continuous intensity and categorical color independently? If we turn each selection color into a light to dark colorscale then this would probably make sense.
QuadMesh, TriMesh: Same colorscale issue as above, plus deciding what you would be selecting (full quads/tris I would think).
Elements that would not be supported
RGB, HSV: Color is already fundamental to the element so we can't really use it for selection as well.
Contours: General contours extend beyond the selection area, so I don't think selecting individual contours is generally meaningful.
Sankey, RadialHeatMap: No real dimensions to build a selection on
Tiles: Nothing to select, it's a background
It's possible that these restrictions could be removed in the future. For now, elements that don't satisfy them will be skipped.
Core HoloViews implementation components
The goal is for the eventual implementation of link_selections and SelectionManager to be as small and contained as possible by implementing a series of generally useful concepts in the core of HoloViews. Each of these components will be implemented in a separate PR, and they should be sensible and useful independent of their eventual use in the link_selections operation.
Elements keep a reference to their source Dataset if available
In order to accomplish this goal of automating the process of creating linked selections, it is important that individual visualization elements maintain a reference to all of the dimensions in the original dataset, not only the dimensions that are needed to display the element. Perhaps the simplest way to accomplish this is for elements to maintain a reference back to the holoviews.Dataset object that they were created from. This won't always be available, in which case the dataset will be None. This can be implemented by adding a new read-only .dataset property to all elements (all LabelledData subclasses?), and updating various functions throughout the core to add or preserve this field.
Dataset.to should specify itself as the .dataset property of the element it returns.
Dataset.hist should do the same.
[...] .dataset should return the dataset. This means that dataset is Dataset(dataset.to(hv.Scatter)) should be True.
[...] .to or the element constructor) should preserve the .dataset property.
[...] .dataset property that has been indexed/selected like the element.
General Principle: if an element has a .dataset property then it should be possible to reconstruct an identical element using the .dataset data and the kdims/vdims metadata. (Histogram is a slight known exception to this rule, because the bin edges from the original dimensions are needed as well). And it should be possible to create an element that references only a subset of the data using the select approach below.
Select data by dim expression
Data selections will be represented as symbolic expressions built using the holoviews.util.transform.dim class. To make it more natural to use these expressions for this purpose, the existing .select method on datasets and elements should accept predicate dim expressions. Additionally, if an element has a .dataset property, then these expressions should be able to reference all of the dimensions in .dataset, not only those listed in vdims/kdims.
Histogram elements are a bit special because in their .data property they store the bin edges and bin counts/frequencies. Currently, selection can only be performed on the single key-dimension. With these changes, this key-dimension selection will still behave just as before, but if a .dataset is available then the dataset will also be filtered by the same criteria (this should not require reaggregation). If the selection involves dimensions other than the key dimensions, then this will also trigger reaggregation using the same bins.
Selection nan mode
To handle selections on continuous elements (Curve, Area, etc.) it is important to maintain NaN value(s) where data were rejected by the selection criteria, otherwise it's not possible to break the element properly. Here's an example of the problem that arises with an Area element with the default selection behavior.
The proposal here is to add a new kwarg to .select to control how rejected data is handled. Naming is still up for discussion, but something like selection_mode, with three options:
'filter': Remove all rows that don't satisfy the criteria (current and default behavior)
'mask': Replace values in all rows that don't satisfy criteria with NaNs. This would behave somewhat like the pandas where method.
'nan_join': Replace contiguous blocks of rows that don't satisfy criteria with a single row containing NaNs. This is what the selection framework would use.
An alternative to this approach would be to deem these continuous elements incompatible with linked selection and remove them from the supported elements list. These are the elements that would not be supported in this case:
Area
Curve
Path
Spread
Path3D (Plotly)
Selection expression from linked stream
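For reference, here is a minimal pure-Python sketch of the three proposed selection modes ('filter', 'mask', 'nan_join') from the "Selection nan mode" proposal. It is not HoloViews code: the select_rows helper, its signature, and the tuple-based rows are all hypothetical and only illustrate the intended row-level behavior.

```python
import math


def select_rows(rows, keep, mode="filter"):
    """Hypothetical helper: apply a row predicate under one of three modes.

    rows: list of equal-length tuples (one tuple per data row)
    keep: function returning True for rows that satisfy the selection
    """
    if mode == "filter":
        # Drop rejected rows entirely (the current default behavior).
        return [row for row in rows if keep(row)]
    if mode == "mask":
        # Keep the row count, replacing every rejected row with NaNs.
        return [row if keep(row) else (math.nan,) * len(row) for row in rows]
    if mode == "nan_join":
        # Collapse each contiguous run of rejected rows into one NaN row,
        # so a line or area is broken exactly once per gap.
        result = []
        in_gap = False
        for row in rows:
            if keep(row):
                result.append(row)
                in_gap = False
            elif not in_gap:
                result.append((math.nan,) * len(row))
                in_gap = True
        return result
    raise ValueError(f"unknown mode: {mode!r}")


rows = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 4.0)]
keep = lambda row: row[0] in (0, 3)  # rows 1 and 2 are rejected

filtered = select_rows(rows, keep, "filter")    # 2 rows, gap is lost
masked = select_rows(rows, keep, "mask")        # 4 rows, two of them NaN
joined = select_rows(rows, keep, "nan_join")    # 3 rows, one NaN separator
```

With 'filter' the two surviving points would be drawn as one connected segment, which is the problem described for Area; 'nan_join' keeps a single NaN row between them so the element is split into separate pieces.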
Elements should be given a method that inputs an instance of a LinkedStream subclass, and returns a symbolic dim expression that includes the data that would be selected by that stream, or None if the stream cannot be used to select data from the element. For example:
(dim(A) >= 0) & (dim(A) <= 2) & (dim(B) >= 1) & (dim(B) <= 2)
There would also be a corresponding method on the stream class itself that delegates to the stream's source element.
(dim(A) >= 0) & (dim(A) <= 2) & (dim(B) >= 1) & (dim(B) <= 2)
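To make the shape of such an expression concrete, here is a small pure-Python sketch that builds the predicate above from box-selection bounds. Everything in it is an assumption for illustration: Predicate is a toy stand-in for the real dim class, and expr_from_bounds is a hypothetical helper that assumes bounds are given as (left, bottom, right, top).

```python
class Predicate:
    """Toy symbolic predicate: stores a display string and a row test."""

    def __init__(self, text, test):
        self.text = text
        self.test = test

    def __and__(self, other):
        # Combine two predicates; both must hold for a row to be selected.
        return Predicate(
            f"{self.text} & {other.text}",
            lambda row: self.test(row) and other.test(row),
        )

    def __call__(self, row):
        return self.test(row)

    def __repr__(self):
        return self.text


def in_range(name, low, high):
    """Predicate selecting rows where low <= row[name] <= high."""
    lower = Predicate(f"(dim({name}) >= {low})", lambda row: row[name] >= low)
    upper = Predicate(f"(dim({name}) <= {high})", lambda row: row[name] <= high)
    return lower & upper


def expr_from_bounds(kdims, bounds):
    """Hypothetical helper: box bounds -> predicate, or None if unsupported."""
    if bounds is None or len(kdims) != 2:
        return None
    left, bottom, right, top = bounds
    xdim, ydim = kdims
    return in_range(xdim, left, right) & in_range(ydim, bottom, top)


expr = expr_from_bounds(["A", "B"], (0, 1, 2, 2))
rows = [{"A": 1, "B": 1.5}, {"A": 3, "B": 1.5}, {"A": 0, "B": 0}]
selected = [row for row in rows if expr(row)]
```

Returning None when the bounds cannot be mapped onto the element's key dimensions mirrors the "or None if the stream cannot be used" case described above.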
When build_expr_for_stream is called on a DynamicMap with kdims (sliders), the resulting expression will also restrict to current values of these key dimensions.
Selection Manager implementation
The SelectionManager will be a parameterized class with exprs and colors properties to hold the current selection expressions and current selection colors. A Param stream wrapping an instance of this class will be used as input to the DynamicMap instances that produce the selection overlays.
link_selections implementation
The link_selections operation will input a HoloViews object and map it into a new object where supported elements have been overlaid with DynamicMaps producing the selection overlays. These DynamicMaps will input the SelectionManager Param stream and will use it to compute the subsets.
Object with supported type
HoloViews objects with a .type property that is a supported element type will be overlaid with a .select selection on the object. This will handle simple elements, HoloMaps, GridSpaces, and DynamicMaps that return a supported element type.
If a DynamicMap is encountered that has not been initialized (.type is None), the initialize_dynamic function will be called on it to make this type information available.
DynamicMaps with unsupported type
When processing a DynamicMap that returns an unsupported element, link_selections will recursively walk through the inputs to the DynamicMap's callback, looking for an object with a supported element type. If one is discovered, then the selections will be performed at that point in the pipeline, and the selected element(s) will be passed through the rest of the pipeline before being overlaid.
This approach will enable link_selections to handle DynamicMaps created by the rasterize and datashade operations.
Overlay vs colorscale
For many element types, overlaying a subset of the original element in a different color is a good way to display selections. But some element types are better suited for displaying all of the colors for all selections in a single element. Scatter3D is one example. Other examples that aren't supported in HoloViews yet are the Plotly Parallel Coordinates and Parallel Categories plot types. For all of these cases, the best way to represent the various selections is to internally use a discrete colorscale.
To support this use-case, the selection manager should provide a method to compute an array of the selection index for each data point in an element. So somewhere we'll need to store the information about which selection method is best for each element.
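A rough sketch of what computing such a selection index array could look like, in plain Python. The selection_index function is hypothetical, and so are two choices it makes: index 0 means "not selected", and when selections overlap the most recent one wins.

```python
def selection_index(rows, predicates):
    """Return one integer per row.

    0 means the row is in no selection; k means the row belongs to the
    k-th selection (1-based). Later selections override earlier ones,
    which is an assumption of this sketch.
    """
    indices = []
    for row in rows:
        index = 0
        for k, predicate in enumerate(predicates, start=1):
            if predicate(row):
                index = k
        indices.append(index)
    return indices


rows = [{"x": 0.5}, {"x": 1.5}, {"x": 2.5}, {"x": 3.5}]
predicates = [
    lambda row: row["x"] < 2,        # selection 1
    lambda row: 1 <= row["x"] < 3,   # selection 2
]

# Discrete colorscale: entry 0 is the unselected color,
# entry k is the color of selection k.
colors = ["lightgray", "red", "blue"]

indices = selection_index(rows, predicates)
point_colors = [colors[i] for i in indices]
```

An element that prefers a colorscale over overlays could then be drawn once, using the index array as its color dimension and the selection colors as a discrete colormap.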
@jbednar @jlstevens @philippjfr