observablehq / plot

A concise API for exploratory data visualization implementing a layered grammar of graphics
https://observablehq.com/plot/
ISC License

Brushing #5

Open mbostock opened 3 years ago

mbostock commented 3 years ago

Brush a rectangular region to select data.

Fil commented 3 years ago

Tentative implementation: https://observablehq.com/d/2044e02a89cc688a [EDIT: new URL]

Since Brush extends Dot, the brushed values appear as new dots. This seemed like a reasonable base, but it might not cover all use cases.

mbostock commented 2 years ago

Reverted in #748.

tophtucker commented 1 year ago

Idly pondering a few principles of brush ergonomics…

1 — Unlike in D3, Plot brushes could be scale-aware, e.g. snap to ordinal points / bands. This feels analogous to the nice and well-done principle that tips should be anchored to the data point, not the continuously-moving cursor.

But I don’t know how to handle the case of a non-ordinal (unordered) categorical scale, since brushing imposes order by privileging contiguous selection. One could argue it oughta work more like discrete multi-select in that case! But in any case Plot doesn’t distinguish ordinal from categorical.

2 — I’ve always been frustrated by how most brushes handle selecting extrema (points at the edges of the domains). When I drag a brush off an edge, I usually have a qualitatively different intention: I want to represent a one-sided inequality, like >10, not a two-sided interval, like [10, 20]. But the brush I end up with, even if it allows the dragging mouse to overshoot, viscerally looks like it is just barely capturing all points; the line representing the edge will bisect a dot on the edge. I squint: am I confident it’s not one pixel short?

And, if e.g. a histogram is being used to create a persistent filter for a table, the brush could have misleading semantics: I wanted to filter to anything >10, but instead I save a filter to anything greater than 10 and less than the largest value to date; if larger values come in later, they won’t be captured.

So I’ve always wanted to try making a brush that can distinguish [10, max] from [10, ∞). You’d have a little snapping at the max, and then a little overshoot area, which indiscriminately dragging rightward would land you in (shoutout to Paul Fitts), which represents an infinite upper bound. The brush edge would disappear in some satisfying way to reassure you that you’re not one pixel short of the max, but rather, solidly representing an inequality.

3 — This is not as important, but, as long as we’re going through my longtime wishlist… if a heavy-duty data analysis brush really needs parity with symbolic filter expressions, maybe you could click the edge to toggle between open interval (>, (, ⃝) and closed interval (≥, [, ●).

4 — Akin to scale-awareness, it’d be nice to have a “truly” one-dimensional brush that doesn’t just feel like a degenerate 2D brush. It feels weird in D3 brushes when you can only visually distinguish a 1D brush by how the cursor looks over an edge. Akin to pt. 2, a horizontal brush’s vertical extent looks like an interval [min, max] but is really just showing its independence from the vertical axis (-∞, ∞). Maybe it could be more like a two-sided range input sitting on the axis, maybe it just shouldn’t have top and bottom edges, idk.
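Points 1 to 3 above can be sketched in a few lines. This is only an illustration under stated assumptions: `snapToBands`, `brushToInterval`, and `inInterval` are hypothetical helpers (nothing like them is confirmed to exist in Plot or D3), the band scale is described by a plain `start` and `step` in pixels, and "overshoot" is taken to mean dragging past the frame edge.

```javascript
// Hypothetical helpers illustrating points 1 to 3; not part of Plot or D3.

// Point 1: snap a pixel interval [x0, x1] (x0 <= x1) to whole bands.
// `domain` lists the categories in order, `start` is the pixel position of the
// first band's left edge, and `step` is the pixel width of one band.
function snapToBands([x0, x1], {domain, start, step}) {
  const clampIndex = (x) =>
    Math.max(0, Math.min(domain.length - 1, Math.floor((x - start) / step)));
  const i0 = clampIndex(x0);
  const i1 = clampIndex(x1);
  return {
    values: domain.slice(i0, i1 + 1), // categories covered by the brush
    pixels: [start + i0 * step, start + (i1 + 1) * step] // snapped extent
  };
}

// Point 2: turn a pixel extent into a data interval, treating an overshoot
// past the frame edge as an unbounded side. `invert` maps pixels to data.
function brushToInterval([x0, x1], {invert, left, right}) {
  return {
    lo: {value: x0 < left ? -Infinity : invert(x0), closed: true},
    hi: {value: x1 > right ? Infinity : invert(x1), closed: true}
  };
}

// Point 3: test a value against bounds whose `closed` flag could be toggled
// by clicking the corresponding brush edge.
function inInterval(v, {lo, hi}) {
  const aboveLo = lo.closed ? v >= lo.value : v > lo.value;
  const belowHi = hi.closed ? v <= hi.value : v < hi.value;
  return aboveLo && belowHi;
}

const snapped = snapToBands([35, 95], {domain: ["a", "b", "c", "d"], start: 20, step: 30});

// A linear scale mapping pixels [40, 640] to data values [0, 100].
const invert = (px) => (px - 40) / 6;
const frame = {invert, left: 40, right: 640};
const bounded = brushToInterval([100, 640], frame); // stops at the current max
const openEnded = brushToInterval([100, 660], frame); // overshoot: no upper bound
```

With this representation, a saved filter built from `openEnded` would keep matching larger values that arrive later, while one built from `bounded` would not.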

Hvass-Labs commented 10 months ago

Hello. I'm sorry to interrupt. I am trying out Observable Plot in an HTML website and I really need this feature to select data inside a plot. I tried the demo notebook linked in the PR and it looks great. It seems to be finished and just needs to be merged. Is that correct?

I considered building the plot.js file from the dev branch so I could try it out. But I have no experience building JavaScript projects, and it seems that I would need a lot of tools for that. So hopefully you can merge and release this feature soon if it is finished. I'm sorry to rush you :-)

I also had no idea this feature was called a "brush". In your docs and tutorials it might be a good idea to mention words like "range-slider / selector" which are used in other plotting libraries such as Plotly. That way people can probably find it more easily using internet search.

Thanks!

Fil commented 10 months ago

Hello, fortunately you don't need anything else than a browser to build the project. See https://observablehq.com/@recifs/brush-1653.

Hvass-Labs commented 10 months ago

@Fil I want to make plots on my own web-site so I need the plot.js file. I cannot use your Notebook system.

Fil commented 10 months ago

I've attached the file to the notebook. Just click on the files pane (top right), and download plot.umd.min.js.

Hvass-Labs commented 10 months ago

@Fil Thanks! I've now tried it and I assume you would like some feedback?

Let me first say that Observable Plot generally has a very elegant syntax!

I've attached your compiled JavaScript file, as well as a simple HTML file for testing it (Test-Brush.zip). It works in Chromium on Linux. The data has about 22,500 points covering about 60 years of daily values. This snippet omits most of them:

<html>
  <body>
    <main>
      <div id="graph"></div>
    </main>

    <script src="https://cdn.jsdelivr.net/npm/d3@7"></script>
    <script src="plot.umd.min.js"></script>

    <script>
    var data = [
      { Date: new Date('1962-01-02'), Value: 0.0406 },
      { Date: new Date('1962-01-03'), Value: 0.0403 },
      { Date: new Date('1962-01-04'), Value: 0.0399 },
      { Date: new Date('1962-01-05'), Value: 0.0402 },
      // ... about 22,500 records in total.
      { Date: new Date('2023-11-13'), Value: 0.0463 },
      { Date: new Date('2023-11-14'), Value: 0.0444 },
      { Date: new Date('2023-11-15'), Value: 0.0453 },
      { Date: new Date('2023-11-16'), Value: 0.0445 },
    ];

    var plot = Plot.frame().plot({
      inset: 5,
      grid: true,
      y: {percent: true},
      marks: [
        Plot.lineY(data, {x: "Date", y: "Value"}),
        Plot.lineY(data, Plot.brushX({x: "Date", y: "Value"})),
      ],
    })

    var graphDiv = document.getElementById('graph');
    graphDiv.append(plot);
    </script>
  </body>
</html>

It loads and draws very quickly, but brushing / marking the selection is a bit sluggish. I don't know how any of this works. But I wonder if you are doing a linear search for the pointer-position in the data-array? A binary search would probably be much faster.

It also draws a line from the start to the end-point of the selection. I don't know if I did something wrong, or if that is a bug or feature? But I don't want that line.

[Image: Image-Bug]

Another problem is that if I select the entire data-range, then I cannot remove the selection. I first have to "nudge" an end-point and it's a bit awkward. Perhaps a double-click on the selection should remove it?

I also can't figure out how to extract the selected data. I'm not a JavaScript programmer, so I just hack things together until it works :-) It seems that you have created a new JavaScript construct called viewof in your Observable notebooks, which is used to extract the selected data. How would I do this using normal JavaScript?

I think the way Plotly does this, is to register a function on the plot that gets called whenever the selection changes, like this:

graphDiv.on('plotly_relayout', function(eventdata) {
  var start = eventdata['xaxis.range[0]'];
  var end = eventdata['xaxis.range[1]'];
});

If you make any changes to fix these issues, then please compile the JavaScript again so I can test it.

Thanks!

Fil commented 10 months ago

Thanks for the tests and the feedback.

The line connects the non-selected points on the left-hand side of the brush to the non-selected points on the right-hand side. I'm not sure if that's something the transform can detect on its own (maybe?); in the meantime you can fix this by saying:

Plot.lineY(data, {x: "Date", y: "Value"}),
Plot.lineY(data, Plot.brushX({x: "Date", y: "Value", unselected: {stroke:null}, selected: {stroke:"red"}})),

It should also make the interaction a bit faster. Re: performance, there are quite a few things we can do to make it much faster, but we first want to get the API right.

The viewof keyword can be replaced in vanilla JavaScript by adding an event listener to the returned plot, on the "input" event. This is documented in https://observablehq.com/plot/interactions/pointer with the following code snippet:

const plot = Plot.plot(options);

plot.addEventListener("input", (event) => {
  console.log(plot.value);
});

> please compile the JavaScript again

I would rather encourage you to learn how to do that yourself. It's documented in the last paragraph of https://github.com/observablehq/plot/blob/main/CONTRIBUTING.md#documentation

Hvass-Labs commented 10 months ago

@Fil Thanks very much for the extremely quick reply and all your help!

The main reason I plot using JavaScript is so the user can select a subset of the data; otherwise I would do the plotting on the server, where I already make many static plots.

Using your advice I got it working as I want. It looks beautiful and I like it a lot. It is remarkable that I can get such advanced functionality with just a few lines of code. So thanks very much for that!

As I mentioned, my only "gripe" is that it is a bit sluggish when changing the selection, and I noticed the CPU usage also goes to 100%.

Another feature you may consider, if you don't have it already, is to set the selection using JavaScript code instead of the GUI sliders. Or maybe specify the initial selection like this:

Plot.lineY(data, Plot.brushX({x: "Date", y: "Value",
                              x_start: new Date('1995-01-02'),
                              x_end: new Date('2012-03-04')})),

I hope you will finalize all this soon. Until then I will use this "beta-version".

Thanks!

Fil commented 10 months ago

I'm trying to understand why the brush is slow in the example you shared.

I think it might not be the computation itself, but rather because the browser has a hard time rendering an image from the SVG. We're only moving a rect around, but the browser has to compose it with a very complex line path (35kB of path data).

A solution for this is to simplify the line; below is a simple strategy (retaining 1 sample out of 10). I've also reduced the brush to its simplest possible form.

  marks: [
    Plot.lineY([1], Plot.brushX({x: [], y: []})),
    Plot.lineY(data, {x: "Date", y: "Value", filter: (d,i) => i%10===0}),
  ],

Data decimation transforms are discussed in #1707.
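The stride filter above keeps one sample in ten regardless of its value, so it can drop peaks. A different technique, not the one used in this thread, is min/max decimation: keep the lowest and highest sample of each bucket. The sketch below assumes rows shaped like the `{Date, Value}` records in the earlier example; `decimateMinMax` is a hypothetical helper and is not part of Plot.

```javascript
// Min/max decimation (an assumed alternative to the stride filter): split the
// series into buckets of `size` samples and keep only the lowest and highest
// sample of each bucket, in their original order.
function decimateMinMax(data, size, value = (d) => d.Value) {
  const out = [];
  for (let start = 0; start < data.length; start += size) {
    const end = Math.min(start + size, data.length);
    let lo = start; // index of the smallest value in this bucket
    let hi = start; // index of the largest value in this bucket
    for (let i = start + 1; i < end; ++i) {
      if (value(data[i]) < value(data[lo])) lo = i;
      if (value(data[i]) > value(data[hi])) hi = i;
    }
    out.push(data[Math.min(lo, hi)]);
    if (lo !== hi) out.push(data[Math.max(lo, hi)]);
  }
  return out;
}

// Toy series: two buckets of four samples each.
const series = [3, 1, 4, 1, 5, 9, 2, 6].map((Value, i) => ({i, Value}));
const reduced = decimateMinMax(series, 4);
```

The reduced array could then be passed to the non-interactive line, for example `Plot.lineY(decimateMinMax(data, 20), {x: "Date", y: "Value"})`, while the full data stays available for filtering.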

Hvass-Labs commented 9 months ago

Thanks!

I tried your suggestion and it is a lot faster - but there's still a little "friction" and the CPU meter also shoots up when I move the sliders. So it is still doing some heavy computations.

The data-filtering (i.e. only selecting every 10th data-point) can be problematic in some use-cases that need the full data.

EDIT: I first said your hack didn't work when plotting two lines using the stride arg, but because it uses an index-filter with modulo 10, it had only selected data-points from one of my lines. My mistake.

But the following code I still don't understand:

Plot.lineY([1], Plot.brushX({x: [], y: []})),

EDIT 2: This code seems to make the selected data array in plot.value empty.

The other day I measured the time taken to create the SVG plot with all the data and add it to the DOM, and that was only 200 milliseconds. (That is extremely quick compared to Matplotlib for Python, which often takes several seconds to generate an SVG plot.) But the problem is of course if it has to draw it many times per second when moving the slider in real time.

I think you update the data-selection and the plot in real-time, so whenever the slider is being moved, you select the new sub-set of the data and plot it. And you have to go through the entire array with data every time the slider moves. Is that right?

How about providing a boolean arg like Plot.brushX({realtime: true/false, ...}) whether the data should be updated real-time, or only when the user releases the mouse-button? So while I hold down the left button and move the slider, you can just copy the SVG that was previously generated, and then add a single line on top of that. When I release the mouse-button, it computes the sub-set of the data and redraws the plot properly.

Perhaps the realtime arg could also be set to a refresh-rate instead of a boolean, so I could say that I only want the full plot drawn e.g. once per second.
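Until something like a `realtime` option exists, the refresh-rate idea can be approximated outside Plot by throttling the "input" listener. This is a minimal sketch, assuming the plot emits "input" events and exposes `plot.value` as described earlier in the thread; `throttle` and `update` are hypothetical names, not Plot APIs.

```javascript
// Hypothetical throttle: call `fn` at most once every `interval` milliseconds,
// and deliver the most recent suppressed value once the interval has elapsed.
// `now` is injectable so the timing logic can be exercised without waiting.
function throttle(fn, interval, now = Date.now) {
  let last = -Infinity; // time of the last call to fn
  let pending; // latest value not yet delivered
  let timer = null;
  return (value) => {
    const t = now();
    if (t - last >= interval) {
      // Enough time has passed: deliver immediately and drop any stale timer.
      if (timer !== null) {
        clearTimeout(timer);
        timer = null;
      }
      last = t;
      fn(value);
    } else {
      // Too soon: remember the value and schedule one trailing call.
      pending = value;
      if (timer === null) {
        timer = setTimeout(() => {
          timer = null;
          last = now();
          fn(pending);
        }, interval - (t - last));
      }
    }
  };
}

// In a page, assuming a hypothetical update() that redraws dependent views:
// plot.addEventListener("input", throttle(() => update(plot.value), 1000));
```

Updating only on release of the mouse button would be a separate choice; the throttle keeps some live feedback while bounding the redraw rate.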

Or maybe you can make a fast version that isn't a full brush with all the plotting-options. In my use-cases I don't need to change the plot-line colors or anything else, I just need to mark the beginning and end of the selection.

By the way, I think I would prefer to invert the color-scheme, so the selected part is bright and the unselected part is dark. I think that would be more natural because the "full" selection is also bright at first.

I previously suggested using a binary search as well to quickly find the sub-set of the data that has been selected. But that would require for the data to be sorted, which I'm not sure is the case in your internal data-structures?

Fil commented 9 months ago

The primary problem here is that the browser struggles to re-render the complex svg, even when the part that changes is a simple overlay. It's not due to the computations of the brush interaction.

Simplifying the path of the "non-interactive" line makes the problem go away. With proper decimation, a chart with a million data points (or even 10 million) can have a brush running at interactive speed. I'll add more comments on that subject in #1707.

The Plot.lineY([1], Plot.brushX({x: [], y: []})) construct in the example above is a mark with a brush, whose data is limited to a single point that has no coordinates. This brush only informs us (by emitting input events) about its extent (x1 to x2). It doesn't draw anything, and doesn't filter anything. This makes it fast.

The task of filtering the dataset can then be done as you suggest, by bisection on the (ordered) dataset. Ordering is an O(N log N) operation, but you need to do it once, if at all. Bisection is an O(log N) operation, very fast; it can even be O(1) if the x value is very regular (for example, if you have exactly one datum every second, you can work by quantization). Some statistical methods might not need the filtered dataset, for example if you just need to display the number of values selected; you might also want to work exclusively with indices and columnar channels to make this faster.

Another win, when using bisection (or quantization) on typed arrays, is that you can use a subarray — a simple operation (zero memory overhead, O(1)).
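The bisection and subarray approach described above can be sketched as follows. It assumes columnar data in typed arrays with timestamps already sorted in ascending order; `selectRange` and the `times`/`values` column names are hypothetical, and the hand-written `bisectLeft` stands in for a library routine such as D3's `d3.bisectLeft`.

```javascript
// Index of the first element >= x in a sorted array (binary search, O(log N)).
function bisectLeft(sorted, x) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid] < x) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Hypothetical helper: return views of the samples with x1 <= time < x2.
// subarray() shares the underlying buffer, so no sample data is copied.
function selectRange({times, values}, x1, x2) {
  const i = bisectLeft(times, x1);
  const j = bisectLeft(times, x2);
  return {times: times.subarray(i, j), values: values.subarray(i, j)};
}

// Toy columns; in practice x1 and x2 would come from the brush extent.
const columns = {
  times: Float64Array.of(0, 10, 20, 30, 40, 50),
  values: Float64Array.of(1, 2, 3, 4, 5, 6)
};
const selected = selectRange(columns, 15, 45);
```

If only the number of selected values is needed, `j - i` is enough and no views need to be created at all.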

Hvass-Labs commented 9 months ago

It makes sense that it is mainly the rendering that is slow when there are many data points. I tried your new demo notebook with down-sampling and it is very fast on both my PC and an average consumer phone. And it looks like it still selects a subset from the full data array, which I need.

If you can implement this with a simple syntax, then it's a great solution, and I will eagerly await your updates!

Please demonstrate all these things in your docs and tutorials on brushing when you get that far.

EDIT: If possible I really would like the ability to invert the color-scheme, so the selected area is bright and the unselected area is dark. I think that looks more natural. Here's an example from Plotly.

Hvass-Labs commented 9 months ago

Another feature request: In the example above with a time-series, I would like to show a narrow and vertical histogram to the right side of the main plot, so when I change the selection of the data in the main plot, the histogram automatically gets updated with the data from the selection. Would this be possible? Thanks.

Hvass-Labs commented 9 months ago

Feature request: It would be great if I could hold the Shift-key and add to my selection with different regions of the plot. And maybe if I could hold the Ctrl-key it would subtract from the selection.

These would be nice-to-haves and not strictly necessary if it would significantly prolong the development of the brushing feature. Thanks!

Hvass-Labs commented 9 months ago

Bug: The JavaScript file you built for me a few weeks ago was based on v0.6.11 and doesn't work properly with crosshair and tip/pointer, as their overlays sometimes (but not always!) get stuck and remain visible when also using brushing / selection of the data.

EDIT: The crosshair and tip/pointer also cause problems with the value field on the Plot object, which becomes empty so I cannot obtain the selected data. This also happens sometimes but not always.

hemanrobinson commented 7 months ago

On a previous project, I implemented Shift and Ctrl keys for brushing.

Following UI standards eases the user’s learning curve. When selecting individual objects, standard behaviors for Shift and Ctrl are well-defined. However, I found no UI standards for their behavior during brushing.

So I had the same idea as Hvass-Labs, to make the Shift key extend the selection, and the Ctrl key reduce it. This enables people to select irregular areas, or disjoint clusters of points. It gives the effect of “painting” the plot.
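The Shift/Ctrl behavior described above amounts to set union and set difference on the selected items. A minimal sketch, assuming selections are tracked as Sets of data indices; `combineSelection` is a hypothetical helper and is not taken from the linked notebook. The `shiftKey` and `ctrlKey` flags match the standard properties on DOM mouse and pointer events.

```javascript
// Hypothetical reducer: combine the current selection (a Set of data indices)
// with a newly brushed Set, depending on the modifier keys held.
function combineSelection(current, brushed, {shiftKey = false, ctrlKey = false} = {}) {
  if (shiftKey) return new Set([...current, ...brushed]); // extend the selection
  if (ctrlKey) return new Set([...current].filter((i) => !brushed.has(i))); // reduce it
  return new Set(brushed); // no modifier: replace the selection
}

let selection = new Set([1, 2, 3]);
selection = combineSelection(selection, new Set([3, 4]), {shiftKey: true});
selection = combineSelection(selection, new Set([2, 3]), {ctrlKey: true});
```

Because a pointer event carries both flags, the event itself can be passed as the third argument in a brush handler.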

Try it, and see what you think :-)

https://observablehq.com/@heman/fast-brushing

mbostock commented 4 months ago

Here’s an example of brushing on x using a custom mark:

Plot.plot({
  marks: [
    Plot.ruleY([0]),
    Plot.lineY(data, {x: "date", y: "value"}),
    (index, scales, channels, dimensions, context) => {
      const x1 = dimensions.marginLeft;
      const x2 = dimensions.width - dimensions.marginRight;
      const y1 = 0;
      const y2 = dimensions.height;
      const brushed = (event) => setStartEnd(event.selection?.map(scales.x.invert));
      const brush = d3.brushX().extent([[x1, y1], [x2, y2]]).on("brush end", brushed);
      return d3.create("svg:g").call(brush).node();
    }
  ]
})

The setStartEnd function is defined using Framework’s Mutable:

const startEnd = Mutable(null);
const setStartEnd = (se) => startEnd.value = se;

brainbytes42 commented 3 months ago

A lasso selection would be very useful - seems related to this topic. For me this is crucial, as I need to select data from a scatter plot to export only the selected data.

Example from plotly: https://plotly.com/javascript/lasso-selection/

(I implemented this behaviour for a Java-Application, which I want to migrate into a web app, so I need to know, if this would be somehow possible.) Thank you

Fil commented 3 months ago

We prototyped a lasso in https://github.com/observablehq/plot/pull/730; we'll get back to it together with brushing (see also https://github.com/observablehq/plot/pull/721).