leeoniya / uPlot

📈 A small, fast chart for time series, lines, areas, ohlc & bars
MIT License

Poor series focus performance with noisy data #787

Open flyingmutant opened 1 year ago

flyingmutant commented 1 year ago

Following https://github.com/VKCOM/statshouse/issues/134, as promised.

Note that we use a hack/workaround of setting focus.alpha = 1.1; this way the plot is not redrawn every time focus changes (which can happen every frame for dense data, and will tank the performance completely). Instead, we simply highlight the focused series in the legend.
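For readers unfamiliar with the workaround, a minimal sketch of what such an options object might look like follows. `focus.alpha`, `cursor.focus.prox` and the `setSeries` hook are existing uPlot options; the sizes, the `prox` value, the `focusedSeries` variable and the `highlightLegendRow` helper are illustrative assumptions and are not taken from the attached reproduction file.

```javascript
// Hypothetical state: index of the series currently highlighted in the legend.
let focusedSeries = null;

// Hypothetical helper: a real app would toggle a CSS class on the legend row here.
function highlightLegendRow(seriesIdx) {
  focusedSeries = seriesIdx;
}

const workaroundOpts = {
  width: 800,
  height: 400,
  focus: {
    alpha: 1.1, // the hack from this issue (ordinary alpha values lie in [0, 1])
  },
  cursor: {
    focus: { prox: 30 }, // max cursor distance in CSS pixels for a series to gain focus
  },
  hooks: {
    // Assumption: the hook receives the focused series index (null when none is focused).
    setSeries: [(u, seriesIdx) => highlightLegendRow(seriesIdx)],
  },
  series: [{}, { label: "a" }, { label: "b" }],
};
```

The object would then be passed to `new uPlot(workaroundOpts, data, targetElement)` as usual.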

Here is a self-contained file that reproduces the problem: uplot-focus-perf.zip, using real production data. Hovering over the plot is janky and slow with series focus enabled, and smooth otherwise.

leeoniya commented 1 year ago

thanks for the info. currently on phone but if your attached data looks like the screenshot in the linked issue, then i wonder if it makes sense to offer some kind of median, geomean, or custom fn mode for focus that can do some reduce op on surrounding points of each series and then use that smoothed value to determine cursor proximity. that should be much more stable and reduce redraw bouncing while still maintaining focus highlight.
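To make the idea concrete, here is a rough sketch of a median-based proximity check. It is not uPlot code: the function names, the `radius` default and the use of data units instead of pixels are all assumptions made for illustration.

```javascript
// Median of the non-null values in ys[idx - radius .. idx + radius], or null if there are none.
function localMedian(ys, idx, radius) {
  const lo = Math.max(0, idx - radius);
  const hi = Math.min(ys.length - 1, idx + radius);
  const win = [];
  for (let i = lo; i <= hi; i++) {
    if (ys[i] != null) win.push(ys[i]);
  }
  if (win.length === 0) return null;
  win.sort((a, b) => a - b);
  const mid = win.length >> 1;
  return win.length % 2 ? win[mid] : (win[mid - 1] + win[mid]) / 2;
}

// Index of the series whose reduced value is closest to cursorVal,
// or null when no series has data near idx. `reduce` is the "custom fn" slot.
function closestSeries(seriesYs, idx, cursorVal, radius = 5, reduce = localMedian) {
  let best = null;
  let bestDist = Infinity;
  seriesYs.forEach((ys, si) => {
    const v = reduce(ys, idx, radius);
    if (v == null) return; // nothing to compare against for this series
    const dist = Math.abs(v - cursorVal);
    if (dist < bestDist) {
      bestDist = dist;
      best = si;
    }
  });
  return best;
}

// A spiky series no longer steals focus just because one raw point is near the cursor.
const noisy = [10, 90, 10, 10, 95, 10, 10];
const steady = [50, 50, 50, null, 50, 50, 50];
const focused = closestSeries([noisy, steady], 4, 90);
```

With raw values, index 4 of `noisy` (95) would be nearest to a cursor at 90; with the median, `steady` is selected instead, which is the kind of stability the comment above is after.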

flyingmutant commented 1 year ago

Some kind of fast approximate focus mode sounds great. Here is how the data looks:

Screenshot from 2023-01-15 17-06-54

leeoniya commented 1 year ago

i didn't realize there were 100 series here, with most of them concentrated at the bottom 20% of the range. whatever we do to improve this, it will still lag badly when hovering anything below 25k, and may also be not great above 25k, since it will require averaging ~10pts or more on every mousemove for all 100 series.

at some point you need to do something else. e.g. turning a scatter plot with millions of points into an aggregated heatmap with 10k cells.
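As a rough illustration of that kind of aggregation, the sketch below counts points per cell of a fixed grid. The function name, its parameters and the grid layout are assumptions for this example and are unrelated to any uPlot API; a 100 x 100 grid would give the 10k cells mentioned above.

```javascript
// Count points per cell of a cols x rows grid covering [xMin, xMax] x [yMin, yMax].
// Returns a flat Uint32Array in row-major order (index = row * cols + col).
function binToGrid(xs, ys, cols, rows, xMin, xMax, yMin, yMax) {
  const counts = new Uint32Array(cols * rows);
  const xScale = cols / (xMax - xMin);
  const yScale = rows / (yMax - yMin);
  for (let i = 0; i < xs.length; i++) {
    const x = xs[i];
    const y = ys[i];
    if (x == null || y == null) continue; // skip gaps
    if (x < xMin || x > xMax || y < yMin || y > yMax) continue; // skip out-of-range points
    // Clamp so that points exactly on the upper edge land in the last cell.
    const cx = Math.min(cols - 1, Math.floor((x - xMin) * xScale));
    const cy = Math.min(rows - 1, Math.floor((y - yMin) * yScale));
    counts[cy * cols + cx]++;
  }
  return counts;
}

// Tiny 2 x 2 example; real input would be millions of points.
const grid = binToGrid([0, 1, 9, 10, 5, null], [0, 1, 9, 10, 5, 3], 2, 2, 0, 10, 0, 10);
```

The cost of hover handling then depends on the number of cells rather than the number of raw points.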

some kind of anomaly detection and smoothing would be good here. can you try doing this smoothing in advance? this way you won't have a null-filled ocean, and the series will be smoothed and aligned enough for the default focus to work better. see ASAP here: https://leeoniya.github.io/uPlot/demos/data-smoothing.html, or you can do it server-side.

i'll keep this open just to experiment in the future, but i dont think this will solve this case adequately.

original ASAP code: http://www.futuredata.io.s3-website-us-west-2.amazonaws.com/asap/

leeoniya commented 1 year ago

horizon plots are better for things like this. they give you a dedicated, fixed height hover area for each series.

https://datavis.blog/2022/04/30/horizon-charts-in-tableau/
https://observablehq.com/@d3/horizon-chart
https://bernatgel.github.io/karyoploter_tutorial/Tutorial/PlotHorizon/PlotHorizon.html

could be interesting to do something like this in a uPlot demo.

i have a y-shifted demo, which is close but not the same: https://leeoniya.github.io/uPlot/demos/y-shifted-series.html

flyingmutant commented 1 year ago

I am a fan of horizon plots too and am thinking about how we can incorporate them quite often :-)

The problem here is that our project provides a UI for interactive data exploration (in fact, this is its main mode of operation), where users control how much data is shown. We provide the ability to group time series by tags, and to select how many of the aggregated ones to show (top 5 by default, but it can be top 100 like in this example). When the plot looks and works great with top 5, but starts to be very janky with top 100, the user experience is not great, and right now in this case uPlot looks to be the bottleneck (backend or JSON decoding performance does not degrade this much). Dynamically switching the display based on the amount of data is unfortunately out of the question, as we want the user experience to always be the same, without drastic transition points.

flyingmutant commented 1 year ago

Also, I am personally quite against data smoothing, and consider it to be an antipattern (at least as the default). We do select the aggregation interval based on the plot width (targeting to have 1 value every several pixels), however, so the screenshot above is probably near the worst case of data density. Most plots are much closer to this:

Screenshot from 2023-01-17 16-33-57
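The width-based interval selection mentioned above could be sketched as follows. This is only a guess at the general shape of such logic: the list of candidate intervals, the default of 4 pixels per value and the function name are assumptions, and none of it is taken from the StatsHouse code.

```javascript
// Hypothetical set of supported aggregation intervals, in seconds.
const INTERVALS = [1, 5, 15, 60, 300, 900, 3600, 86400];

// Smallest interval that yields at most one value per `pxPerPoint` pixels of plot width.
function pickInterval(fromSec, toSec, widthPx, pxPerPoint = 4) {
  const maxPoints = Math.max(1, Math.floor(widthPx / pxPerPoint));
  const minInterval = (toSec - fromSec) / maxPoints;
  for (const step of INTERVALS) {
    if (step >= minInterval) return step;
  }
  // Range is too long even for the coarsest interval; use the coarsest one.
  return INTERVALS[INTERVALS.length - 1];
}

// One hour on an 800px-wide plot: at most 200 points, so at least 18s per point.
const hourStep = pickInterval(0, 3600, 800);
```

Capping the point count this way bounds the per-series density, though it does not reduce how many series overlap.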

leeoniya commented 1 year ago

i dont have a lot of free time this month, so you might need to get your hands dirty and make a PR so we can test it out.

maybe add a setting for sampling surrounding values like value?: (u: uPlot, seriesIdx: number, dataIdx: number) => number here:

https://github.com/leeoniya/uPlot/blob/9b6888c8d99fdf9379891892710fb1c750f70002/dist/uPlot.d.ts#L513-L516

then update around here to invoke cursor.focus.value(), which will handle sampling/reducing the surrounding points to the value you want to use:

https://github.com/leeoniya/uPlot/blob/9b6888c8d99fdf9379891892710fb1c750f70002/src/uPlot.js#L2485-L2494
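A callback matching the proposed signature might look roughly like this. Note that `cursor.focus.value` is only the option suggested in this comment, not part of uPlot's released API; the `radius` parameter, the function name and the fallback behavior are likewise assumptions.

```javascript
// Hypothetical callback with the proposed shape: (u, seriesIdx, dataIdx) => number.
// Averages the non-null values within `radius` points of dataIdx.
function localMeanValue(u, seriesIdx, dataIdx, radius = 10) {
  const ys = u.data[seriesIdx]; // u.data is laid out as [xs, ys1, ys2, ...]
  const lo = Math.max(0, dataIdx - radius);
  const hi = Math.min(ys.length - 1, dataIdx + radius);
  let sum = 0;
  let n = 0;
  for (let i = lo; i <= hi; i++) {
    if (ys[i] != null) {
      sum += ys[i];
      n++;
    }
  }
  // Fall back to the raw value (possibly null) when the window holds no data.
  return n > 0 ? sum / n : ys[dataIdx];
}

// Where it might be wired in, if such an option existed:
const samplingOpts = {
  cursor: { focus: { prox: 30, value: localMeanValue } },
};
```

The focus logic would then compare the cursor position against this reduced value instead of the raw data point at `dataIdx`.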

leeoniya commented 1 year ago

i added you as a collaborator so you can create new branches that we can both push to. make something like flyingmutant/cursor-focus-sampling

flyingmutant commented 1 year ago

Thanks! Unfortunately can't promise right now when I'll be able to find time to dig into this.

flyingmutant commented 1 year ago

I've made a bit of progress:

Screenshot from 2023-02-02 12-25-29

I am able to reproduce the problem with very little data: only 3 series, 1 of which is very sparse. I am not very proficient with Chrome's profiler, but it shows that almost all the time is spent in "System", with no way to dig into it (?):

Screenshot from 2023-02-02 12-27-28

I've tried the Firefox profiler and it is so much more helpful:

Screenshot from 2023-02-02 12-37-10

Screenshot from 2023-02-02 12-38-05

Screenshot from 2023-02-02 12-38-10

Looks like a lot of slow redraw is happening, for some reason.

leeoniya commented 1 year ago

it's not a question of how many series, but how many series are overlapping each other, and how much noise and nulls there are. i said this in my first comment. the focus can change on every mousemove event in noisy overlapping data because it is based on closest datapoint to cursor position. redrawing hundreds of times per second is always going to suck.

take a look at the last commit in the cursor.focus.value-2 branch. it tries to do a local 20-point average to stabilize the focus proximity. as i expected, it helps, but does not solve this in cases when the series are too densely packed or too noisy, so their local averages alternate.

the question was never why or where there was a problem, but how you expect a solution to work with such data. do you have an example in another charting library that works as you expect with the same data?

flyingmutant commented 1 year ago

Can we avoid the redraw completely for cases where we don't want focused and de-focused series to be styled differently, and only want to update the legend and the overlay with the crosshair and highlighted points closest to the cursor? I think that would be a great compromise between speed and functionality.

leeoniya commented 1 year ago

i'll look into adding selective alpha for series, legend, and hover points. something like cursor.focus.alpha: [seriesAlpha, legendAlpha, cursorPtsAlpha] (when hovering plotting area) and legend.focus.alpha: [seriesAlpha, legendAlpha, cursorPtsAlpha] (when hovering legend)

Minardil commented 3 months ago

> it's not a question of how many series, but how many series are overlapping each other, and how much noise and nulls there are. i said this in my first comment. the focus can change on every mousemove event in noisy overlapping data because it is based on closest datapoint to cursor position. redrawing hundreds of times per second is always going to suck.

> take a look at the last commit in the cursor.focus.value-2 branch. it tries to do a local 20-point average to stabilize the focus proximity. as i expected, it helps, but does not solve this in cases when the series are too densely packed or too noisy, so their local averages alternate.

> the question was never why or where there was a problem, but how you expect a solution to work with such data. do you have an example in another charting library that works as you expect with the same data?

We have such a proprietary library, which I want to get rid of. uPlot has the closest performance but still has two bottlenecks: focus and resizing. We do focus on a separate canvas.