Scattergl plot is sometimes slow to interact with even with moderate number of points

bruot commented 2 years ago

With constant number of points and other settings, I see that Scattergl rendering times vary a lot. With 100,000 points, there are cases where it is virtually impossible to interact with the plot (e.g. zoom), and just moving the mouse seem to cause lengthy calculations.

For example, the following code from the doc works well:

import plotly.graph_objects as go
import numpy as np

N = 100000
fig = go.Figure(data=go.Scattergl(
    x = np.random.randn(N),
    y = np.random.randn(N),
    mode='markers',
    marker=dict(
        color=np.random.randn(N),
        colorscale='Viridis',
        line_width=1
    )
))

fig.show()

but if I replace the x data by zeros,

    x = np.zeros(N),

performance is significantly degraded (so much that using Scatter instead of Scattergl becomes better when trying to zoom in a box).

Any ideas where the difference is coming from and how performance could be improved in this case?

Thanks.

bruot commented 2 years ago

With this x data, things are equivalently slow:

x = np.random.randn(N)
x[0] = 100

Related: #5881. But the problem is not limited to zeros.

CmpCtrl commented 2 years ago

Just to add to this, the issue is related to the hover events and is sensitive to the distribution of the data. Randomly distributed data runs very quickly, but real world data tends to be extremely slow. Artificial data like the linear plot in the example below is also slow, as is random data that has an outlier like @bruot's example above.

I came across the issue using Dash, which is new to me, so please excuse my poor diagnostics. Hopefully this can help point someone in the right direction.

I made a sample python script to demonstrate the issue in Dash.

from dash import Dash, callback, html, dcc, Input, Output
import numpy as np

app = Dash(__name__)

layout = html.Div([
    html.H1('Plotly Scatter Perf Demo'),

    html.Div([
        dcc.Input(value='100000',id='nSamples',type='number'),
        html.Div([
            dcc.Graph(id='plot1'),
            dcc.Graph(id='plot2'),
        ],id='row')
    ],id='plot')
])

app.layout = layout

@callback([Output('plot1','figure'),Output('plot2','figure')],
    Input('nSamples','value'))
def doPlots(nSamples):
    fig = [{},{}]
    y = np.random.random(int(nSamples))*5000
    y2 = np.linspace(0,5000,int(nSamples))

    trace = {'type':'scattergl','y':y,\
        'x0':0,'dx':1,'name':'Random','mode':'markers'}
    layout = {'title':{'text':f'Random Data n:{nSamples}'}}
    fig[0] = dict(data=[trace],layout=layout)
    trace = {'type':'scattergl','y':y2,\
        'x0':0,'dx':1,'name':'Linear','mode':'markers'}
    layout = {'title':{'text':f'linear Data n:{nSamples}'}}
    fig[1] = dict(data=[trace],layout=layout)
    return fig
if __name__ == '__main__':
    app.run_server(debug=True)

In the image below I used Chrome's performance profiling tool. The first 3seconds or so seconds was hovering around over the Random Data plot, that responds fine. Then at about 3s I moved the mouse to hover over the linear data plot where it appears to hang for about 6 seconds before the hover label appears, then around another full second before the hover label updates after a move.

It appears to spend most of its time in this L function in the minified async-plotlyjs.js

I think this is also related to #1698

Hopefully this helps someone who understands this a lot better than myself diagnose to the issue.

CmpCtrl commented 2 years ago

Clearly I shouldn't rely on google to effectively search the issues list... #5790 is also related.

Entropy5 commented 2 years ago

I have the same problem 250k points of my own dataset lags more than 1M random points from their example

jvdd commented 2 years ago

It might be interesting to consider plotly-resampler when you want to visualize large datasets.

This extension adds resampling functionality to Plotly figures (by running an under the hood dash app), allowing to visualize tons of datapoints while being responsive.

siiruli commented 11 months ago

I isolated the issue to commit https://github.com/plotly/plotly.js/commit/907415a84c9dc656eb3f6fa9725fd8e9e3910b89, which changes the default value of spikedistance from 20 to -1. This means there is now no cutoff for searching data for spikelines.

Setting spikedistance: 0 fixed the problem for me. This is weird because my app was not drawing any spikelines, so it shouldn't need to calculate anything related to them. It looks like the code finds the points first and then filters out everything that doesn't allow spikelines. https://github.com/plotly/plotly.js/blob/a5577d994ea06785be100f9e7decff3e6cd8ab1f/src/components/fx/hover.js#L569-L589

Setting hovermode: 'x' helped slightly but not enough, and disabling hovering also fixed the issue, but hoverlabels were necessary in my application.

I'm pretty sure that issues https://github.com/plotly/plotly.js/issues/6792, https://github.com/plotly/plotly.js/issues/5881, https://github.com/plotly/plotly.js/issues/6054, and maybe https://github.com/plotly/plotly.js/issues/5790 are also caused by this. The change was added in plotly.js version v2.0.0-rc.2 and affects dash versions 1.21.0 and later.

CmpCtrl commented 10 months ago

Cool, glad to hear about some progress on this issue. I haven't had a chance to work with it recently (and won't in the near term) but i'm excited to look into it further. Thanks.

CmpCtrl commented 1 month ago

Another year later, and i'm surprised this isn't a bigger issue for more users. I took a quick look at issues and one more that i think is related is #6174.

@siiruli's find of the default spike distance seems to fix the issue for me. Setting the default back to 20pixels works great. Does anyone know why that shouldn't be reverted?

plotly / plotly.js

Scattergl plot is sometimes slow to interact with even with moderate number of points #5927