holoviz / holoviews

With Holoviews, your data visualizes itself.
https://holoviews.org
BSD 3-Clause "New" or "Revised" License

Unrelated updates in overlay with dmap triggers computation of another dmap #6135

Open ahuang11 opened 9 months ago

ahuang11 commented 9 months ago
import dask.array as da
import holoviews as hv
import panel as pn
import xarray as xr
from holoviews.operation.datashader import rasterize
from holoviews.streams import RangeXY

pn.extension(throttled=True)
hv.extension("bokeh")

DATA_ARRAY = "10000frames"

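# Create a fake lazy Dask array: 100,000 frames of 100 x 100 float64 values
# (about 8 GB if fully materialized), chunked 100 frames at a time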
data = da.random.random((100000, 100, 100), chunks=(100, 100, 100))
data = xr.DataArray(data, dims=["frame", "height", "width"])

FRAMES_PER_SECOND = 30
FRAMES = data.coords["frame"].values

def plot_image(value):
    return hv.Image(data.sel(frame=value), kdims=["width", "height"]).opts(
        cmap="Viridis",
        frame_height=400,
        frame_width=400,
        colorbar=False,
    )

# Create a video player widget
video_player = pn.widgets.Player(
    length=len(data.coords["frame"]),
    interval=1000 // FRAMES_PER_SECOND,  # ms
    value=int(FRAMES.min()),
    max_width=400,
    max_height=90,
    loop_policy="loop",
    sizing_mode="stretch_width",
)

# Create the main plot
main_plot = hv.DynamicMap(
    plot_image, kdims=["value"], streams=[video_player.param.value]
)

# frame indicator lines on side plots
line_opts = dict(color="red", alpha=0.6, line_width=3)
dmap_vline = hv.DynamicMap(
    pn.bind(lambda value: hv.VLine(value), video_player)
).opts(
    **line_opts
)

# height side view
right_data = data.mean(["width"])
right_plot = rasterize(
    hv.Image(right_data, kdims=["frame", "height"]).opts(
        cmap="Viridis",
        frame_height=400,
        frame_width=200,
        colorbar=False,
        title="_",
    ),
    streams=[RangeXY()],
)

sim_app = pn.Column(
    video_player,
    pn.Row(main_plot, right_plot * dmap_vline),
)

sim_app

If right_data is loaded into memory up front, things move much quicker:

right_data = data.mean(["width"]).load()

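A rough size estimate shows why loading only the reduced array is feasible in this example. The arithmetic below assumes float64 values (8 bytes each), which is what da.random.random produces; nbytes is a hypothetical helper written for illustration, not part of any library.

```python
from math import prod

BYTES_PER_FLOAT64 = 8  # assumption: the array holds float64 values


def nbytes(shape, itemsize=BYTES_PER_FLOAT64):
    """Return the in-memory size in bytes of a dense array with this shape."""
    return prod(shape) * itemsize


raw_shape = (100_000, 100, 100)  # (frame, height, width)
reduced_shape = raw_shape[:2]    # mean over "width" leaves (frame, height)

print(nbytes(raw_shape) / 1e9)      # raw array in GB -> 8.0
print(nbytes(reduced_shape) / 1e6)  # reduced array in MB -> 80.0
```

The reduction is about 100 times smaller than the raw data here, so it may fit in memory even when the raw array does not. Whether that holds for other datasets depends on their shapes.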
philippjfr commented 9 months ago

I was a little surprised: the caching on right_plot is working just fine, and since the plot only sees the rasterized data it should never attempt to recompute the actual data. What I think is happening is that datashader applies the regridding lazily, so whenever anything on right_plot is recomputed it has to go all the way back to the raw underlying data, apply the reduction, and then apply the regridding.

Based on that, everything is working correctly here, so I think the only real fix is to warn users that if they pass a non-persisted Dask-backed array to rasterize/regrid, interactive performance will be significantly degraded. Alternatively, rasterize should automatically persist the regridded result.
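
The warning suggested above could look roughly like the following. This is only a sketch: is_dask_backed and warn_if_dask_backed are hypothetical names, not HoloViews or Datashader API, and the check relies on the __dask_graph__ method that Dask collections expose. A persisted array is still a Dask collection, so this check cannot tell persisted from non-persisted input and would warn for both.

```python
import warnings


def is_dask_backed(obj):
    """Duck-typed check for a Dask collection.

    Dask collections expose a ``__dask_graph__`` method. Wrappers such as
    xarray objects return None from it when they hold in-memory data.
    """
    graph_method = getattr(obj, "__dask_graph__", None)
    if graph_method is None:
        return False
    try:
        return graph_method() is not None
    except TypeError:
        # e.g. the attribute was looked up on a class rather than an instance
        return False


def warn_if_dask_backed(obj, operation="rasterize"):
    """Hypothetical helper: warn when `obj` may be recomputed on each update."""
    if is_dask_backed(obj):
        warnings.warn(
            f"{operation} received a Dask-backed array; unless it has been "
            "persisted, each plot update may recompute it from the raw data.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

In the reproducer, such a helper would be called on right_data before it is wrapped in hv.Image, assuming the check is done in user code rather than inside the library.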

ahuang11 commented 9 months ago

Maybe related: https://discourse.holoviz.org/t/why-panel-is-10x-slower-with-dask-than-with-pandas/6767

philippjfr commented 9 months ago

Some timings, which indicate the time is spent almost entirely in range calculations.

No .persist()

         685761 function calls (681648 primitive calls) in 2.409 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    2.409    2.409 {built-in method builtins.exec}
        1    0.000    0.000    2.409    2.409 <string>:1(<module>)
    365/1    0.000    0.000    2.409    2.409 parameterized.py:518(_f)
      7/1    0.000    0.000    2.409    2.409 parameters.py:533(__set__)
    335/1    0.001    0.000    2.409    2.409 parameterized.py:1443(__set__)
        3    0.000    0.000    2.409    0.803 parameterized.py:2473(_call_watcher)
        3    0.000    0.000    2.409    0.803 parameterized.py:2456(_execute_watcher)
        2    0.000    0.000    2.408    1.204 streams.py:774(_watcher)
        2    0.000    0.000    2.408    1.204 streams.py:149(trigger)
        2    0.000    0.000    2.407    1.204 plot.py:210(refresh)
        2    0.000    0.000    2.407    1.203 plot.py:252(_trigger_refresh)
        2    0.000    0.000    2.407    1.203 plot.py:953(update)
        2    0.000    0.000    2.407    1.203 plot.py:434(__getitem__)
    16/14    0.000    0.000    2.371    0.169 __init__.py:186(pipelined_fn)
        1    0.000    0.000    2.357    2.357 element.py:2995(update_frame)
        3    0.000    0.000    2.354    0.785 base.py:605(compute)
        2    0.000    0.000    2.351    1.176 plot.py:574(compute_ranges)
        3    0.000    0.000    2.351    0.784 plot.py:692(_compute_group_range)
        6    0.000    0.000    2.346    0.391 raster.py:496(range)
        2    0.000    0.000    2.345    1.173 __init__.py:488(range)
        2    0.000    0.000    2.345    1.172 xarray.py:287(range)
        3    0.001    0.000    2.270    0.757 threaded.py:36(get)
        3    0.014    0.005    2.269    0.756 local.py:350(get_async)
     2078    0.001    0.000    2.125    0.001 local.py:136(queue_get)
     2078    0.008    0.000    2.123    0.001 queue.py:154(get)
     1210    0.006    0.000    2.107    0.002 threading.py:280(wait)
     6918    2.098    0.000    2.098    0.000 {method 'acquire' of '_thread.lock' objects}

With .persist()

         586664 function calls (583555 primitive calls) in 0.272 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.272    0.272 {built-in method builtins.exec}
        1    0.000    0.000    0.272    0.272 <string>:1(<module>)
    365/1    0.000    0.000    0.272    0.272 parameterized.py:518(_f)
      7/1    0.000    0.000    0.272    0.272 parameters.py:533(__set__)
    335/1    0.002    0.000    0.272    0.272 parameterized.py:1443(__set__)
        3    0.000    0.000    0.272    0.091 parameterized.py:2473(_call_watcher)
        3    0.000    0.000    0.272    0.091 parameterized.py:2456(_execute_watcher)
        2    0.000    0.000    0.270    0.135 streams.py:774(_watcher)
        2    0.000    0.000    0.270    0.135 streams.py:149(trigger)
        2    0.000    0.000    0.270    0.135 plot.py:210(refresh)
        2    0.000    0.000    0.268    0.134 plot.py:252(_trigger_refresh)
        2    0.000    0.000    0.268    0.134 plot.py:953(update)
        2    0.000    0.000    0.268    0.134 plot.py:434(__getitem__)
    16/14    0.000    0.000    0.229    0.016 __init__.py:186(pipelined_fn)
        1    0.000    0.000    0.215    0.215 element.py:2995(update_frame)
        2    0.000    0.000    0.209    0.105 plot.py:574(compute_ranges)
        3    0.000    0.000    0.209    0.070 plot.py:692(_compute_group_range)
        6    0.000    0.000    0.202    0.034 raster.py:496(range)
        3    0.000    0.000    0.202    0.067 base.py:605(compute)
        2    0.000    0.000    0.201    0.101 __init__.py:488(range)
        2    0.000    0.000    0.201    0.100 xarray.py:287(range)
        3    0.001    0.000    0.153    0.051 threaded.py:36(get)
        3    0.005    0.002    0.152    0.051 local.py:350(get_async)
     2070    0.001    0.000    0.075    0.000 local.py:136(queue_get)
     2070    0.003    0.000    0.074    0.000 queue.py:154(get)
      260    0.001    0.000    0.068    0.000 threading.py:280(wait)
     3121    0.067    0.000    0.067    0.000 {method 'acquire' of '_thread.lock' objects}
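
Timings like the ones above can be collected with the standard library's cProfile module. The sketch below is an assumption about how such a profile might be produced, not the exact command used here; profile_call and slow_update are hypothetical names, and slow_update merely stands in for the widget update being measured.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def slow_update(n):
    # Stand-in for the real work, e.g. a widget value change that
    # triggers a plot refresh.
    return sum(i * i for i in range(n))


result, report = profile_call(slow_update, 10_000)
print(report)
```

In the reproducer, the stand-in could be swapped for a callable that sets the player value, for example lambda: setattr(video_player, "value", 10), again as an assumption about what was profiled.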

ahuang11 commented 9 months ago

"Alternatively rasterize should automatically persist the regridded result."

I think this would be ideal, since avoiding the situation described in

"if they pass a non-persisted Dask backed array to rasterize/regrid then the interactive performance will be significantly degraded"

is not possible in some cases if the data is much larger.

philippjfr commented 9 months ago

"Alternatively rasterize should automatically persist the regridded result."

Unfortunately, I found that this makes effectively zero difference.

droumis commented 5 months ago

@philippjfr, is there a potential solution for a larger non-persisted Dask-backed array to make use of rasterize in this context?