Open ahuang11 opened 9 months ago
I was a little surprised: the caching on the right_plot is working just fine, and since the plot only sees the rasterized data it should never attempt to recompute the actual data. What I think is happening is that datashader applies the regridding lazily, so whenever anything on the right_plot is recomputed it has to go all the way back to the raw underlying data, apply the reduction and then apply the regridding.
Based on that, everything is working correctly here, so I think the only real fix is to warn users that if they pass a non-persisted Dask backed array to rasterize/regrid then the interactive performance will be significantly degraded. Alternatively rasterize should automatically persist the regridded result.
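To illustrate the lazy-recompute behaviour described above, here is a minimal sketch using plain dask arrays rather than the actual rasterize/regrid pipeline. `expensive_regrid` and `data_range` are hypothetical stand-ins (for the lazy reduction/regridding step and for a plot's range calculation respectively), and the sketch assumes the default local scheduler, where `persist()` computes the chunks immediately.

```python
import threading

import dask
import dask.array as da
import numpy as np

calls = 0
_lock = threading.Lock()


def expensive_regrid(block):
    """Hypothetical stand-in for the lazy reduction + regridding step."""
    global calls
    with _lock:  # chunks may be processed on several threads
        calls += 1
    return block * 2.0


def data_range(arr):
    """Hypothetical stand-in for a plot asking for the data limits."""
    lo, hi = dask.compute(arr.min(), arr.max())
    return float(lo), float(hi)


raw = da.from_array(np.arange(16.0).reshape(4, 4), chunks=(2, 2))
# Passing meta explicitly keeps dask from calling the function during graph construction.
lazy = raw.map_blocks(expensive_regrid, dtype=float, meta=np.array((), dtype=float))

# Non-persisted: every range query walks back through expensive_regrid.
start = calls
lazy_range = data_range(lazy)
data_range(lazy)
lazy_calls = calls - start

# Persisted: the chunks are computed once up front, later queries reuse them.
persisted = lazy.persist()
start = calls
persisted_range = data_range(persisted)
data_range(persisted)
persisted_calls = calls - start

print("regrid calls without persist:", lazy_calls)
print("regrid calls after persist:  ", persisted_calls)
```

The counter only grows for the non-persisted array, which is the same pattern as repeated range calculations going back to the raw data on every plot update.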
Some timings, indicating it's pretty much entirely range calculations.
Without .persist():
685761 function calls (681648 primitive calls) in 2.409 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 2.409 2.409 {built-in method builtins.exec}
1 0.000 0.000 2.409 2.409 <string>:1(<module>)
365/1 0.000 0.000 2.409 2.409 parameterized.py:518(_f)
7/1 0.000 0.000 2.409 2.409 parameters.py:533(__set__)
335/1 0.001 0.000 2.409 2.409 parameterized.py:1443(__set__)
3 0.000 0.000 2.409 0.803 parameterized.py:2473(_call_watcher)
3 0.000 0.000 2.409 0.803 parameterized.py:2456(_execute_watcher)
2 0.000 0.000 2.408 1.204 streams.py:774(_watcher)
2 0.000 0.000 2.408 1.204 streams.py:149(trigger)
2 0.000 0.000 2.407 1.204 plot.py:210(refresh)
2 0.000 0.000 2.407 1.203 plot.py:252(_trigger_refresh)
2 0.000 0.000 2.407 1.203 plot.py:953(update)
2 0.000 0.000 2.407 1.203 plot.py:434(__getitem__)
16/14 0.000 0.000 2.371 0.169 __init__.py:186(pipelined_fn)
1 0.000 0.000 2.357 2.357 element.py:2995(update_frame)
3 0.000 0.000 2.354 0.785 base.py:605(compute)
2 0.000 0.000 2.351 1.176 plot.py:574(compute_ranges)
3 0.000 0.000 2.351 0.784 plot.py:692(_compute_group_range)
6 0.000 0.000 2.346 0.391 raster.py:496(range)
2 0.000 0.000 2.345 1.173 __init__.py:488(range)
2 0.000 0.000 2.345 1.172 xarray.py:287(range)
3 0.001 0.000 2.270 0.757 threaded.py:36(get)
3 0.014 0.005 2.269 0.756 local.py:350(get_async)
2078 0.001 0.000 2.125 0.001 local.py:136(queue_get)
2078 0.008 0.000 2.123 0.001 queue.py:154(get)
1210 0.006 0.000 2.107 0.002 threading.py:280(wait)
6918 2.098 0.000 2.098 0.000 {method 'acquire' of '_thread.lock' objects}
With .persist():
586664 function calls (583555 primitive calls) in 0.272 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.272 0.272 {built-in method builtins.exec}
1 0.000 0.000 0.272 0.272 <string>:1(<module>)
365/1 0.000 0.000 0.272 0.272 parameterized.py:518(_f)
7/1 0.000 0.000 0.272 0.272 parameters.py:533(__set__)
335/1 0.002 0.000 0.272 0.272 parameterized.py:1443(__set__)
3 0.000 0.000 0.272 0.091 parameterized.py:2473(_call_watcher)
3 0.000 0.000 0.272 0.091 parameterized.py:2456(_execute_watcher)
2 0.000 0.000 0.270 0.135 streams.py:774(_watcher)
2 0.000 0.000 0.270 0.135 streams.py:149(trigger)
2 0.000 0.000 0.270 0.135 plot.py:210(refresh)
2 0.000 0.000 0.268 0.134 plot.py:252(_trigger_refresh)
2 0.000 0.000 0.268 0.134 plot.py:953(update)
2 0.000 0.000 0.268 0.134 plot.py:434(__getitem__)
16/14 0.000 0.000 0.229 0.016 __init__.py:186(pipelined_fn)
1 0.000 0.000 0.215 0.215 element.py:2995(update_frame)
2 0.000 0.000 0.209 0.105 plot.py:574(compute_ranges)
3 0.000 0.000 0.209 0.070 plot.py:692(_compute_group_range)
6 0.000 0.000 0.202 0.034 raster.py:496(range)
3 0.000 0.000 0.202 0.067 base.py:605(compute)
2 0.000 0.000 0.201 0.101 __init__.py:488(range)
2 0.000 0.000 0.201 0.100 xarray.py:287(range)
3 0.001 0.000 0.153 0.051 threaded.py:36(get)
3 0.005 0.002 0.152 0.051 local.py:350(get_async)
2070 0.001 0.000 0.075 0.000 local.py:136(queue_get)
2070 0.003 0.000 0.074 0.000 queue.py:154(get)
260 0.001 0.000 0.068 0.000 threading.py:280(wait)
3121 0.067 0.000 0.067 0.000 {method 'acquire' of '_thread.lock' objects}
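For anyone wanting to collect comparable timings, a report sorted by cumulative time can be produced with the standard-library `cProfile` and `pstats` modules. This is only a sketch: `trigger_update` is a hypothetical placeholder for whatever event refreshes the plot, not the code that was actually profiled here.

```python
import cProfile
import io
import pstats


def trigger_update():
    """Hypothetical placeholder for the stream event that refreshes the plot."""
    return sum(i * i for i in range(10_000))


profiler = cProfile.Profile()
profiler.enable()
trigger_update()
profiler.disable()

# Sort by cumulative time and keep only the top entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Running the same profile once on the lazy array and once on the persisted one makes it easy to see where the cumulative time goes in each case.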
"Alternatively rasterize should automatically persist the regridded result."
I think this would be ideal, since the other option, warning users that "if they pass a non-persisted Dask backed array to rasterize/regrid then the interactive performance will be significantly degraded", does not help in some cases: persisting is not possible if the data is much larger.
"Alternatively rasterize should automatically persist the regridded result."
Unfortunately, I found this to make effectively zero difference.
@philippjfr, is there a potential solution for a larger non-persisted Dask-backed array to make use of rasterize in this context?
If the data is loaded, things move much quicker.