holoviz / geoviews

Simple, concise geographical visualization in Python
http://geoviews.org
BSD 3-Clause "New" or "Revised" License
587 stars 75 forks source link

Speeding up projected maps in geoviews? #230

Closed rsignell-usgs closed 5 years ago

rsignell-usgs commented 5 years ago

I'm visualizing public Landsat data with hvplot.xarray but I think this is likely a geoviews issue.

It takes about 3 seconds to render an image when not using the crs: 2018-09-19_17-11-28

It takes about 3 minutes to render an image when using the crs: 2018-09-19_17-12-26

Without crs it's using my dask cluster, but with crs it does not use the cluster.

This is the code I'm using:

import rasterio 
import xarray as xr
import hvplot.xarray
from dask.distributed import Client, progress
import cartopy.crs as ccrs

client = Client()

image_url = 'https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/047/027/LC08_L1TP_047027_20130421_20170310_01_T1/LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF'

da = xr.open_rasterio(image_url, chunks={'band': 1, 'x': 2048, 'y': 2048})

band1 = da.sel(band=1).persist()

display(band1.hvplot(rasterize=True, width=600, height=400, cmap='viridis'))

crs = ccrs.UTM(zone='10n')
display(band1.hvplot(crs=crs, rasterize=True, width=600, height=400, cmap='viridis'))

Here is the notebook.

Is this expected?

Is there anything I can do to speed up visualization with crs specified?

rsignell-usgs commented 5 years ago

I heard offline from @dharhas that this is a Cartopy issue. @pelson, is there any way I could speed this up?

philippjfr commented 5 years ago

Apologies for not replying sooner, I do think this is down to cartopy but it's also possible that I'm using a suboptimal codepath. To project images we are currently calling cartopy.img_transform.warp_array, which I think is the bottleneck here.

jbednar commented 5 years ago

Can the data in the image be projected ahead of time, before displaying it?

philippjfr commented 5 years ago

Yes, gv.project will do that for you but that's still 3 minutes and therefore way too slow.

rsignell-usgs commented 5 years ago

So I'm guessing gv.project does not use Dask either, right?

philippjfr commented 5 years ago

That's right, it mostly just calls the appropriate cartopy transform depending on the input type. I've played around parallelizing this with dask with very mixed results.

rsignell-usgs commented 5 years ago

Just as an experiment, I tried downloading the tif image locally and using gdalwarp to warp to a regular lon/lat grid. It took 2 seconds to download the image and 3 seconds to warp onto a regular lon/lat grid, so it seems that there must be something very inefficient going on to make:

crs = ccrs.UTM(zone='10n')
display(band1.hvplot(crs=crs, rasterize=True, width=600, height=400, cmap='viridis'))

take 3 minutes.

Here's the gdalwarp procedure if you want to recreate:

jovyan@jupyter-rsignell-2dusgs:~/Landsat$ time wget https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/047/027/LC08_L1TP_047027_20130421_20170310_01_T1/LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF

LC08_L1TP_047027_20130421 100%[=====================================>]  77.16M  37.4MB/s    in 2.1s

2018-11-02 13:09:55 (37.4 MB/s) - ‘LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF’ saved [80907084/80907084]

real    0m2.217s
user    0m0.096s
sys     0m0.312s
jovyan@jupyter-rsignell-2dusgs:~/Landsat$ time gdalwarp -t_srs EPSG:4326 LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF lonlat.tif
Creating output file that is 9183P x 5981L.
Processing input file LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF.
0...10...20...30...40...50...60...70...80...90...100 - done.

real    0m3.102s
user    0m2.852s
sys     0m0.248s

Even if I try the more time consuming -r regrid options in gdalwarp (the default is nearest), I can't get anything to take longer than 15 seconds:

jovyan@jupyter-rsignell-2dusgs:~/Landsat$ time gdalwarp -t_srs EPSG:4326 -r lanczos LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF lonlat.tif
Creating output file that is 9183P x 5981L.
Processing input file LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF.
0...10...20...30...40...50...60...70...80...90...100 - done.

real    0m15.343s
user    0m15.016s
sys     0m0.324s

@pelson or @scottyhq, who would have insight here?

philippjfr commented 5 years ago

Thanks, for your experiments, I'll play around with this now. The equivalent operation using geoviews which ends up making a call to cartopy.img_transform.warp_array is this:

import geoviews as gv
import cartopy.crs as ccrs
img = gv.load_tiff('/Users/philippjfr/Downloads/LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF')
projected = gv.project(img, projection=ccrs.PlateCarree())
philippjfr commented 5 years ago

Here's the output of profiling the project operation with %%prun:

         15066 function calls (14817 primitive calls) in 242.420 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  242.420  242.420 {built-in method builtins.exec}
        1    0.000    0.000  242.420  242.420 <string>:2(<module>)
        1    0.000    0.000  242.420  242.420 parameterized.py:2416(__new__)
      2/1    0.000    0.000  242.419  242.419 operation.py:146(__call__)
      2/1    0.000    0.000  242.419  242.419 operation.py:113(_apply)
        1    0.000    0.000  242.419  242.419 projection.py:396(_process)
        6    0.000    0.000  242.416   40.403 dimension.py:713(map)
        1    0.143    0.143  242.414  242.414 projection.py:292(_process)
        1    1.119    1.119  242.132  242.132 img_transform.py:122(warp_array)
        1   21.111   21.111  240.060  240.060 img_transform.py:220(regrid)
        1  115.320  115.320  115.320  115.320 {method 'query' of 'scipy.spatial.ckdtree.cKDTree' objects}
        4   98.791   24.698   98.791   24.698 {method 'transform_points' of 'cartopy._crs.CRS' objects}
rsignell-usgs commented 5 years ago

Wow @philippjfr , looks like we have some room for improvement! Great! cc @josef-ebd

rsignell-usgs commented 5 years ago

@philippjfr this Cartopy PR from @QuLogic looks promising!

philippjfr commented 5 years ago

Definitely some solid improvement there, but not enough to make it very usable:

Using scipy's kdtree:

        1  115.320  115.320  115.320  115.320 {method 'query' of 'scipy.spatial.ckdtree.cKDTree' objects}

using pykdtree:

        1   53.287   53.287   53.287   53.287 {method 'query' of 'pykdtree.kdtree.KDTree' objects}
philippjfr commented 5 years ago

Actually, after calling rasterize it's pretty usable, no reason to project the whole image afterall.

philippjfr commented 5 years ago

Okay, I'm now wondering whether hvplot is doing something quite suboptimal, using geoviews directly things are pretty quick:

import geoviews as gv
from holoviews.operation.datashader import rasterize
gv.extension('bokeh')

tiff = gv.load_tiff('/Users/philippjfr/Downloads/LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF')
rasterize(tiff)

It seems like hvplot is projecting before rasterizing, which would explain the slow speed.

philippjfr commented 5 years ago

Okay to summarize, the main issue here was that hvplot always projects all the data before it applies rasterize/datashade options. This makes sense in some cases because it means the data does not have to be reprojected every time you zoom or pan, but it can also result in a huge initial overhead like this case. My suggestion is to add an explicit project=True/False flag to hvplot and disable to automatic projection.

The secondary problem is that projecting images using cartopy is pretty slow, which is substantially improved using pykdtree instead of scipy's kdtree.

The timings now (with and without the new pykdtree implementation in cartopy) are as follows:

  1. Projecting the whole image ahead of time with pykdtree: initialization time ~200 seconds, time to rerender when zooming/panning ~180 ms
  2. Projecting the whole image ahead of time with scipy kdtree: initialization time ~240 seconds, time to rerender when zooming/panning ~180 ms
  3. Projecting after regridding with pykdtree: initialization time ~1 second, time to rerender when zooming/panning ~1 second
  4. Projecting after regridding with scipy kdtree: initialization time ~1.8 second, time to rerender when zooming/panning ~1.8 second

Once I've added the project=True/False option to hvplot the user can choose the tradeoff between a very long initialization time and faster zooming/panning.

philippjfr commented 5 years ago

I've opened https://github.com/pyviz/hvplot/pull/106 to add the project=True/False argument to hvplot.

rsignell-usgs commented 5 years ago

So roughly a 100 fold speedup? I think we will notice that! 😜 Fantastic!

rsignell-usgs commented 5 years ago

Here's my example with the latest release, just to show it's working in seconds now!

2018-11-15_16-43-35