Add support for crs on holoviews datashader operations

philippjfr commented 7 years ago

Datashader is very useful when working with geospatial data and holoviews now offers datashader operations. One major issue is that these operations do not currently handle a crs, which makes it difficult to work with them. The main obstacle here is that a) there is no geoviews RGB type, which could be used to hold the output of datashade and b) that projecting on every datashader plot update is very inefficient. I'd therefore suggest a temporary approach to make this work is to offer an operation that projects any Element type to a new crs and secondly to add a crs-aware RGB type to geoviews.

jbednar commented 7 years ago

Are you suggesting to project after aggregation? I suppose that would be very quick to compute, and therefore seems definitely worth supporting, but presumably there would be issues with uneven bin sizes for at least some projections?

philippjfr commented 7 years ago

Are you suggesting to project after aggregation?

No, I'm suggesting you should always project explicitly before aggregation.

jbednar commented 7 years ago

Ah, so not projecting RGB, but Paths or Points? But if it supports any Element type, presumably it could be used to project the resulting RGB as well, just with less good results?

philippjfr commented 7 years ago

presumably it could be used to project the resulting RGB as well, just with less good results?

Projecting already gridded data is harder, but I guess that could be supported. I think the cleanest way to implement projecting of gridded data is to treat each pixel as an individual sample, then project the coordinates and then re-aggregate those samples with datashader.

philippjfr commented 7 years ago

That said I really don't know whether projecting RGB types is even possible, how would the color interpolation happen? Would reaggregating each channel make sense?

jbednar commented 7 years ago

Well, you said any Element, so I thought you had a plan. :-) You're right; for RGB it makes very little sense, and indeed would need some sort of weighted color interpolation. That's a case I would vote not to support.

But for Image, you get the same savings (i.e., a quick and dirty approximation) without the color issues, so it's a similar situation.

With enough resolution in the most distorted regions of the map, I think re-aggregating Image should work reasonably well. But don't you have non-datashader ways to re-aggregate? The total amount of data won't be large at this point, and re-aggregating a grid is something useful with or without datashader.

philippjfr commented 7 years ago

With enough resolution in the most distorted regions of the map, I think re-aggregating Image should work reasonably well. But don't you have non-datashader ways to re-aggregate? The total amount of data won't be large at this point, and re-aggregating a grid is something useful with or without datashader.

Yes, I'd like there to be but I've not found a good solution that doesn't rely on some other library. I still don't quite understand how cartopy does it, and my current ProjectImage operation already in geoviews is really quite poor imo, but it may be the case that's basically equivalent to the datashader based approach I'm proposing.

jbednar commented 7 years ago

I don't think the different possible implementations will vary in quality by much, only in speed, so I'd vote for just using your current implementation unless we find it's very slow. Rasterio offers regridding, and if you're going to rely on an external library I'd use that, as it's what datashader uses anyway.

niallrobinson commented 6 years ago

Hi guys - has there been any update on this stuff? We're just embarking on a big Notebook + Geoviews type project for our forecasters and it would be great if we could use datashader like this

philippjfr commented 6 years ago

Not yet, that said it wouldn't be all that much effort, so I'd be happy to prioritize this. It's not been a huge priority so far because usually we simply project the data before datashading, which means the resulting Image or RGB doesn't have to be projected again.

philippjfr commented 6 years ago

To expand a little bit, you can currently use the gv.operation.project operation to project data from a source crs to a target crs before applying the datashade operation. The actual issue is that the output of the datashade operation will be a HoloViews Image/RGB element which does not know anything about coordinate systems (and therefore cannot be automatically projected when you try to plot it on a different projection than it is already using). It should be easy enough to extend datashade to find the crs of the input and use that to construct a GeoViews Image/RGB on the output. It's also worth noting that this limitation only applies when you are datashading Points or Paths, the regrid operation which regrids Image/RGBs does retain the crs information.

niallrobinson commented 6 years ago

I see - I'd slightly misunderstood - thanks for the clarification. So it is possible to plot geospatial stuff with data-shader, just the API isn't particularly natural at the minute.

We're trying to plot very high resolution geospatial data (Iris Cube) using geoviews and its taking a long time. Personally I'd love a datashader=True kwarg in image plots, but perhaps that's a bit simplistic.

jbednar commented 6 years ago

Oh yes, Datashader is definitely the way to go with geospatial stuff in GeoViews. We desperately need to get some time to improve our documentation to make that clear. That's a project already underway, but not yet delivering.

The API isn't too far from what you suggest already: as of the current HoloViews, GeoViews, and Datashader releases, to use datashader to aggregate your HoloViews or GeoViews object obj into a grid, you either do aggregate(obj) (for non-gridded objs like Points, Curve, Graph and containers thereof), or regrid(obj) (for already-gridded objs like Image, and containers thereof). If you want datashader to aggregate and shade (colormap) the object, you'd do shade(aggregate(obj)) or the shortcut datashade(obj) (for non-gridded objs), or shade(regrid(obj)) (for gridded objs). In each case, these are dynamic operations that will automatically re-run datashader on the data before it is sent to the browser, thereby ensuring that all calculations are done quickly in Python and Numba, not slowly in the browser. So they basically work like datashade==True, but without having to hack in datashader support to the existing object types directly.

Having regrid be separate from the other datashader operations is just a historical artifact, and we are planning to unify the operators so that you don't have to worry about the distinction between already-gridded and not-yet-gridded data. Before we do that we need to unify some of the underlying syntax between the options accepted by regrid() and aggregate(); again those differ for historical rather than deep reasons. But the current API isn't too bad, just not as clean as I would like.

BTW, here we're only talking about regular grids, but note that we've just added support in Datashader for irregular grids (triangular meshes) , to be merged soon, in case those come up in any of your workflows. Again, I'd like that to be unified with the existing datashader operations in HoloViews and GeoViews, so that you just call those same operations for irregular-mesh data as well.

philippjfr commented 6 years ago

Thanks for the detailed summary @jbednar, just to be clear plotting very high resolution iris cubes is already possible if you do something like this (adjust for the lat/lon dimensions of your data):

from holoviews.operation.datashader import regrid
ds = gv.Dataset(iris_cube)
regrid(ds.to(gv.Image, ['Longitude', 'Latitude']))

The bottleneck in this approach will be projecting the image, which relies on cartopy's image projecting routines. If you don't need that you can also just use hv.Image in the code above, which should then be very fast indeed.

niallrobinson commented 6 years ago

Thanks guys - I was using shade rather than regrid - I'm getting something that looks more sensible now! Having this working on irregular grids is very cool - it's not something I'd be using right now, but it always comes up sooner or later.

I don't suppose there is anyway to make this cache the plots is there? Using datashader plots the data faster initially but is doesn't seem to cache the plots, making my nice geoviews slider laggy.

btw I appreciate this isn't the datashader+geoviews help forum so just point me somewhere else if you'd rather I dind't pollute your thread.

AlexHilson commented 6 years ago

Thanks a lot for the above information, it's really helpful.

When I use regrid with the matplotlib backend the above code works as I'd expect. But with the bokeh backend I get DataError: None of the available storage backends were able to support the supplied data format.

i.e. given the setup:

# <iris 'Cube' of air_temperature / (K) (latitude: 1200; longitude: 1600)>
kdims = ['longitude', 'latitude']
vdims = ['air_temperature']
ds = gv.Dataset(cube, kdims=kdims, vdims=vdims)
gv_img = ds.to(gv.Image, kdims)
hv_img = ds.to(hv.Image, kdims)

These will plot:

hv.notebook_extension('bokeh')
regrid(hv_img)

hv.notebook_extension('matplotlib')
regrid(gv_img)

But this throws the exception:

hv.notebook_extension('bokeh')
regrid(gv_img)

My understanding is that I can only add gf.coastline when using gv.Image, so this means that when using datashader I need to choose between an interactive bokeh plot and one with coastlines.

Does this seem correct? If there's anything I can do to help that'd be great, I'd love to have interactive + geospatially aware datashader plots.

philippjfr commented 6 years ago

When I use regrid with the matplotlib backend the above code works as I'd expect. But with the bokeh backend I get DataError: None of the available storage backends were able to support the supplied data format.

That's very odd indeed, the plotting backends should have no effect on the data.

My understanding is that I can only add gf.coastline when using gv.Image, so this means that when using datashader I need to choose between an interactive bokeh plot and one with coastlines.

That's sort of true but not quite, if you project your data to Mercator coordinates using gv.operation.project_image before you regrid you should be able to use gf.coastline just fine.

Is the file you're using something that you can share? I'd like to look into the DataError.

AlexHilson commented 6 years ago

Thanks, here's the slice of the dataset I'm using: https://s3-eu-west-1.amazonaws.com/geoviews-sample/air_temperature.nc. Can provide more if you want.

My understanding is that I can only add gf.coastline when using gv.Image, so this means that when using datashader I need to choose between an interactive bokeh plot and one with coastlines.

That's sort of true but not quite, if you project your data to Mercator coordinates using gv.operation.project_image before you regrid you should be able to use gf.coastline just fine.

Aha! This is excellent, thanks.

regrid(gv_img) fails as before, but regrid(new_img) works perfectly

digitaltopo commented 6 years ago

Having the same issue, matplotlib works but bokeh gives error:

1. Create an xarray from a geotiff

filename = '../path/to/file/name.tif`
xarray = xr.open_rasterio(filename).load()

xarray:

<xarray.DataArray (band: 4, y: 4569, x: 9087)>
array([[[0, 0, ..., 0, 0],
        [0, 0, ..., 0, 0],
        ..., 
        [0, 0, ..., 0, 0],
        [0, 0, ..., 0, 0]],

       [[0, 0, ..., 0, 0],
        [0, 0, ..., 0, 0],
        ..., 
        [0, 0, ..., 0, 0],
        [0, 0, ..., 0, 0]],

       [[0, 0, ..., 0, 0],
        [0, 0, ..., 0, 0],
        ..., 
        [0, 0, ..., 0, 0],
        [0, 0, ..., 0, 0]],

       [[0, 0, ..., 0, 0],
        [0, 0, ..., 0, 0],
        ..., 
        [0, 0, ..., 0, 0],
        [0, 0, ..., 0, 0]]], dtype=uint16)
Coordinates:
  * band     (band) int64 1 2 3 4
  * y        (y) float64 2.57e+06 2.57e+06 2.57e+06 2.57e+06 2.57e+06 ...
  * x        (x) float64 3.612e+05 3.612e+05 3.613e+05 3.613e+05 3.613e+05 ...
Attributes:
    crs:        +init=epsg:32617
    res:        (3.0, 3.0)
    is_tiled:   1
    transform:  (3.0, 0.0, 361245.0, 0.0, -3.0, 2569614.0, 0.0, 0.0, 1.0)

2. Create a gv.dataset:

gv_dataset = gv.Dataset(xarray, kdims=['band','x','y'], vdims=['Raster'] , crs=ccrs.UTM(17))

gv_dataset: :Dataset [band,x,y] (Raster)

3. Create a gv.image:

image = regrid(gv_dataset.to(gv.Image, kdims=['x','y']))

4. Render the image

Render the image with matplotlib:

hv.notebook_extension('matplotlib')
image

screenshot_2017-12-18_15-04-44

Render the image with bokeh:

hv.notebook_extension('bokeh')
image

DataError: None of the available storage backends were able to support the supplied data format.

Full error:

---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/IPython/core/formatters.py in __call__(self, obj)
    330                 pass
    331             else:
--> 332                 return printer(obj)
    333             # Finally look for special method names
    334             method = get_real_method(obj, self.print_method)

/usr/local/lib/python3.6/dist-packages/holoviews/ipython/display_hooks.py in pprint_display(obj)
    257     if not ip.display_formatter.formatters['text/plain'].pprint:
    258         return None
--> 259     return display(obj, raw=True)
    260 
    261 

/usr/local/lib/python3.6/dist-packages/holoviews/ipython/display_hooks.py in display(obj, raw, **kwargs)
    243     elif isinstance(obj, (HoloMap, DynamicMap)):
    244         with option_state(obj):
--> 245             html = map_display(obj)
    246     else:
    247         return repr(obj) if raw else IPython.display.display(obj, **kwargs)

/usr/local/lib/python3.6/dist-packages/holoviews/ipython/display_hooks.py in wrapped(element)
    130         try:
    131             html = fn(element,
--> 132                       max_frames=OutputSettings.options['max_frames'])
    133 
    134             # Only want to add to the archive for one display hook...

/usr/local/lib/python3.6/dist-packages/holoviews/ipython/display_hooks.py in map_display(vmap, max_frames)
    198         return None
    199 
--> 200     return render(vmap)
    201 
    202 

/usr/local/lib/python3.6/dist-packages/holoviews/ipython/display_hooks.py in render(obj, **kwargs)
     60     if renderer.fig == 'pdf':
     61         renderer = renderer.instance(fig='png')
---> 62     return renderer.html(obj, **kwargs)
     63 
     64 

/usr/local/lib/python3.6/dist-packages/holoviews/plotting/renderer.py in html(self, obj, fmt, css, comm, **kwargs)
    255         code to initialize a Comm, if the plot supplies one.
    256         """
--> 257         plot, fmt =  self._validate(obj, fmt)
    258         figdata, _ = self(plot, fmt, **kwargs)
    259         if css is None: css = self.css

/usr/local/lib/python3.6/dist-packages/holoviews/plotting/renderer.py in _validate(self, obj, fmt)
    191         if isinstance(obj, tuple(self.widgets.values())):
    192             return obj, 'html'
--> 193         plot = self.get_plot(obj, renderer=self)
    194 
    195         fig_formats = self.mode_formats['fig'][self.mode]

/usr/local/lib/python3.6/dist-packages/holoviews/plotting/bokeh/renderer.py in get_plot(self_or_cls, obj, doc, renderer)
    114         combining the bokeh model with another plot.
    115         """
--> 116         plot = super(BokehRenderer, self_or_cls).get_plot(obj, renderer)
    117         if self_or_cls.mode == 'server' and doc is None:
    118             doc = curdoc()

/usr/local/lib/python3.6/dist-packages/holoviews/plotting/renderer.py in get_plot(self_or_cls, obj, renderer)
    178             plot = self_or_cls.plotting_class(obj)(obj, renderer=renderer,
    179                                                    **plot_opts)
--> 180             plot.update(0)
    181         else:
    182             plot = obj

/usr/local/lib/python3.6/dist-packages/holoviews/plotting/plot.py in update(self, key)
    484         if len(self) == 1 and key == 0 and not self.drawn:
    485             return self.initialize_plot()
--> 486         item = self.__getitem__(key)
    487         self.traverse(lambda x: setattr(x, '_updated', True))
    488         return item

/usr/local/lib/python3.6/dist-packages/holoviews/plotting/plot.py in __getitem__(self, frame)
    219         if isinstance(frame, int) and frame > len(self):
    220             self.warning("Showing last frame available: %d" % len(self))
--> 221         if not self.drawn: self.handles['fig'] = self.initialize_plot()
    222         if not isinstance(frame, tuple):
    223             frame = self.keys[frame]

/usr/local/lib/python3.6/dist-packages/holoviews/plotting/bokeh/element.py in initialize_plot(self, ranges, plot, plots, source)
    795         self.handles['plot'] = plot
    796 
--> 797         self._init_glyphs(plot, element, ranges, source)
    798         if not self.overlaid:
    799             self._update_plot(key, plot, style_element)

/usr/local/lib/python3.6/dist-packages/holoviews/plotting/bokeh/element.py in _init_glyphs(self, plot, element, ranges, source)
    746         else:
    747             style = self.style[self.cyclic_index]
--> 748             data, mapping, style = self.get_data(element, ranges, style)
    749             current_id = element._plot_id
    750         if source is None:

/usr/local/lib/python3.6/dist-packages/geoviews/plotting/bokeh/plot.py in get_data(self, element, ranges, style)
     68     def get_data(self, element, ranges, style):
     69         if self._project_operation and self.geographic and element.crs != DEFAULT_PROJ:
---> 70             element = self._project_operation(element)
     71         return super(GeoPlot, self).get_data(element, ranges, style)
     72 

/usr/local/lib/python3.6/dist-packages/holoviews/core/operation.py in __call__(self, element, **params)
    139                                 operation=self, kwargs=params)
    140         elif isinstance(element, ViewableElement):
--> 141             processed = self._process(element)
    142         elif isinstance(element, DynamicMap):
    143             if any((not d.values) for d in element.kdims):

/usr/local/lib/python3.6/dist-packages/geoviews/operation.py in _process(self, img, key)
    147         bounds = (extents[0], extents[2], extents[1], extents[3])
    148         return img.clone(data, bounds=bounds, kdims=img.kdims,
--> 149                          vdims=img.vdims, crs=proj)
    150 
    151 

/usr/local/lib/python3.6/dist-packages/geoviews/element/geo.py in clone(self, data, shared_data, new_type, *args, **overrides)
     95             overrides['crs'] = self.crs
     96         return super(_Element, self).clone(data, shared_data, new_type,
---> 97                                            *args, **overrides)
     98 
     99 

/usr/local/lib/python3.6/dist-packages/holoviews/core/dimension.py in clone(self, data, shared_data, new_type, *args, **overrides)
    570         # Apply name mangling for __ attribute
    571         pos_args = getattr(self, '_' + type(self).__name__ + '__pos_params', [])
--> 572         return clone_type(data, *args, **{k:v for k,v in settings.items()
    573                                           if k not in pos_args})
    574 

/usr/local/lib/python3.6/dist-packages/geoviews/element/geo.py in __init__(self, data, **kwargs)
     87         elif crs:
     88             kwargs['crs'] = crs
---> 89         super(_Element, self).__init__(data, **kwargs)
     90 
     91 

/usr/local/lib/python3.6/dist-packages/holoviews/element/raster.py in __init__(self, data, kdims, vdims, bounds, extents, xdensity, ydensity, **params)
    235             or (isinstance(data, np.ndarray) and data.size == 0)):
    236             data = np.zeros((2, 2))
--> 237         Dataset.__init__(self, data, kdims=kdims, vdims=vdims, extents=extents, **params)
    238 
    239         dim2, dim1 = self.interface.shape(self, gridded=True)[:2]

/usr/local/lib/python3.6/dist-packages/holoviews/core/data/__init__.py in __init__(self, data, kdims, vdims, **kwargs)
    188 
    189         initialized = Interface.initialize(type(self), data, kdims, vdims,
--> 190                                            datatype=kwargs.get('datatype'))
    191         (data, self.interface, dims, extra_kws) = initialized
    192         validate_vdims = kwargs.pop('_validate_vdims', True)

/usr/local/lib/python3.6/dist-packages/holoviews/core/data/interface.py in initialize(cls, eltype, data, kdims, vdims, datatype)
    204                                   % (intfc.__name__, e))
    205                 error = ' '.join([error, priority_error])
--> 206             raise DataError(error)
    207 
    208         return data, interface, dims, extra_kws

DataError: None of the available storage backends were able to support the supplied data format.

philippjfr commented 6 years ago

This has now been implemented, datashader operations will now respect the coordinate reference system of the inputs. Should make things a lot more intuitive, we should however add a guide on working with Datashader and GeoViews.

holoviz / geoviews

Add support for crs on holoviews datashader operations #45