Document regrid operation

jbednar commented 7 years ago

I'm not sure if I'm going about this the right way, but I don't seem to get anything reasonable out of datashading an Image:

import holoviews as hv
import numpy as np
from holoviews.operation.datashader import datashade, shade, aggregate, regrid
hv.extension("bokeh")

x,y = np.mgrid[-50:51, -50:51] * 0.1
r = 0.5*np.sin(np.pi+3*x**2+y**2)+0.5
g = 0.5*np.sin(x**2+2*y**2)+0.5
b = 0.5*np.sin(np.pi/2+x**2+y**2)+0.5

hv.Image(r) + datashade(hv.Image(r))

I had expected this to use datashader's raster support, but when first loading there is no image at all in the second subfigure, and then on zooming out and back in I get various crazy patterns like this.

Eventually I don't want an Image, but an RGB where each of 3 or 4 underlying images is datashaded, but I can't see how I would even express that (datashade(hv.RGB(...))?). I'm trying to implement https://anaconda.org/jbednar/landsat using holoviews, but am not having much luck.

philippjfr commented 7 years ago

Yes, you're going about this the wrong way. The default aggregator is count, which does nothing sensible for an Image. You can use aggregate/datashade if you set the appropriate x_sampling and y_sampling and an aggregator like ds.mean, but what you really want is the regrid operation.
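To see why count is uninformative for an Image, here is a NumPy-only sketch (not datashader's actual code path): treat every pixel as a sample and bin the samples into a coarser grid. The helper name bin_image and the 20x20 output size are assumptions made up for this illustration.

```python
import numpy as np

def bin_image(values, out_shape):
    """Bin a 2D array into out_shape cells; return (count, mean) per cell."""
    rows, cols = values.shape
    # Integer output-cell index for every input row and column.
    ri = np.arange(rows) * out_shape[0] // rows
    ci = np.arange(cols) * out_shape[1] // cols
    cell = (ri[:, None] * out_shape[1] + ci[None, :]).ravel()
    n = out_shape[0] * out_shape[1]
    count = np.bincount(cell, minlength=n)
    total = np.bincount(cell, weights=values.ravel(), minlength=n)
    # Assumes out_shape is no larger than the input, so no cell is empty.
    return count.reshape(out_shape), (total / count).reshape(out_shape)

# Same test image as in the original report (101x101 samples).
x, y = np.mgrid[-50:51, -50:51] * 0.1
r = 0.5 * np.sin(np.pi + 3 * x**2 + y**2) + 0.5

count, mean = bin_image(r, (20, 20))
```

In this sketch count depends only on how the input grid lines up with the output cells, never on the pixel values, while mean is a downsampled version of the image.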

philippjfr commented 7 years ago

Repurposed the issue to document regrid in HoloViews.

jbednar commented 7 years ago

Hmm. Regrid is better, but doesn't work either:

philippjfr commented 7 years ago

Can't reproduce that, this works fine:

img = hv.Image(np.arange(10)* np.arange(10)[np.newaxis].T)
img + regrid(img)

What version of datashader do you have? I know it's less convenient but I always add the code and plot separately.

jbednar commented 7 years ago

Github master (0.6.1-6-g09973d3).

jbednar commented 7 years ago

Anyway, I'm surprised that a separate operation is needed. If I do provide an aggregator, I get something more recognizable:

But it's clearly converted the image to points, which I would have expected to require datashade(hv.Points(hv.Image(r))); I was not expecting datashade to change the type of what I provided.

jbednar commented 7 years ago

You should try the code in the first cell above, but with datashade replaced by regrid. In your range-based example I can't tell if it's working properly or not, but with the above code I can see that it definitely is not.

philippjfr commented 7 years ago

Could you paste the code?

philippjfr commented 7 years ago

Sigh...there appears to be another bug in datashader.

jbednar commented 7 years ago

The copyable code is in the first cell...

jbednar commented 7 years ago

(Only the very last line changes in any of the examples here).

jbednar commented 7 years ago

I know this isn't the way it's currently implemented, but the way I expected this to work is:

aggregate(hv.Points(..)) uses canvas.points(agg=count()).
aggregate(hv.Path(...)) (or Curve) uses canvas.line(agg=any())
aggregate(hv.Image(...)) (or RGB, where each channel would be processed individually) uses canvas.raster(agg=mean())

Each of the indicated agg defaults is the default value in datashader (for None).
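The proposal above amounts to a lookup from element type to canvas method plus default aggregator. A minimal sketch of that lookup follows; the table and the function expected_call are hypothetical and only restate the three lines above, they are not part of HoloViews or datashader.

```python
# Hypothetical dispatch table restating the proposal; not a real API.
EXPECTED_DEFAULTS = {
    "Points": ("points", "count"),
    "Path": ("line", "any"),
    "Curve": ("line", "any"),
    "Image": ("raster", "mean"),
    "RGB": ("raster", "mean"),  # applied to each channel individually
}

def expected_call(element_type, agg=None):
    """Describe the canvas call the proposal expects for an element type."""
    method, default_agg = EXPECTED_DEFAULTS[element_type]
    # agg=None falls back to the per-type default, as in the proposal.
    return f"canvas.{method}(agg={agg or default_agg}())"

image_call = expected_call("Image")
points_call = expected_call("Points")
```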

I know raster() doesn't currently use the same agg options, but I think it should eventually, at least for downsampling, so it would be nice to unify that at the ds level. Otherwise, is this very different from how you think it should be?

philippjfr commented 7 years ago

I'd be happy for aggregate to provide some convenience around regrid, but I don't think it replaces regrid, because regrid is more general than what aggregate can provide. Apart from that, the main thing I'm worried about is that it would have to ignore the default count aggregator, and that some others like count_cat and any are also pointless for an Image.

> I know raster() doesn't currently use the same agg options, but I think it should eventually, at least for downsampling, so it would be nice to unify that at the ds level.

The agg objects are a bit weird in the case of images because there is no column to refer to, so there's no reason to do ds.mean('z'), and just using ds.mean would still be inconsistent.

jbednar commented 7 years ago

Well, all of the canvas glyphs actually default to None; can't we use that?

jbednar commented 7 years ago

any isn't pointless for an image with an alpha mask; it should basically extract that mask at a different resolution. count_cat is meant to be generalized to cat(count...), and some of the aggregators provided to it may be more meaningful.
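As a rough illustration of the alpha-mask case in plain NumPy (not datashader code; the 8x8 mask and the 2x2 block size are made up for the example), an any-style reduction marks an output cell whenever at least one of its input pixels is set:

```python
import numpy as np

# Toy alpha mask: True where the image has data.
mask = np.zeros((8, 8), dtype=bool)
mask[1, 2] = True
mask[6:8, 4:8] = True

# Downsample 8x8 -> 4x4: split into 2x2 blocks and apply "any" per block.
coarse = mask.reshape(4, 2, 4, 2).any(axis=(1, 3))
```

The result is the same mask at a quarter of the resolution, with isolated pixels preserved rather than averaged away.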

philippjfr commented 7 years ago

Fixed in https://github.com/bokeh/datashader/pull/475, please test when you get a minute.

jbednar commented 6 years ago

The bug is fixed, but I do think we need to address the API issues still.

philippjfr commented 6 years ago

As a first step we'd have to support ds.mean(), ds.min() etc. so the API in datashader is at least consistent. Currently that results in:

TypeError: __init__() missing 1 required positional argument: 'column'
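The error comes from column being a required constructor argument. The toy classes below reproduce that behaviour and show the kind of change being asked for; StrictMean and RelaxedMean are hypothetical stand-ins, not datashader's reduction classes.

```python
class StrictMean:
    """Toy reduction whose column argument is required."""
    def __init__(self, column):
        self.column = column

class RelaxedMean:
    """Toy reduction whose column is optional, for inputs with no columns."""
    def __init__(self, column=None):
        self.column = column

try:
    StrictMean()
    message = None
except TypeError as err:
    message = str(err)

# With an optional column, the bare call is valid.
relaxed = RelaxedMean()
```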

jlstevens commented 6 years ago

It may be mentioned in another issue, but I would like datashade to do regridding when passed a raster type. This doesn't mean there can't also be a dedicated regrid operation for more advanced control, with its own documentation.

If you agree with that idea, I would document the regridding capability of datashade first (once it is added) as passing rasters to datashade seemed like a pretty intuitive thing and something I expected to 'just work'.

philippjfr commented 6 years ago

I believe that's exactly what we already decided above.

jlstevens commented 6 years ago

I'm talking about documentation: we should first document datashade as being able to do regridding, and then the regrid operation itself.

jbednar commented 6 years ago

I agree with all that. Yes, sounds like we need to make those changes to ds.mean(), etc.

jbednar commented 6 years ago

Those changes are now done, as of Datashader 0.6.5, so hv can now be updated to match.

philippjfr commented 6 years ago

I'll hand this over to you once I've improved rasterize.

philippjfr commented 6 years ago

While you're working on these docs there are a few performance-related things to point out:

Dask dataframes are going to be fastest, with pandas dataframes a close second.
precompute can be used to cache the last set of data. For dataframes/dask tabular datasets and xarray gridded datasets this makes no difference; for TriMesh/QuadMesh it will make a huge difference.
When using datashader+geoviews, the data should be projected first using gv.operation.project.

holoviz / holoviews

Document regrid operation #1909