holoviz / datashader

Quickly and accurately render even the largest data.
http://datashader.org
BSD 3-Clause "New" or "Revised" License

Canvas.points does not support xarray.DataArray #1029

Closed Kusefiru closed 2 years ago

Kusefiru commented 2 years ago

Hi,

I have some data organized in an xarray.DataArray that I want to process with Canvas.points(). According to the documentation, this should be possible, since it describes the source parameter as:

source : pandas.DataFrame, dask.DataFrame, or xarray.DataArray/Dataset
    The input datasource.

However, this code results in the following error:

import datashader as ds
import xarray as xr
data = xr.DataArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]], dims=('x', 'y'), coords={'x': [0, 1, 2], 'y': [0, 1, 2]})
ds.Canvas().points(data, x='x', y='y')

ValueError: source must be a pandas or dask DataFrame

The issue seems to come from the bypixel method. According to its comment, it should convert xarray DataArrays to Dask DataFrames, but it doesn't appear to do so: https://github.com/holoviz/datashader/blob/6a5e63bd4bcda3097cc586edca274fce30b3f5a6/datashader/core.py#L1177-L1181

Because of that, the issue also occurs for other methods such as Canvas.area and Canvas.line.
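To make the check concrete, here is a simplified, hypothetical re-creation of the kind of branch the linked lines contain (the helper name to_points_frame is made up, and it uses pandas via xarray's to_dataframe where datashader's comment says it converts to a Dask DataFrame). Only a 1D DataArray is turned into a table of points; a 2D grid like the one above skips the conversion and reaches the DataFrame check:

```python
import numpy as np
import xarray as xr


def to_points_frame(source):
    """Hypothetical helper mimicking a 1D-only xarray conversion.

    A 1D DataArray is treated as a list of points (one row per element);
    anything else is rejected, just as the 2D example falls through to the
    DataFrame check. Simplified: produces a pandas DataFrame, not a dask one.
    """
    if isinstance(source, xr.DataArray) and source.ndim == 1:
        # Name the values so they become a column, then turn the
        # non-index coordinates (e.g. x and y) into columns as well.
        named = source.rename(source.name or "value")
        return named.reset_coords().to_dataframe().reset_index(drop=True)
    raise ValueError("source must be a pandas or dask DataFrame")


# A 1D array of three points whose x/y positions live in coordinates.
points = xr.DataArray(
    np.array([1.0, 2.0, 3.0]),
    dims="i",
    coords={"x": ("i", [0.0, 1.0, 2.0]), "y": ("i", [5.0, 4.0, 3.0])},
)
frame = to_points_frame(points)  # columns: x, y, value
```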

philippjfr commented 2 years ago

At their core, Canvas.points/line/area rasterize geometries into a 2D array. Since xarray is primarily used to represent dense n-dimensional arrays, it usually does not make sense to feed it into the points/line/area methods. The only supported case is therefore the 1D case, which is effectively the same as a DataFrame; hence the source.ndim == 1 check in the code snippet you pointed to.

Your example feeds a two-dimensional grid into Canvas.points, so what exactly are you hoping to get as output? If .points supported N-dimensional arrays as input, it would simply resize the array, and very poorly and inefficiently at that. If that's the operation you want, use Canvas.raster or Canvas.quadmesh, which are designed for it and implement proper up- and down-sampling.

Kusefiru commented 2 years ago

I was mostly trying out each option to see the result, but if it works as intended, then it's my bad.

jbednar commented 2 years ago

@philippjfr is correct that plotting a multidimensional DataArray as points isn't meaningful and should return an error, because Canvas.points fundamentally expects and accepts a list of 2D points, and an nD array is a gridded sampling of a space rather than a list of points. You can explicitly iterate over your dense xarray data and create such a list of 2D points if that's what you want, and Datashader will be happy to plot that.

BTW, https://datashader.org/user_guide/Performance.html has a chart listing which data objects are supported by which call. Judging from this code snippet, the chart appears to be incorrect: it says xarray isn't supported for points/line/area, yet the code does seem to handle the meaningful 1D case by promoting it to a DataFrame. It would be nice to have an example of the 1D case working, in which case I'd update the table.