pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

HoloViews based plotting API #2199

Closed philippjfr closed 6 years ago

philippjfr commented 6 years ago

As part of a recent project we have been working on a plotting API called HoloPlot for a number of projects, including pandas and xarray. You can see some examples using the API with xarray here. As the name suggests, it is built on HoloViews and is meant as an alternative to the native plotting APIs: it closely mirrors them but does not necessarily match them exactly. The main differences are:

The main question I'd like to put to the xarray community is how we should best expose this API. In pandas there has been some discussion about adding a configurable engine for the plotting API, letting you switch between different plotting implementations (see https://github.com/pandas-dev/pandas/issues/14130). The approach we started with was to clobber the DataArray.plot API entirely, which I now consider too obtrusive and likely to interfere with existing workflows. The alternative approaches we considered:

I'd love to hear what xarray maintainers and users think would be the best approach here.

fmaussion commented 6 years ago

It looks like a good use case for accessors. The syntax could then be: DataArray.hv.plot() and would give you full flexibility.
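
For readers unfamiliar with the accessor pattern: xarray exposes it through decorators such as xr.register_dataarray_accessor. The snippet below is only a plain-Python sketch of the general mechanism (a cached descriptor attached to a class), not xarray's actual implementation. ToyArray, ToyPlotAccessor, register_accessor and cached_accessor are hypothetical names made up for illustration.

```python
class cached_accessor:
    """Descriptor that builds an accessor on first access and caches it."""

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, owner=None):
        if obj is None:
            # Looked up on the class itself: hand back the accessor class.
            return self._accessor_cls
        accessor = self._accessor_cls(obj)
        # Cache on the instance; later lookups find this entry first
        # because the descriptor defines no __set__.
        obj.__dict__[self._name] = accessor
        return accessor


def register_accessor(name, target_cls):
    """Attach accessor_cls to target_cls under the attribute `name`."""
    def decorator(accessor_cls):
        setattr(target_cls, name, cached_accessor(name, accessor_cls))
        return accessor_cls
    return decorator


class ToyArray:
    """Hypothetical stand-in for a labeled array; not xr.DataArray."""

    def __init__(self, dims):
        self.dims = tuple(dims)


@register_accessor("hv", ToyArray)
class ToyPlotAccessor:
    def __init__(self, array):
        self._array = array

    def plot(self, kind="image"):
        # A real accessor would build and return a plot object here.
        return f"{kind} plot over dims {self._array.dims}"


da = ToyArray(["time", "lat", "lon"])
print(da.hv.plot())
```

The point of the pattern is that the plotting library lives under its own namespace (da.hv here) and the existing plot attribute is left untouched.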

shoyer commented 6 years ago

Very cool! I also think this would be a good use case for a new accessor, perhaps DataArray.holoplot(), mirroring our preference for accessor names to match projects.

An engine keyword/option could also be viable, but would require more coordination (e.g., figuring out the plotting interface, which seems to have stalled that plotting issue). That said, if pandas figured out a way to do this I'm sure we would be happy to copy it.
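
To make the engine idea concrete, here is a rough sketch of what a switchable plotting backend could look like. It assumes a simple registry keyed by engine name; none of these names (register_engine, plot, the stub backends) exist in pandas or xarray, and agreeing on the backend interface is exactly the open coordination problem mentioned above.

```python
# Hypothetical registry mapping an engine name to a plotting callable.
_ENGINES = {}


def register_engine(name):
    """Register a plotting backend under `name` (illustrative only)."""
    def decorator(func):
        _ENGINES[name] = func
        return func
    return decorator


@register_engine("matplotlib")
def _matplotlib_backend(data, kind="line", **kwargs):
    # Stub: a real backend would draw with matplotlib here.
    return ("matplotlib", kind)


@register_engine("holoviews")
def _holoviews_backend(data, kind="line", **kwargs):
    # Stub: a real backend would return a HoloViews object here.
    return ("holoviews", kind)


def plot(data, engine="matplotlib", **kwargs):
    """Dispatch to the selected backend; every backend shares one signature."""
    try:
        backend = _ENGINES[engine]
    except KeyError:
        raise ValueError(
            f"unknown engine {engine!r}; available: {sorted(_ENGINES)}"
        ) from None
    return backend(data, **kwargs)


print(plot([1, 2, 3], engine="holoviews", kind="image"))
```

The dispatch itself is trivial; the hard part is that every backend has to accept the same arguments, which is why an accessor is the easier route in the short term.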

philippjfr commented 6 years ago

Thanks for the feedback! I'll try to drive the pandas conversation along, but since I doubt it will be resolved in the near term, I think until then we should definitely pursue the accessor approach (which is much better than the property monkey patching we're doing now).

Personally I'd prefer DataArray.hvplot() since I think even the two extra characters make a difference and something like DataArray.hv.plot.contourf() seems too deeply nested. That said, if "our preference for accessor names to match projects" is a solidly established convention, I'll defer to that and go with DataArray.holoplot().

@rabernat Since you have used HoloViews with xarray in the past, I'd very much appreciate your input as well.

rabernat commented 6 years ago

I am a big fan of holoviews and have been using it extensively for my own work in recent months. So obviously I am a big 👍 on this integration.

I agree the accessor is the best option for now, but I have no strong opinions about the name of the accessor.

Some features I would like to see are things that go beyond the plotting capabilities associated with the matplotlib engine. For example:

rabernat commented 6 years ago

Oh and another big 👍 to the datashader integration. This is crucial for my datasets.

philippjfr commented 6 years ago

"I agree the accessor is the best option for now, but I have no strong opinions about the name of the accessor."

Okay thanks, given xarray's preference for accessor names to match projects I'm now leaning toward da.holoplot().

"Automatic generation of DynamicMaps. Say I have a DataArray with dimensions ('time', 'lat', 'lon'); I should be able to say da.hv.plot(kdims=['lat', 'lon']) and have time become a dynamic selector."

HoloPlot explicitly does not deal with kdims and vdims, instead more closely following the API of pd.DataFrame.plot and xr.DataArray.plot. That said, coordinates that are not assigned to the x/y axes will automatically result in a DynamicMap, so this will give you an image plot + a widget to select the time:

da.holoplot(x='lon', y='lat', kind='image')
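
The rule behind this (any dimension not assigned to x or y becomes a widget selector) can be sketched in plain Python. This is only an illustration of the rule as described, not HoloPlot's actual code, and widget_dims is a made-up helper name.

```python
def widget_dims(dims, x, y):
    """Return the dimensions left over once x and y are assigned to axes."""
    missing = [d for d in (x, y) if d not in dims]
    if missing:
        raise ValueError(f"{missing} not found in dimensions {tuple(dims)}")
    # Everything not on an axis would be explored through a widget.
    return [d for d in dims if d not in (x, y)]


print(widget_dims(("time", "lat", "lon"), x="lon", y="lat"))
```

For the ('time', 'lat', 'lon') array above this leaves only 'time', which matches the single time selector described.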

"To go along with the above, lazy loading of dask-backed arrays"

That should happen automatically.

"Intelligent faceting which automatically links the facet kdims"

You can facet in a number of ways:

da.isel(time=slice(0, 3)).holoplot(x='lon', y='lat', kind='image', by='time')

will produce three subplots which are linked on the x- and y-axis, i.e. zooming on one will zoom on all unless you set shared_axes=False. You can also generate a grid with:

da.isel(time=slice(0, 3)).holoplot(x='lon', y='lat', kind='image', row='time', col='some_other_coord')

"Plotting not just of DataArrays but Datasets."

This is also already supported. The API here is:

ds.holoplot(x='lon', y='lat', z=['air', 'surface'])

This will provide a widget to select between the 'air' and 'surface' data variables.

"Options for projections, coastlines, etc. associated with geoviews"

I'm currently working on that; it's basically just waiting on new HoloViews/GeoViews releases. The API here is as follows:

import cartopy.crs as ccrs
import geoviews as gv

air_ds.air.holoplot.quadmesh(
    'lon', 'lat', ['air', 'some_other_variable'],
    crs=ccrs.PlateCarree(), projection=ccrs.Orthographic(-80, 30),
    global_extent=True, width=600, height=500, cmap='viridis'
) * gv.feature.coastline
[Image: screenshot of the resulting plot]

philippjfr commented 6 years ago

"something like DataArray.hv.plot.contourf() seems too deeply nested."

Actually, I suppose that's not what it would be. It could be da.hv.plot and da.hv.contourf, with .plot figuring out the kind for you. I quite like that too.

shoyer commented 6 years ago

I'm not strongly opposed to something like DataArray.hvplot for the accessor; it's just slightly less obvious than DataArray.holoplot.

hv would probably be too short for a good name (but of course this is totally up to you), especially because I can imagine people using hv as a variable name, and variables can also be accessed via attributes.

philippjfr commented 6 years ago

Thanks again for the feedback. I've decided to go with .holoplot in the end. I'll work on finishing some of the geo-related features today and get a 0.1 release and announcement out this week.

philippjfr commented 6 years ago

Thanks for everyone's feedback. Due to trademark concerns, we decided to rename both the library and the API to .hvplot. There should be a release and an announcement in the coming week.