holoviz / hvplot

A high-level plotting API for pandas, dask, xarray, and networkx built on HoloViews
https://hvplot.holoviz.org
BSD 3-Clause "New" or "Revised" License

Add HoloViews' `subcoordinate_y` #1160

Open droumis opened 9 months ago

droumis commented 9 months ago

`subcoordinate_y` was recently added to HoloViews to plot elements on y sub-coordinates. It should now be incorporated into hvPlot when working with xarray.

The API could be something as simple as `da.hvplot(subcoordinate_y=True)`.

This would produce something like: [image]

The minimum data are:

There should also be an option for adjusting `subcoordinate_scale` (as in HoloViews).

Whether or not the minimap should be included with the simple default API call is something that needs to be discussed and considered further.

maximlt commented 6 months ago

@droumis I can see on the project board that this is planned for Q4 of this year. Is this a hard deadline?

But my main question would be: is this feature not too niche to be exposed in hvPlot? Expanding hvPlot's API comes with a cost: it makes it more difficult for users to navigate its many parameters. I'm not -1 on this, but I'd like to make sure this is discussed.


> Whether or not the minimap should be included with the simple default API call is something that needs to be discussed and considered further.

Yes, I wouldn't tie the "minimap" (or whatever we end up calling it) to `subcoordinate_y`. We should open a separate issue to discuss how we'd generally like to expose the minimap in hvPlot.


If it's eventually implemented, I guess we could also support a DataFrame with a datetime index, using the column names as the source labels.

philippjfr commented 6 months ago

> But my main question would be: is this feature not too niche to be exposed in hvPlot? Expanding hvPlot's API comes with a cost: it makes it more difficult for users to navigate its many parameters. I'm not -1 on this, but I'd like to make sure this is discussed.

Personally, I feel like this is more an issue of how our doc(strings) are organized than a general issue. `subcoordinate_y` can be quite a nice and general feature, and if it isn't general enough to generate e.g. a joy plot, then that should be improved at the HoloViews level.

droumis commented 6 months ago

@maximlt, I'm in the process of updating the project board (this week). Q4 is not a hard deadline for this task; I'm going to push it later since we are also still working on tasks that would directly impact this implementation.

I agree with @philippjfr that a ridgeplot (aka joyplot) is a common enough type of plot to justify the cost of adding it to hvPlot. With `subcoordinate_y` this is possible using `Curve`, but `Area` is more commonly used for a ridgeplot, and it looks like it needs the element order to be reversed by default. I'll file an issue with HoloViews.

Code

```python
import numpy as np
import holoviews as hv; hv.extension('bokeh')
from scipy.stats import gaussian_kde

categories = ['A', 'B', 'C', 'D', 'E']
# One normal sample per category, with shifted means
data = {cat: np.random.normal(loc=i-2, scale=1.0, size=100) for i, cat in enumerate(categories)}
x = np.linspace(-5, 5, 100)

curves = []
areas = []
for i, (cat, values) in enumerate(data.items()):
    # Kernel density estimate evaluated on the shared x grid
    pdf = gaussian_kde(values)(x)
    curve = hv.Curve((x, pdf), label=cat).opts(
        subcoordinate_y=True,
        subcoordinate_scale=1.5,
    )
    curves.append(curve)
    area = hv.Area((x, pdf), label=cat).opts(
        subcoordinate_y=True,
        subcoordinate_scale=1.5,
    )
    areas.append(area)

ridge_plot_curves = hv.Overlay(curves).opts(
    width=900,
    height=400,
)
ridge_plot_areas = hv.Overlay(areas).opts(
    width=900,
    height=400,
)

# ridge_plot_areas.opts(show_legend=False)
ridge_plot_curves.opts(show_legend=False)
```
[images]

philippjfr commented 6 months ago

While we're at it, HoloViews should also support grabbing the label for each subcoordinate from the NdOverlay key, so that it becomes as simple as this:

```python
import numpy as np
import pandas as pd
import hvplot.pandas  # noqa: F401, registers the .hvplot accessor

categories = ['A', 'B', 'C', 'D', 'E']
data = {cat: np.random.normal(loc=i-2, scale=1.0, size=100) for i, cat in enumerate(categories)}
# Proposed behavior: each subcoordinate label is taken from the NdOverlay key
pd.DataFrame(data).hvplot.kde(y=categories, subcoordinate_y=True, subcoordinate_scale=1.5)
```

Right now you have to insert the labels explicitly:

```python
import numpy as np
import pandas as pd
import hvplot.pandas  # noqa: F401, registers the .hvplot accessor

categories = ['A', 'B', 'C', 'D', 'E']
data = {cat: np.random.normal(loc=i-2, scale=1.0, size=100) for i, cat in enumerate(categories)}

# Relabel each Distribution in turn; relies on the order matching `categories`
labels = iter(categories)
overlay = pd.DataFrame(data).hvplot.kde(y=categories).map(lambda el: el.relabel(next(labels)), specs='Distribution')
overlay.opts('Distribution', subcoordinate_y=True, subcoordinate_scale=1.5)
```

[image: resulting Bokeh plot]