tpvasconcelos / ridgeplot

Beautiful ridgeline plots in python
https://ridgeplot.readthedocs.io/
MIT License
62 stars, 3 forks

[DISCUSSION] Using `ridgeplot` for `sktime` and `skpro` distributional predictions? #171

Closed fkiraly closed 7 months ago

fkiraly commented 7 months ago

For a while now, I have been thinking about what a good plotting modality would be for fully distributional predictions, i.e., the output of `predict_proba` in sktime or skpro.

The challenge is that you have a (marginal) distribution for each entry in a pandas-like table, which seems hard to visualize. I've experimented with panels (matplotlib.subplots), but I wasn't quite happy with the result.

Now, by accident (just curious clicking), I've discovered ridgeplot.

What would you think of using the look & feel of ridgeplot as a plotting function in BaseDistribution? The rows of the plot would be the rows of the data-frame-like structure, and maybe there are also columns (but I am happy with the single-variable case too).

The main difference is that the distribution does not need to be estimated via KDE, you already have it in a form where you can access pdf, cdf, etc, completely, and you have the quantile function too which helps with selecting x-axis range.

Plotting the cdf and other distribution-defining functions would also be neat; of course, the pdf (if it exists) or the cdf (for survival) would already be great.
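
A minimal sketch of the idea above: the quantile function picks the x-axis range, and the pdf or cdf is evaluated exactly on that grid, with no KDE involved. The standard library's `statistics.NormalDist` stands in for a distribution object here; the helper names `quantile_grid` and `curve`, and the `tail` and `n_points` parameters, are hypothetical and not part of skpro, sktime, or ridgeplot.

```python
from statistics import NormalDist


def quantile_grid(dist, n_points=200, tail=0.001):
    """Evenly spaced x values between the `tail` and `1 - tail` quantiles."""
    lo, hi = dist.inv_cdf(tail), dist.inv_cdf(1 - tail)
    step = (hi - lo) / (n_points - 1)
    return [lo + i * step for i in range(n_points)]


def curve(dist, kind="pdf", n_points=200, tail=0.001):
    """Return (xs, ys) for one distribution; `kind` is "pdf" or "cdf"."""
    if kind not in ("pdf", "cdf"):
        raise ValueError(f"unsupported kind: {kind!r}")
    xs = quantile_grid(dist, n_points, tail)
    evaluate = getattr(dist, kind)  # exact function, not an estimate
    return xs, [evaluate(x) for x in xs]


# One marginal distribution per row, e.g. one per forecast horizon (made-up values).
forecast = [NormalDist(mu=10 + h, sigma=1 + 0.5 * h) for h in range(1, 4)]
curves = [curve(dist, "pdf") for dist in forecast]
```

Because the grid comes from quantiles, each row automatically gets a range that covers almost all of its probability mass, however wide or narrow the distribution is.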

Imagined usage, something like:

fcst = BuildSth(Complex(params), more_params)
fcst.fit(y, fh=range(1, 10))
y_dist = fcst.predict_proba()

y_dist.plot()  # default is pdf for continuous distributions
y_dist.plot("cdf")

Dependency-wise, one could imagine ridgeplot as a plotting soft dependency of skpro (like matplotlib or seaborn), and therefore indirectly of sktime.

What do you think?

tpvasconcelos commented 7 months ago

Hey @fkiraly happy to see you here! Give me some time to take a look at this one as I'm not super familiar with that class of forecasters from sktime.

> The main difference is that the distribution does not need to be estimated via KDE, you already have it in a form where you can access pdf, cdf, etc, completely, and you have the quantile function too which helps with selecting x-axis range.

If I understood correctly, this is completely fine when it comes to interacting with ridgeplot, since we also accept x-y traces as input -- bypassing the KDE step altogether.
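
As a rough sketch of what passing x-y traces could look like: the block below builds one trace per row from exact pdf values, again using `statistics.NormalDist` as a stand-in for a distribution object. It assumes that ridgeplot's `densities` argument accepts a nested "rows, traces per row, (x, y) points" structure and that a `labels` argument is available; both should be checked against the ridgeplot documentation. The helper names `xy_trace` and `plot_with_ridgeplot` are hypothetical.

```python
from statistics import NormalDist


def xy_trace(dist, n_points=100, tail=0.001):
    """List of (x, pdf(x)) points spanning the central 1 - 2*tail mass."""
    lo, hi = dist.inv_cdf(tail), dist.inv_cdf(1 - tail)
    step = (hi - lo) / (n_points - 1)
    xs = [lo + i * step for i in range(n_points)]
    return [(x, dist.pdf(x)) for x in xs]


# Made-up forecast: one marginal distribution per horizon.
forecast = {f"h={h}": NormalDist(mu=10 + h, sigma=1 + 0.5 * h) for h in range(1, 4)}
labels = list(forecast)

# Nested layout: rows -> traces in that row -> (x, y) points.
densities = [[xy_trace(dist)] for dist in forecast.values()]


def plot_with_ridgeplot(densities, labels):
    """Assumed call shape; ridgeplot is imported lazily as an optional dependency."""
    from ridgeplot import ridgeplot

    return ridgeplot(densities=densities, labels=labels)


# Usage, if ridgeplot is installed:
# fig = plot_with_ridgeplot(densities, labels)
# fig.show()
```

Importing ridgeplot inside the function mirrors the soft-dependency idea discussed in this thread: the data preparation works without it, and only the plotting call requires it.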

fkiraly commented 7 months ago

> as I'm not super familiar with that class of forecasters from sktime.

Most popular forecasters have a probabilistic prediction mode, and they can be filtered by the tag capability:pred_int.

Check it out in the forecasting tutorial; the main tutorial of skpro also explains the tabular (non-time-series) interfaces around it.

> If I understood correctly, this is completely fine when it comes to interacting with ridgeplot, since we also accept x-y traces as input -- bypassing the KDE step altogether

Yes - in the conceptual space of sktime / skpro, the KDE step is an estimator, of type "distribution from sample". In the desired plot, we would go directly from distribution to plot, without the first step of going to distribution from sample.
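
To make the two routes concrete, here is a small standard-library sketch. The first route draws a sample and estimates a density from it with a hand-written Gaussian KDE (a "distribution from sample" estimator); the second simply evaluates the known pdf. The `gaussian_kde` helper, the normal-reference bandwidth rule, and all numbers are illustrative assumptions, not how ridgeplot, skpro, or sktime implement anything.

```python
import random
import statistics
from statistics import NormalDist


def gaussian_kde(sample, bandwidth):
    """Return a density estimate built from `sample` (sample -> distribution)."""
    kernel = NormalDist(0.0, bandwidth)
    n = len(sample)

    def density(x):
        # Average of Gaussian bumps centred on each observation.
        return sum(kernel.pdf(x - s) for s in sample) / n

    return density


true_dist = NormalDist(mu=5.0, sigma=2.0)

# Route 1: sample -> KDE estimate -> curve (approximate).
rng = random.Random(0)
sample = [rng.gauss(5.0, 2.0) for _ in range(2000)]
bandwidth = 1.06 * statistics.stdev(sample) * len(sample) ** (-1 / 5)
estimated_pdf = gaussian_kde(sample, bandwidth)

# Route 2: distribution -> curve (exact, no estimation step).
exact_pdf = true_dist.pdf

gap_at_mean = abs(estimated_pdf(5.0) - exact_pdf(5.0))
```

The estimate depends on the sample and the bandwidth, whereas the second route has nothing to tune, which is why skipping the KDE step is attractive when the distribution is already available.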