tpvasconcelos / ridgeplot

Beautiful ridgeline plots in Python
https://ridgeplot.readthedocs.io/
MIT License

Different KDE implementations #116

Closed: tpvasconcelos closed this issue 1 month ago

tpvasconcelos commented 1 year ago

We have experienced some issues in the past with statsmodels' KDE implementation (see the in-line comments in ridgeplot._kde.estimate_density_trace()).
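For context, a "density trace" is essentially a KDE evaluated on an evenly spaced grid of x values. The sketch below is a numpy-only illustration built on assumed choices: a Gaussian kernel, the Scott factor used by scipy (h = sigma * n**(-1/5)), and a 500-point grid padded by three bandwidths on each side. The function name gaussian_kde_trace is hypothetical. This is not ridgeplot's actual code, which delegates the estimation to statsmodels.

```python
import numpy as np


def gaussian_kde_trace(samples, n_points=500):
    """Hypothetical helper: evaluate a Gaussian KDE on a grid and return (x, y)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Assumed bandwidth rule: Scott's factor as used by scipy, h = sigma * n^(-1/5)
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # Pad the grid by three bandwidths so the tails of the outermost kernels show
    x = np.linspace(samples.min() - 3 * bandwidth, samples.max() + 3 * bandwidth, n_points)
    # Sum one Gaussian kernel per sample, normalized so the curve integrates to ~1
    z = (x[:, None] - samples[None, :]) / bandwidth
    y = np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))
    return x, y


rng = np.random.default_rng(0)
x, y = gaussian_kde_trace(rng.normal(size=200))
```

The returned (x, y) pair is what a plotting layer would draw as a single ridge.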

Things to keep in mind:

fkiraly commented 9 months ago

Design-wise, would it not be cleanest to:

I wanted to write some abstract density estimation interfaces for skpro anyway (though of course there's no need to use them; I'm just saying that I have done some thinking around that topic).
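To make the idea of an abstract density estimation interface concrete, here is a minimal sketch, assuming an interface where an estimator is any callable mapping (samples, grid) to density values. The names DensityEstimator, GaussianKernelKDE, EpanechnikovKDE and density_trace are all hypothetical and are not part of ridgeplot or skpro. Swapping the kernel then only means passing a different estimator object; the plotting code never needs to know which one was used.

```python
from typing import Protocol

import numpy as np


class DensityEstimator(Protocol):
    """Hypothetical interface: maps samples plus an evaluation grid to densities."""

    def __call__(self, samples: np.ndarray, grid: np.ndarray) -> np.ndarray: ...


def _bandwidth(samples: np.ndarray) -> float:
    # Assumed rule of thumb: h = sigma * n^(-1/5); any bandwidth rule would do
    return float(samples.std(ddof=1) * samples.size ** (-1 / 5))


class GaussianKernelKDE:
    def __call__(self, samples: np.ndarray, grid: np.ndarray) -> np.ndarray:
        h = _bandwidth(samples)
        z = (grid[:, None] - samples[None, :]) / h
        return np.exp(-0.5 * z**2).sum(axis=1) / (samples.size * h * np.sqrt(2 * np.pi))


class EpanechnikovKDE:
    def __call__(self, samples: np.ndarray, grid: np.ndarray) -> np.ndarray:
        h = _bandwidth(samples)
        z = (grid[:, None] - samples[None, :]) / h
        # K(u) = 3/4 * (1 - u^2) for |u| <= 1, and zero elsewhere
        k = np.where(np.abs(z) <= 1, 0.75 * (1 - z**2), 0.0)
        return k.sum(axis=1) / (samples.size * h)


def density_trace(samples, estimator: DensityEstimator, n_points=500):
    """Plot-agnostic step: returns (x, y) for any estimator matching the protocol."""
    samples = np.asarray(samples, dtype=float)
    pad = 3 * _bandwidth(samples)
    grid = np.linspace(samples.min() - pad, samples.max() + pad, n_points)
    return grid, estimator(samples, grid)


rng = np.random.default_rng(1)
data = rng.normal(size=200)
xg, yg = density_trace(data, GaussianKernelKDE())
xe, ye = density_trace(data, EpanechnikovKDE())
```

Because the Protocol is structural, third-party estimators would not need to inherit from anything. They only need to accept the same arguments.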

tpvasconcelos commented 1 month ago

The main contender here would be scipy's scipy.stats.gaussian_kde, which is used by pandas in pandas.Series.plot.kde. However, this implementation lacks some features provided by statsmodels (for example, it only supports a Gaussian kernel), which is an issue for backwards compatibility.
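For reference, basic usage of scipy.stats.gaussian_kde looks roughly like this. The sample data and grid are arbitrary choices for illustration. Note that bw_method only controls the bandwidth; there is no parameter to choose a different kernel shape, whereas statsmodels' KDEUnivariate accepts a kernel argument.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
samples = rng.normal(loc=2.0, scale=0.5, size=300)

# bw_method accepts "scott" (the default), "silverman", a scalar factor, or a
# callable; the kernel itself is always Gaussian.
kde = gaussian_kde(samples, bw_method="silverman")

# Evaluate on an evenly spaced grid that extends a little past the data
grid = np.linspace(samples.min() - 1, samples.max() + 1, 500)
density = kde(grid)  # same as kde.evaluate(grid)
```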

In principle, I agree with @fkiraly that it would be better to separate density estimation (and, in the future, histogram binning) from the plot factory. However, this project has tried to follow the design patterns of the Plotly Express API from the beginning, which resulted in the KDE logic being exposed through the figure factory function.
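One way to reconcile the two designs, sketched here under assumptions, is a Plotly-Express-style entry point that accepts either raw samples (estimated internally by a pluggable callable) or precomputed densities. The names ridge_factory and histogram_densities are hypothetical and do not describe ridgeplot's API; histogram binning is used as the pluggable estimator to echo the future use case mentioned above.

```python
import numpy as np


def histogram_densities(samples_list, bins=20):
    """Hypothetical stand-alone estimation step: one (x, y) trace per sample set."""
    traces = []
    for samples in samples_list:
        # density=True rescales counts so that sum(counts * bin_widths) == 1
        counts, edges = np.histogram(samples, bins=bins, density=True)
        centers = (edges[:-1] + edges[1:]) / 2
        traces.append((centers, counts))
    return traces


def ridge_factory(samples=None, densities=None, estimator=histogram_densities):
    """Hypothetical Plotly-Express-style entry point.

    Accepts raw samples (estimated internally via `estimator`) or precomputed
    densities, so the estimation logic can live outside the figure factory.
    """
    if (samples is None) == (densities is None):
        raise ValueError("pass exactly one of `samples` or `densities`")
    if densities is None:
        densities = estimator(samples)
    # A real factory would build plotly traces here; this sketch only shifts
    # each ridge up by its row index so the rows stack vertically.
    return [(x, np.asarray(y) + row) for row, (x, y) in enumerate(densities)]


rng = np.random.default_rng(7)
rows = ridge_factory(samples=[rng.normal(size=100), rng.normal(loc=2.0, size=100)])
```

With this split, users who want a different KDE backend can compute densities themselves and pass them in, while the one-call convenience of the samples path is preserved.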

I'm closing this issue for now, as I don't see a strong need to make large changes to this logic at the moment.