sktime / skpro

A unified framework for tabular probabilistic regression, time-to-event prediction, and probability distributions in python
https://skpro.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

[ENH] non-parametric default for `_predict_proba` if `_predict_quantiles` is available #246

Open fkiraly opened 6 months ago

fkiraly commented 6 months ago

Point raised by @Ram0nB in https://github.com/sktime/skpro/pull/236: use a more meaningful default for `_predict_proba` when an estimator does not implement it. This is relevant for skpro, and the same logic applies in sktime:


Original comment


One thing that comes to mind is whether we also want to add the logic of converting a quantile prediction to a distribution estimate to `BaseProbaRegressor`. Currently, the implementation of `BaseProbaRegressor`'s `_predict_proba` uses the predicted mean and variance to return a normal distribution.

Maybe we can enhance `BaseProbaRegressor`'s `_predict_proba` so that it uses `QPD_Empirical` if `_predict_quantiles`/`_predict_interval` are available, and falls back to the current logic otherwise. This way, we don't assume a normal distribution if multiple quantiles are available. What are your thoughts on this @fkiraly ?
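To make the difference between the two defaults concrete, here is a minimal sketch using only `numpy` and `scipy`. It is not the skpro implementation and does not reproduce `QPD_Empirical`; the helper names `normal_default` and `step_cdf_from_quantiles` are hypothetical, and the numbers are made up for illustration.

```python
import numpy as np
from scipy.stats import norm


def normal_default(mean, var):
    """Normal distribution built from a predicted mean and variance."""
    return norm(loc=mean, scale=np.sqrt(var))


def step_cdf_from_quantiles(alphas, quantiles):
    """Hypothetical non-parametric alternative: a step-function CDF that
    jumps to alpha_i at the i-th predicted quantile. Tails beyond the
    outermost quantile points are left unspecified in this sketch."""
    alphas = np.asarray(alphas, dtype=float)
    quantiles = np.asarray(quantiles, dtype=float)
    order = np.argsort(quantiles)
    q = quantiles[order]
    levels = np.concatenate(([0.0], alphas[order]))

    def cdf(x):
        # number of quantile points that are <= x
        idx = np.searchsorted(q, x, side="right")
        return levels[idx]

    return cdf


# made-up, right-skewed quantile prediction for one instance
alphas = [0.1, 0.5, 0.9]
quantiles = [1.0, 2.0, 10.0]

cdf_emp = step_cdf_from_quantiles(alphas, quantiles)
dist_norm = normal_default(mean=4.0, var=16.0)

p_emp = cdf_emp(2.0)         # matches the predicted median level
p_norm = dist_norm.cdf(2.0)  # below 0.5, the normal ignores the skew
```

The step CDF reproduces the predicted quantile levels exactly, while the normal built from mean and variance places the median at the mean and so cannot represent the skew.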

fkiraly commented 6 months ago

What are your thoughts on this @fkiraly ?

Excellent suggestion, in my opinion!

Yes, the normal assumption has bothered me for a while, but there were no good alternatives until the various empirical distributions were implemented.

One question, of course, is which quantile points to use: an empirical QPD requires choosing some arbitrary set. Perhaps all the percentiles?
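As a rough illustration of the "all the percentiles" option, the grid could be built as below. This is only a sketch: `predict_quantiles_stub` is a hypothetical stand-in for an estimator's quantile prediction for a single instance, here backed by a log-normal purely to have something to evaluate.

```python
import numpy as np
from scipy.stats import lognorm

# all percentiles: 0.01, 0.02, ..., 0.99
alphas = np.arange(1, 100) / 100


def predict_quantiles_stub(alphas):
    """Hypothetical stand-in for a quantile prediction of one instance."""
    return lognorm(s=0.75).ppf(alphas)


# 99 support points that an empirical distribution could be built on
support = predict_quantiles_stub(alphas)
```

A finer grid gives a closer approximation but more support points per instance; the choice of 99 points is the arbitrary part mentioned above.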

Further, a problem could be lack of smoothness, which risks suddenly breaking user workflows that involve losses assuming continuous distributions. This might be a major issue, and we should finish discussing it before doing something too quickly.
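A small sketch of the smoothness concern, again with plain `numpy`/`scipy` rather than skpro objects and with made-up numbers: a log-loss needs a density, which an equally weighted empirical distribution does not have, whereas a loss such as CRPS stays finite. The helper `crps_empirical` is hypothetical and uses the standard sample form E|X - y| - 0.5 E|X - X'|.

```python
import numpy as np
from scipy.stats import norm

support = np.array([1.0, 2.0, 10.0])  # equally weighted atoms (made up)
y_true = 3.0

# normal distribution: density exists, so the log-loss is finite
normal = norm(loc=support.mean(), scale=support.std())
log_loss_normal = -normal.logpdf(y_true)

# empirical distribution: zero mass at y_true unless it hits an atom,
# so the negative log of the mass is infinite
mass_at_y = np.mean(support == y_true)
with np.errstate(divide="ignore"):
    log_loss_empirical = -np.log(mass_at_y)


def crps_empirical(support, y):
    """CRPS of an equally weighted empirical distribution at y."""
    term1 = np.mean(np.abs(support - y))
    term2 = 0.5 * np.mean(np.abs(support[:, None] - support[None, :]))
    return term1 - term2


crps = crps_empirical(support, y_true)
```

So a workflow that evaluates with a log-loss would go from finite values to infinite ones if the default switched silently, while CRPS-based evaluation would keep working.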

fkiraly commented 6 months ago

Some options I can think of, besides making this the default overall: