sktime / skpro

A unified framework for tabular probabilistic regression and probability distributions in python
https://skpro.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

[ENH] roadmap of probabilistic regressors to implement or to interface #7

Open fkiraly opened 4 years ago

fkiraly commented 4 years ago

A wishlist of probabilistic regression methods to implement or interface. This is partly copied from the list I made when designing the R counterpart, https://github.com/mlr-org/mlr3proba/issues/32 . The number of stars at the end of each item is the estimated difficulty or time investment.

GLM

KRR aka Gaussian process regression

CDE

Gradient boosting and tree-based

Neural networks

Bayesian toolboxes

Pipeline elements for target transformation

Composite techniques, reduction to deterministic regression

Ensembling type pipeline elements and compositors

Baselines

Other reduction from/to probabilistic regression

nilesh05apr commented 6 months ago

@fkiraly I would like to take this up as my project. What would be a good starting point?

fkiraly commented 6 months ago

Pick something that you find interesting, with a single star (*)?

I've updated the list with checkmarks for implemented estimators.

ShreeshaM07 commented 6 months ago

@fkiraly, I am interested in this project idea and would like to start by adding an interface to ngboost in skpro. Can I go ahead? Also, a small doubt, since I haven't contributed to skpro before: is using the same versions as for sktime sufficient for skpro, or should I create another virtual environment for it?

fkiraly commented 6 months ago

@ShreeshaM07, nice! Can you then quickly post in https://github.com/sktime/skpro/issues/135 that you will be working on this?

> Also a small doubt since I haven't contributed to skpro earlier, is using the same versions as sktime sufficient for skpro or should I create another virtual environment for it?

I would advise having a virtual environment ready for testing, with an editable install of skpro.

Like with sktime, you can do an editable install with a pip install -e . in a clone of the skpro repo.

If you have an sktime environment, you might have skpro already installed, but not as editable. In that case, your changes to the code will not be reflected in the environment.

Personally, I have an environment where both sktime and skpro are installed as editable versions, to allow debugging and testing across different packages.
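
To confirm that an environment really picks up the editable clones, the installed metadata can be inspected. The following is a minimal sketch, assuming a recent pip that records PEP 610 `direct_url.json` metadata for editable installs; older or legacy editable installs may not write this file and would then be reported as non-editable. The helper name `is_editable` is hypothetical and not part of skpro or sktime.

```python
import json
from importlib import metadata


def is_editable(dist_name):
    """Return True/False for an installed distribution, None if it is not installed."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # PEP 610 record; read_text returns None when the file is absent,
    # which is the usual case for a plain install from PyPI
    raw = dist.read_text("direct_url.json")
    if raw is None:
        return False
    return bool(json.loads(raw).get("dir_info", {}).get("editable", False))


# distribution names as published on PyPI
for name in ("scikit-base", "skpro", "sktime"):
    print(name, is_editable(name))
```

If a package shows `False` here, changes made in the local clone will most likely not be visible when importing it.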

Happy to connect quickly on the discord dev-chat if you have further questions about this.

julian-fong commented 6 months ago

@fkiraly Hey Franz, I would like to contribute towards some of the GLMs with regression links. Is there anything I need to do setup-wise with skpro that is different from sktime?

fkiraly commented 6 months ago

> @fkiraly Hey Franz, I would like to contribute towards some of the GLMs with regression links

Excellent! I'd recommend starting with the statsmodels ones: https://www.statsmodels.org/stable/glm.html#module-statsmodels.genmod.generalized_linear_model, and with Gaussian link.

> is there anything i need to do setup wise with skpro that is different than sktime?

It is the same, except of course you do pip install -e .[dev] in a clone of skpro, not sktime.

I'm typically developing in an environment that has editable versions of both, plus scikit-base, which allows me to make changes in all three packages. The "catch" if you do this is that you have to install the editable versions in order of dependency, i.e., first skbase, then skpro, then sktime; otherwise pip will get the non-editable PyPI versions.
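
The install order described above can be derived mechanically with a topological sort. This is a minimal sketch: the dependency graph is an assumption read off the order given in the comment, and the local clone directory names in the printed commands are hypothetical.

```python
from graphlib import TopologicalSorter

# package -> packages that must be installed before it
# (assumed from the order "skbase, then skpro, then sktime")
deps = {
    "scikit-base": set(),
    "skpro": {"scikit-base"},
    "sktime": {"scikit-base", "skpro"},
}

install_order = list(TopologicalSorter(deps).static_order())

# assumes each clone lives in a sibling directory named after the package
for pkg in install_order:
    print(f"pip install -e ./{pkg}")
```

Running the printed commands in this order means each later package finds its dependency already present as an editable install.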

ShreeshaM07 commented 6 months ago

@fkiraly, I just wanted to know where "reducing deterministic (quantile) regression to probabilistic regression - take quantile(s)" has been implemented, to get an idea of what needs to be done in these types of issues. Could you please help me out?

fkiraly commented 6 months ago

Yes, that has been implemented already, by @Ram0nB in MultipleQuantileRegressor, see #108. You can figure out which algorithms have been contributed already by the checkmark next to them (I hope that's all correct, but feel free to ask).
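
For readers new to this kind of reduction, one way to read it is: fit one deterministic quantile regressor per quantile level and collect the predictions as a quantile forecast. The sketch below is not skpro's MultipleQuantileRegressor; the class names `ConstantQuantile` and `MultiQuantileSketch`, the `make_regressor` factory argument, and the toy data are all hypothetical.

```python
class ConstantQuantile:
    """Toy quantile regressor: ignores X and predicts an empirical quantile of y."""

    def __init__(self, alpha):
        self.alpha = alpha

    def fit(self, X, y):
        ys = sorted(y)
        # simple order-statistic rule, clipped to the last element
        self.value_ = ys[min(int(self.alpha * len(ys)), len(ys) - 1)]
        return self

    def predict(self, X):
        return [self.value_ for _ in X]


class MultiQuantileSketch:
    """Fit one quantile regressor per level; predict all levels at once."""

    def __init__(self, make_regressor, alphas):
        self.make_regressor = make_regressor  # callable: alpha -> unfitted regressor
        self.alphas = sorted(alphas)

    def fit(self, X, y):
        self.models_ = {a: self.make_regressor(a).fit(X, y) for a in self.alphas}
        return self

    def predict_quantiles(self, X):
        # one row per sample, one column per quantile level
        cols = [self.models_[a].predict(X) for a in self.alphas]
        return [list(row) for row in zip(*cols)]


X = [[i] for i in range(100)]
y = list(range(1, 101))
model = MultiQuantileSketch(ConstantQuantile, [0.1, 0.5, 0.9]).fit(X, y)
q = model.predict_quantiles(X[:2])
print(q)
```

In practice the toy regressor would be replaced by any estimator that can be fitted for a given quantile level.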