sktime / skpro

A unified framework for tabular probabilistic regression and probability distributions in python
https://skpro.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

[ENH] interface `XGBoostLSS` et al by `StatMixedML` #184

Open fkiraly opened 5 months ago

fkiraly commented 5 months ago

It would be great to interface the various probabilistic supervised regressors of StatMixedML, so they can then immediately be used for forecasting in sktime via skpro!

FYI @StatMixedML, @joshdunnlime

Many thanks to @KiwiAthlete for the suggestion!

fkiraly commented 5 months ago

PS @StatMixedML, I notice that you are interested in probabilistic forecasting, yet the estimators provided are, strictly speaking, probabilistic tabular regressors. That's not a big problem, as skpro is integrated with the most common reduction compositors in sktime, so any skpro regressor can be used directly to create probabilistic forecasters via make_reduction etc.
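
For illustration, here is a minimal sketch of that reduction route. It uses skpro's existing `ResidualDouble` regressor as a stand-in for a future XGBoostLSS interface, and assumes current sktime/skpro import paths; exact names and supported options may differ between versions:

```python
# Minimal sketch: turn a tabular probabilistic regressor (skpro) into a
# probabilistic forecaster (sktime) via recursive reduction.
from sklearn.linear_model import LinearRegression

from skpro.regression.residual import ResidualDouble
from sktime.datasets import load_airline
from sktime.forecasting.compose import make_reduction

y = load_airline()

# any skpro probabilistic regressor works here, e.g. a future XGBoostLSS interface
reg_proba = ResidualDouble(LinearRegression())

# recursive reduction over a sliding window of 12 lags
forecaster = make_reduction(reg_proba, window_length=12, strategy="recursive")
forecaster.fit(y, fh=[1, 2, 3])

# probabilistic outputs, e.g. 90% prediction intervals for the next 3 steps
y_pred_int = forecaster.predict_interval(coverage=0.9)
print(y_pred_int)
```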

What would be nice, with your expertise, is some thinking around uncertainty estimates in recursive regression, which seems non-obvious. Let us know if you are interested in some methodology research around that - or if you already have some solutions 😄

StatMixedML commented 5 months ago

@fkiraly Thanks for suggesting the skpro integration. Integrating the LSS-models into both skpro and sktime would be a fantastic extension! For now I suggest we focus on the XGBoostLSS/LightGBMLSS integrations, since the other two LSS frameworks are currently not maintained.

> PS @StatMixedML, I notice that you are interested in probabilistic forecasting, yet the estimators provided are, strictly speaking, probabilistic tabular regressors.

That is correct. General-purpose tree models are not designed for forecasting unless parametric models are used in the leaf nodes, since they cannot extrapolate beyond the training data. However, the linear_tree option in LightGBMLSS gets us around this problem.
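
To illustrate the extrapolation point, here is a hedged sketch using plain LightGBM rather than LightGBMLSS; `linear_tree` is the booster parameter both expose, but exact behaviour and defaults may differ between versions:

```python
# Piecewise-constant trees cannot extrapolate a trend beyond the training
# range; with linear_tree=True the leaves hold linear models and can.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(42)
X_train = np.linspace(0, 10, 200).reshape(-1, 1)
y_train = 2.0 * X_train.ravel() + rng.normal(0, 0.1, size=200)
X_test = np.array([[12.0], [15.0]])  # outside the training range

plain = lgb.LGBMRegressor().fit(X_train, y_train)
linear = lgb.LGBMRegressor(linear_tree=True).fit(X_train, y_train)

print(plain.predict(X_test))   # roughly flat, capped near max(y_train) ~ 20
print(linear.predict(X_test))  # approximately follows the trend, ~24 and ~30
```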

> What would be nice, with your expertise, is some thinking around uncertainty estimates in recursive regression, which seems non-obvious. Let us know if you are interested in some methodology research around that - or if you already have some solutions.

That sounds like an interesting problem. Can you maybe sketch the problem in more detail? We can also have the discussion via email if you want.

fkiraly commented 5 months ago

> That sounds like an interesting problem. Can you maybe sketch the problem in more detail?

Sure! Done in this discussion thread: https://github.com/sktime/skpro/discussions/185. Let me know whether it makes sense, or if you would simply like more explanation.

> We can also have the discussion via email if you want.

I know how academics are, so thanks for being considerate in this respect. I think, though, that it is hard to argue against precedence established by a public GitHub history. Of course, such a record could be disregarded or simply not cited (I have seen that a couple of times), but the same can happen to any paper.

Hence I do not mind the discussion in public, even if novel methodological content comes out of it.

fkiraly commented 5 months ago

> Integrating the LSS-models into both skpro and sktime would be a fantastic extension! For now I suggest we focus on the XGBoostLSS/LightGBMLSS integrations, since the other two LSS frameworks are currently not maintained.

Thanks for your support! Let's get to it then 😃, contributions appreciated.

StatMixedML commented 5 months ago

I have created respective branches in the repos.

Please work against these branches before we actually merge to master.

fkiraly commented 5 months ago

Hm, @StatMixedML, are you planning to write the estimators directly in the respective packages? Sure, that works - if you want to test them there, you can use check_estimator from skpro.utils.
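
For instance (a sketch only; the interface class name and module path are hypothetical placeholders):

```python
# Run skpro's estimator contract tests against an interface class developed
# outside the skpro codebase; "XGBoostLSSRegressor" is a placeholder name.
from skpro.utils import check_estimator

from xgboostlss.skpro_interface import XGBoostLSSRegressor  # hypothetical module path

results = check_estimator(XGBoostLSSRegressor)
```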

Though for that set-up, you may want to consider relaxing your dependency bounds? See the discussion in https://github.com/StatMixedML/XGBoostLSS/issues/56