sktime / skpro

A unified framework for tabular probabilistic regression, time-to-event prediction, and probability distributions in python
https://skpro.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

[ENH] Creating a new Bayesian Regressor with PyMC as a backend #358

Closed: meraldoantonio closed this PR 2 weeks ago

meraldoantonio commented 6 months ago

Reference Issues/PRs

https://github.com/sktime/skpro/issues/7

What does this implement/fix? Explain your changes.

This WIP PR implements a Bayesian Linear Regressor with PyMC as a backend

Does your contribution introduce a new dependency? If yes, which one?

Yes. It depends on the PyMC family of packages: PyMC itself, xarray, and ArviZ.

What should a reviewer concentrate their feedback on?

The design of the BayesianLinearRegressor. Especially:

  1. The introduction of the priors. For now, the class hardcodes the priors; we need to decide how users should inject their own.
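One possible direction for user-injected priors is a declarative config that the estimator resolves to PyMC distributions at fit time. The sketch below is purely illustrative: the names `PriorSpec`, `prior_config`, and `build_pymc_priors` are hypothetical and not part of this PR.

```python
# Hypothetical sketch of user-specified priors via a config object.
# All names here are illustrative, not the PR's actual API.
from dataclasses import dataclass, field


@dataclass
class PriorSpec:
    """Declarative prior: a PyMC distribution name plus keyword arguments."""
    dist: str                 # e.g. "Normal", "HalfNormal"
    params: dict = field(default_factory=dict)


# Users could pass a mapping from model variable to PriorSpec,
# falling back to defaults like these when a key is missing.
default_priors = {
    "intercept": PriorSpec("Normal", {"mu": 0.0, "sigma": 10.0}),
    "slopes": PriorSpec("Normal", {"mu": 0.0, "sigma": 1.0}),
    "noise": PriorSpec("HalfNormal", {"sigma": 1.0}),
}


def build_pymc_priors(priors):
    """Inside a pm.Model context, resolve each spec to a PyMC distribution."""
    import pymc as pm  # imported lazily; PyMC is a soft dependency
    return {
        name: getattr(pm, spec.dist)(name, **spec.params)
        for name, spec in priors.items()
    }
```

This keeps the prior specification serializable and inspectable, while the PyMC objects themselves are only created inside the model context.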

Did you add any tests for the change?

Not yet

Any other comments?

N/A

PR checklist

For all contributions
For new estimators

(This is not yet done)

review-notebook-app[bot] commented 6 months ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.



fkiraly commented 5 months ago

@meraldoantonio, there are still code formatting failures, see above. You should be able to format automatically on save, or run black . in your console to format.

fkiraly commented 2 months ago

Strange import error - is this related to an upper bound of any of the imports implied by 3.9, e.g., scipy?

meraldoantonio commented 1 month ago

Strange import error - is this related to an upper bound of any of the imports implied by 3.9, e.g., scipy?

Apparently there is a bug affecting ArviZ 0.17 when used with scipy>=1.13 (source 1) (source 2).

The bug is fixed in ArviZ 0.18, but that version requires Python 3.10 or above.

As a temporary solution, I've pinned the scipy version in all-extras in pyproject.toml

fkiraly commented 1 month ago

Makes sense.

From a maintenance perspective, applying the version bound in the pyproject.toml is not a good solution, since the lock is implied only by a single estimator, and not by scipy itself.

Could you add the lock instead in the python_dependencies tag of the estimator, and revert the changes to pyproject?
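In the skbase/skpro convention, estimators declare their soft dependencies via a `python_dependencies` tag, which the test framework checks per estimator. A minimal sketch of the temporary lock under discussion (class body abbreviated; the real estimator inherits from skpro's probabilistic regressor base class, and the scipy bound shown is the temporary one discussed, not a recommendation):

```python
# Minimal sketch of an estimator-level dependency lock via the
# "python_dependencies" tag (skbase convention); the real estimator
# inherits from skpro's probabilistic regressor base class.
class BayesianLinearRegressor:
    _tags = {
        # per-estimator soft dependencies, checked at construction time
        "python_dependencies": ["pymc", "arviz", "scipy<1.13"],
    }


print(BayesianLinearRegressor._tags["python_dependencies"])
```

The point of the tag is that the bound then travels with the single estimator that needs it, rather than constraining every install of the package.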

meraldoantonio commented 1 month ago

Makes sense.

From a maintenance perspective, applying the version bound in the pyproject.toml is not a good solution, since the lock is implied only by a single estimator, and not by scipy itself.

Could you add the lock instead in the python_dependencies tag of the estimator, and revert the changes to pyproject?

Makes sense! But I've tried this a couple of times and for some reason, without the pyproject.toml lock, the test framework keeps installing the "wrong" version of scipy (version 1.13.1), even after specifying "scipy<=1.12.0" in the python_dependencies tag...

It might be that other libraries are pulling in a conflicting version, but I haven't managed to find the exact cause.

Any ideas?

fkiraly commented 1 month ago

Any ideas?

Why are you trying to bound scipy instead of arviz? I would simply bound arviz>=0.18, based on your statements, as well as python_version >= 3.10

fkiraly commented 1 month ago

PS: why did you close the notebook PR? That was a nice notebook, and indeed it would be nice as separate PR.

meraldoantonio commented 1 month ago

Any ideas?

Why are you trying to bound scipy instead of arviz? I would simply bound arviz>=0.18, based on your statements, as well as python_version >= 3.10

Ah ok! I was trying to make it work for Python 3.9. I've now locked the Python version and ArviZ version in the latest commit.

PS: why did you close the notebook PR? That was a nice notebook, and indeed it would be nice as separate PR.

Just closed it temporarily as I'm still working on adding the update method. I’ll re-open it once that's done!

fkiraly commented 1 month ago

is this PR ready for merge?

I would really suggest chunking PRs into smaller, self-contained additions, so we can merge your contributions more quickly and PRs do not get too large. For instance, update we can add later, and it should not "hurt" (or would it?)

meraldoantonio commented 1 month ago

is this PR ready for merge?

Yes it is!

I would really suggest chunking PRs into smaller, self-contained additions, so we can merge your contributions more quickly and PRs do not get too large. For instance, update we can add later, and it should not "hurt" (or would it?)

Sure, noted! Yes, you're right, we can add the update method in later PRs

Also, I've documented the logic behind this class in this PR; could you please take a look when you have time? Thank you!

fkiraly commented 1 month ago

is this PR ready for merge?

Yes it is!

Then you should adjust the tagging/signposting:

fkiraly commented 1 month ago

PS: there is a conflict in the pyproject.toml, as your PR does not seem up to date with the recent release. Fixing is as usual: sync your fork, and merge main into your branch.

meraldoantonio commented 1 month ago

PS on the update method: Based on my research, there isn't an ideal way to perform update using PyMC. There are two possible routes; however, both have significant limitations:

  1. Rebuilding the prior from the posterior using marginal 1D empirical approximation: This approach, described in the PyMC documentation, works fine when dealing with a single variable. However, if we have multiple variables (such as in our case, where we have an intercept and a slope per feature), this method is problematic because it loses the correlation structure between variables.

  2. Rebuilding the prior using a multivariate normal approximation: Another approach is to approximate the posterior as a multivariate normal distribution, as discussed in the PyMC Experimental library. While this method retains the correlation between variables, it assumes normality, which may not hold in cases where the posterior is multimodal or skewed.
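For reference, the second route above can be sketched with plain numpy: fit a mean vector and covariance matrix to the joint posterior draws, which preserves the correlation structure but bakes in the Gaussian assumption. The variable names and the synthetic draws below are illustrative only.

```python
# Sketch of route 2: approximate a joint posterior by a multivariate
# normal fitted to posterior draws (numpy only; names are illustrative).
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for posterior samples of (intercept, slope): shape (n_draws, 2).
draws = rng.normal(loc=[1.0, 2.0], scale=0.1, size=(1000, 2))

mu = draws.mean(axis=0)            # posterior mean vector
cov = np.cov(draws, rowvar=False)  # retains intercept-slope correlation

# This (mu, cov) pair could seed a multivariate normal prior for the
# next fit, but only if the posterior is well-approximated by a Gaussian,
# which fails for multimodal or strongly skewed posteriors.
print(mu.shape, cov.shape)
```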

Instead of introducing a potentially misleading update method, I think it would be better to omit it at this stage.

meraldoantonio commented 3 weeks ago

The code looks great!

There are some testing issues:

  • the failure on 3.9 still seems to be there, even though we clearly set the tag. Could you check if you can understand what is going on there?
  • get_test_params should be populated with at least two examples.

I'm not sure what's going on with the failing tests with Python 3.9, I've tried playing around with the versions but somehow the test suite ignores this tag :(

I've added examples to get_test_params

fkiraly commented 3 weeks ago

Summarizing our collaborative coding session on November 8: we have found the cause of the failures on Python 3.9, namely a known issue with the dependency checking framework that is already fixed in sktime and skbase.

The todo is summarized in https://github.com/sktime/skpro/issues/489, and should happen in a separate pull request.

fkiraly commented 2 weeks ago

Oops, looks like this got accidentally closed; I reopened it.

The reason is that you accidentally used a reserved "closing keyword" in #490, that is, you wrote "fixes #358" in the preamble. If you do this, the mentioned issue or PR gets closed once the referencing PR is merged! By the way, I did not know that this also applied to PRs, not just to issues.
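For reference, GitHub treats keywords such as "fixes", "closes", and "resolves" followed by an issue or PR number as closing references; a neutral phrasing links without triggering the auto-close:

```
Fixes #358       (closes #358 automatically when the PR is merged)
Related to #358  (links #358 without closing it)
```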