stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0
1.62k stars 214 forks source link

Add py311, remove py37 #320

Closed jack-mcivor closed 5 months ago

jack-mcivor commented 1 year ago

Fixes #319

It is easiest to implement NEP-29. All current versions of numpy, scipy, scikit-learn are built for 3.8-3.11 (and do not support Python 3.7)

Pandas can be removed as it is a transitive dependency only.

np.bool was deprecated in NumPy 1.20.

jack-mcivor commented 1 year ago

I could do with a bit of help on this one. Two tests failing:

============================================================================================== short test summary info ==============================================================================================
FAILED tests/test_distns.py::test_dists_runs_on_examples_logscore[learner0-T] - ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
FAILED tests/test_distns.py::test_dists_runs_on_examples_logscore[learner1-T] - ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

It looks like this is caused by an upgrade of scipy, although I'm not clear why.

It doesn't seem to matter what versions of numpy, scikit-learn are used. It seems like the failure mode is variable based on the random-seed, but it looks like it's down to failures in calculating the gradient (either due to a Singular matrix error or nans in the result).

There are in general quite a few errors when running the tests, not sure what to make of them.

tim-habitat commented 1 year ago

It doesn't seem to matter what versions of numpy, scikit-learn are used. It seems like the failure mode is variable based on the random-seed, but it looks like it's down to failures in calculating the gradient (either due to a Singular matrix error or nans in the result).

I've had that issue for a while with multi-variate normal distribution - in my case i turned the natural_gradient off and this issue seemed to be going out. Not sure if relevant at all but thought i'd share.

I can try to have a deeper look later today !

ryan-wolbeck commented 11 months ago

I agree with removing support for 3.7 as it's end of like for Python as well. I'm working to get through some other PRs and I'll work to help figure out the failures here. Thanks for the PR

ryan-wolbeck commented 10 months ago

@jack-mcivor can you re-run checks here? Logs have expired, would like to see the errors. Thanks

jack-mcivor commented 10 months ago

@ryan-wolbeck those logs should be available now

ryan-wolbeck commented 8 months ago

Here's what I'm working on testing here for an update:

# Check for NaN values in y_train and handle them
nan_mask = np.isnan(y_train)
if np.any(nan_mask):
    X_train = X_train[~nan_mask]
    y_train = y_train[~nan_mask]

ngb.fit(X_train, y_train)

I'm planning to work through these updates this week and will hopefully push some code here before the end of the week.

ryan-wolbeck commented 8 months ago

@jack-mcivor I've been working through tests on the versions of SciPy and supported python versions here https://github.com/stanfordmlgroup/ngboost/tree/pr-320 though I've not been successful even when pushing SciPy beyond 1.8.1 which requires > python 3.8 and still run into the the same issues and seem to be stuck.

In order to keep moving forward. I think we should create a new PR to just fix the issue with np.bool so that we can push an update to fix those issues for usability, once that is merged into main I then think we should work towards this removal of 3.7 (and maybe 3.8 but about 15% of downloads come from 3.8 still). I'm worried with this PR that we are going to struggle to find the root cause though I think you are likely right that it's a SciPy issue.

Would you mind creating another PR just to address the np.bool issue since it should already be within the code you added here? If you don't have time I can also work towards splitting it out myself as well.

DumasCharles commented 8 months ago

Hello,

Thank you for your work on NGBoost, and answering my comment on #336 😊

As mentioned in this issue, the bug with np.bool prevents from fitting a model with an Exponential distribution, which can be a major blocker... As you suggest, would it be possible to open a PR specific to this bug? I'm happy to create the PR if that helps.

jack-mcivor commented 8 months ago

https://github.com/stanfordmlgroup/ngboost/pull/337 is to fix the bool deprecation

jack-mcivor commented 7 months ago

@ryan-wolbeck it's been a while since I looked at this, but is there anything else we can do?

I think the failing test is not specific to Python versions at all, but scipy? According to the current constraints, a user could install scipy>=1.9 and get failing tests. So, I think we should xfail this test, or if it is a real problem, then constrain scipy<1.9.

What do you think?

ryan-wolbeck commented 6 months ago

@jack-mcivor lets merge the latest changes from master so we can build the tests faster. From there I'll pull this again and test out the versions to dig in further

jack-mcivor commented 6 months ago

@ryan-wolbeck done, still seems to exhibit the same behaviour. A minimal & fast reproducer is

pytest 'tests/test_distns.py::test_dists_runs_on_examples_logscore[learner0-T]'
ryan-wolbeck commented 6 months ago

@jack-mcivor

The error message indicates that a numpy.linalg.LinAlgError: Singular matrix error occurred. This error is raised when a function in the numpy.linalg module, such as numpy.linalg.solve, is applied to a singular (or non-invertible) matrix.

In this case, the error occurred in the grad method of the Score class in ngboost/scores.py. This method tries to solve a linear system using the numpy.linalg.solve function, which fails if the input matrix is singular.

The test_dists_runs_on_examples_logscore test is failing because it's trying to fit an NGBRegressor with a T distribution, which seems to be causing the singular matrix error.

I think you are right that we should just allow this T in this case to fail (skip the test). I tested an implementation here https://github.com/stanfordmlgroup/ngboost/tree/pr-320-2 and got things to build. I did play around with the versions a bit as well but I don't think that's actually necessary to get the xfail to allow the failure to occur due to the singular matrix error.

Can you push a similar implementation here?

@alejandroschuler do you have any concerns with us allowing this case to fail in order to support 3.11?

alejandroschuler commented 6 months ago

@ryan-wolbeck sure let's let that test fail, nbd

jack-mcivor commented 5 months ago

@ryan-wolbeck thanks, that should be ready now