openml / openml-python

Python module to interface with OpenML
https://openml.github.io/openml-python/main/

Fix/sklearn test compatibility #1340

Closed PGijsbers closed 3 weeks ago

PGijsbers commented 3 weeks ago

This PR updates the tests to be compatible with newer versions of scikit-learn. In the process, it fixes some bugs and adds numpy 2.0 support. I updated the CI matrix to include newer versions of scikit-learn. For the most part, openml-python works fine on newer versions; it was mainly the unit tests themselves that had hard dependencies on version-specific behavior. In most cases I extended or updated the tests, but in a few the effort was unreasonable (they need to be refactored to be less sensitive to small scikit-learn API changes). I already spent quite a bit of time on this, so I hope the proposed changes are good enough for now and we can work towards a 0.15.0 release.
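For illustration, here is a minimal sketch of the kind of version-conditional expectation the tests can use instead of hard-coding a single scikit-learn release (the helper and parameter names below are examples I chose, not code from this PR):

```python
# Hypothetical sketch: branch test expectations on the installed scikit-learn
# version rather than pinning one release. Names here are illustrative only.
import sklearn
from packaging import version

SKLEARN_VERSION = version.parse(sklearn.__version__)


def expected_loss_name() -> str:
    # scikit-learn 1.1 renamed SGDClassifier's loss "log" to "log_loss"; a test
    # that asserts on serialized parameters can look up the expected value per
    # version instead of breaking on newer releases.
    if SKLEARN_VERSION >= version.parse("1.1"):
        return "log_loss"
    return "log"
```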

The docs fail in the check-links step. I propose to fix this in a separate PR.

PGijsbers commented 3 weeks ago

edit: no longer relevant

Test failures seem to be caused by exceeding the 10 minute time limit we have set. I am not sure why these tests take that long, but it looks like they fail and are then retried; their failure is likely deterministic, however. So I will go ahead and disable retries so the tests fail hard instead, hopefully with a more meaningful error message.
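For reference, a rough sketch of what a hard per-test limit without retries could look like, assuming the pytest-timeout plugin is available (whether the suite actually uses that plugin or a custom timeout mechanism is not stated in this thread; the test name is made up):

```python
# Sketch only: fail a slow test outright after the 10-minute budget instead of
# retrying it, so the CI log shows a single, clearer failure.
import pytest


@pytest.mark.timeout(600)  # requires the pytest-timeout plugin (assumption)
def test_long_running_flow():  # hypothetical test name
    ...
```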

codecov-commenter commented 3 weeks ago

Codecov Report

Attention: Patch coverage is 66.66667% with 2 lines in your changes missing coverage. Please review.

Project coverage is 83.95%. Comparing base (923b49d) to head (2c161e4).

| Files | Patch % | Lines |
| --- | --- | --- |
| openml/extensions/sklearn/extension.py | 60.00% | 2 Missing :warning: |
Additional details and impacted files

```diff
@@             Coverage Diff              @@
##           develop    #1340       +/-   ##
============================================
+ Coverage    32.94%   83.95%    +51.00%
============================================
  Files           38       38
  Lines         5254     5259         +5
============================================
+ Hits          1731     4415      +2684
+ Misses        3523      844      -2679
```

:umbrella: View full report in Codecov by Sentry.

PGijsbers commented 3 weeks ago

Tests are incredibly slow, and I am not sure why. This also happens locally, so it's not just the runners. I suspect it's the server.

edit: it looks like the culprit is mostly the production server :( (most (all?) of these long tests use prod)

PGijsbers commented 3 weeks ago

edit: message outdated; the PR now fully supports numpy 2.0.


old message:

On Numpy 2.0 support/scikit-learn support (from #1341):

Numpy 2.0 cleaned up its namespace, so we have to add aliases for the new version (a sketch of what such an alias could look like follows the list below). The problem is that this is not covered in CI (the tested scikit-learn versions do not support numpy 2.0), and in general the tests need to be updated to support more recent scikit-learn versions that do support numpy 2.0 (1.5, maybe 1.4?). I don't know if I really have the time for that right now. Options:

  • Put a pin in the dependencies
  • Accept this as best-effort, and hope that the tests that don't fail due to scikit-learn incompatibility cover the code well enough to surface any numpy 1/2 issues
  • Wait until I find time to make tests support scikit-learn 1.5
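As a sketch of the aliasing mentioned above (the attribute I picked is only one example of a name removed in numpy 2.0; the PR may alias different ones):

```python
# Minimal numpy 1.x / 2.x compatibility shim. numpy 2.0 removed several legacy
# top-level aliases (np.float_ among them), so code that must import on both
# major versions can resolve the name once and use the alias everywhere.
import numpy as np

NP_FLOAT = np.float_ if hasattr(np, "float_") else np.float64
```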

But I am currently leaning towards putting in a bit of extra time to make sure we can test on later scikit-learn versions.

PGijsbers commented 3 weeks ago

@LennartPurucker I pinged you for a review since at this point I am unsure what to do. The work is done, more or less. Sometimes the tests still fail sporadically because the pytest workers get killed on GitHub Actions; I have not had this happen locally, and it is not consistent (they can also pass). At this point there are already large improvements to the tests and CI, so it is probably worth merging as-is. From there, I can work on new features and updates as well as debugging, but at least that can happen in parallel and be discussed in separate, smaller PRs. What do you think?

edit: I suspect this may also be due to the timeout, though previously I have had clearer messages when the timeout functionality was triggered. I'll remove the timeouts so we can compare CI results.

edit2: The one (sporadically) failing test now fails due to an internal timeout. I think it's fairly safe to say that this is a server-side issue, since the same test passes on other instances.