jpmml / jpmml-sklearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0
531 stars 117 forks source link

Bump max supported scikit-learn version to `1.4.1.post1` #196

Closed rahul-theorem closed 6 months ago

rahul-theorem commented 6 months ago

Closes https://github.com/jpmml/jpmml-sklearn/issues/195

Bumps the max supported scikit-learn range to 1.4.1.post1

rahul-theorem commented 6 months ago

@vruusmann I ran the tests locally, and they all pass except for org.jpmml.sklearn.testing.OutlierDetectorTest.evaluateOneClassSVMHousing (this also fails on master, so not sure if this is blocking).

Is there anything else I should do to test out this change?

Edit: clarifying that I ran these tests locally on MacOS 14.2.1 w/ Zulu 17.0.4

rahul-theorem commented 6 months ago

Also would you be open to relaxing the version pin further (ie. allowing all of 1.4.x, or more)? Would also be wiling to contribute some automation that will automatically bump the scikit-learn version and run a test suite in CI so this isn't a manual chore

vruusmann commented 6 months ago

I ran the tests locally, and they all pass except for org.jpmml.sklearn.testing.OutlierDetectorTest.evaluateOneClassSVMHousing (this also fails on master, so not sure if this is blocking).

The JPMML-SkLearn project is covered with GitHub Actions CI, and all unit and integration tests pass cleanly on it: https://github.com/jpmml/jpmml-sklearn/actions

Must be your local platform/OS/JVM issue. Let me guess, you're a Mac user?

rahul-theorem commented 6 months ago

Let me guess, you're a Mac user?

That's correct, is there anything else I should do to test this out?

vruusmann commented 6 months ago

Also would you be open to relaxing the version pin further (ie. allowing all of 1.4.x, or more)?

The version check is there for a reason, because the behaviour of SkLearn estimator classes changes every now and then. For example, SkLearn decision tree models started accepting missing values from 1.3.X onwards, which is currently not supported/tested by JPMML-SkLearn in any way.

There are likely many similar "enhancements" between SkLearn 1.3 and 1.4.

At minimum, one would need to rebuild all JPMML-SkLearn integration tests using whatever the latest advertised supported SkLearn version is. Today, it would mean two integration test rebuilds - first for SkLearn 1.4.0, and after that 1.4.1.

vruusmann commented 6 months ago

At minimum, one would need to rebuild all JPMML-SkLearn integration tests

Given that this is a major version upgrade (ie. from SkLearn 1.3 to 1.4), then my expectation is that there will be quite a few structural changes to SkLearn estimator classes, and there will be failing integration tests about all sorts of missing attributes, incompatible attribute value types, etc.

rahul-theorem commented 6 months ago

At minimum, one would need to rebuild all JPMML-SkLearn integration tests using whatever the latest advertised supported SkLearn version is. Today, it would mean two integration test rebuilds - first for SkLearn 1.4.0, and after that 1.4.1.

Understood, happy to run down that list once I can run the integration tests. Would you mind pointing me to how I can trigger them?

vruusmann commented 6 months ago

Would you mind pointing me to how I can trigger them?

Something like that:

cd pmml-sklearn/src/test/resources
# THIS!
python main.py

And so for every module there is!

rahul-theorem commented 6 months ago

I see. How would you like to review the change / integration tests? It doesn't look like the GitHub Actions setup will run mvn test on PRs, just after merges. I can preflight that change so it's easier to review this if that makes it easier?

Also, where is the sklearn2pmml version / python environment provided? Is there a set of python deps I should run these checks against? Since sklearn2pmml pulls in an explicit jpmml-sklearn version it's not immediately clear how I should test the python lib w/ the new (locally-built) JAR

rahul-theorem commented 6 months ago

@vruusmann this looks like this also requires sklearn2pmml changes as well to account for some sklearn internals changing -- is there a good way to test a pre-release sklearn2pmml version against a new sklearn version, along with changes in this library?

vruusmann commented 6 months ago

this looks like this also requires sklearn2pmml changes as well to account for some sklearn internals changing

Yep, see https://github.com/jpmml/sklearn2pmml/issues/409

is there a good way to test a pre-release sklearn2pmml version against a new sklearn version, along with changes in this library?

The SkLearn2PMML/JPMML-SkLearn software stack is supposed to be fairly modular and independent. The Python wrapper typically doesn't break; the current mishap regarding the (PMML)Pipeline._fit(X, y) method signature is a one-off.

Anyways, given the complexity of this upgrade, I'll do it all myself. I need to finish the current train of thought regarding the SkLearn2PMML 0.103.X development line, after which I will commence with the SkLearn 1.4-compatible 0.104.X one.

My current time estimate for shipping SkLearn2PMML version 0.104.0 is about one week from today (given that the weather conditions over here don't change much).

rahul-theorem commented 6 months ago

My current time estimate for shipping SkLearn2PMML version 0.104.0 is about one week from today (given that the weather conditions over here don't change much).

Definitely works for us, in the meantime we'll try to make progress w/ the earlier version and downgrade sklearn. Thanks @vruusmann!