Closed rahul-theorem closed 6 months ago
@vruusmann I ran the tests locally, and they all pass except for org.jpmml.sklearn.testing.OutlierDetectorTest.evaluateOneClassSVMHousing
(this also fails on master
, so not sure if this is blocking).
Is there anything else I should do to test out this change?
Edit: clarifying that I ran these tests locally on MacOS 14.2.1 w/ Zulu 17.0.4
Also would you be open to relaxing the version pin further (ie. allowing all of 1.4.x
, or more)? Would also be wiling to contribute some automation that will automatically bump the scikit-learn version and run a test suite in CI so this isn't a manual chore
I ran the tests locally, and they all pass except for org.jpmml.sklearn.testing.OutlierDetectorTest.evaluateOneClassSVMHousing (this also fails on master, so not sure if this is blocking).
The JPMML-SkLearn project is covered with GitHub Actions CI, and all unit and integration tests pass cleanly on it: https://github.com/jpmml/jpmml-sklearn/actions
Must be your local platform/OS/JVM issue. Let me guess, you're a Mac user?
Let me guess, you're a Mac user?
That's correct, is there anything else I should do to test this out?
Also would you be open to relaxing the version pin further (ie. allowing all of 1.4.x, or more)?
The version check is there for a reason, because the behaviour of SkLearn estimator classes changes every now and then. For example, SkLearn decision tree models started accepting missing values from 1.3.X onwards, which is currently not supported/tested by JPMML-SkLearn in any way.
There are likely many similar "enhancements" between SkLearn 1.3 and 1.4.
At minimum, one would need to rebuild all JPMML-SkLearn integration tests using whatever the latest advertised supported SkLearn version is. Today, it would mean two integration test rebuilds - first for SkLearn 1.4.0, and after that 1.4.1.
At minimum, one would need to rebuild all JPMML-SkLearn integration tests
Given that this is a major version upgrade (ie. from SkLearn 1.3 to 1.4), then my expectation is that there will be quite a few structural changes to SkLearn estimator classes, and there will be failing integration tests about all sorts of missing attributes, incompatible attribute value types, etc.
At minimum, one would need to rebuild all JPMML-SkLearn integration tests using whatever the latest advertised supported SkLearn version is. Today, it would mean two integration test rebuilds - first for SkLearn 1.4.0, and after that 1.4.1.
Understood, happy to run down that list once I can run the integration tests. Would you mind pointing me to how I can trigger them?
Would you mind pointing me to how I can trigger them?
Something like that:
cd pmml-sklearn/src/test/resources
# THIS!
python main.py
And so for every module there is!
I see. How would you like to review the change / integration tests? It doesn't look like the GitHub Actions setup will run mvn test
on PRs, just after merges. I can preflight that change so it's easier to review this if that makes it easier?
Also, where is the sklearn2pmml
version / python environment provided? Is there a set of python deps I should run these checks against? Since sklearn2pmml
pulls in an explicit jpmml-sklearn
version it's not immediately clear how I should test the python lib w/ the new (locally-built) JAR
@vruusmann this looks like this also requires sklearn2pmml
changes as well to account for some sklearn internals changing -- is there a good way to test a pre-release sklearn2pmml
version against a new sklearn version, along with changes in this library?
this looks like this also requires sklearn2pmml changes as well to account for some sklearn internals changing
Yep, see https://github.com/jpmml/sklearn2pmml/issues/409
is there a good way to test a pre-release sklearn2pmml version against a new sklearn version, along with changes in this library?
The SkLearn2PMML/JPMML-SkLearn software stack is supposed to be fairly modular and independent. The Python wrapper typically doesn't break; the current mishap regarding the (PMML)Pipeline._fit(X, y)
method signature is a one-off.
Anyways, given the complexity of this upgrade, I'll do it all myself. I need to finish the current train of thought regarding the SkLearn2PMML 0.103.X development line, after which I will commence with the SkLearn 1.4-compatible 0.104.X one.
My current time estimate for shipping SkLearn2PMML version 0.104.0 is about one week from today (given that the weather conditions over here don't change much).
My current time estimate for shipping SkLearn2PMML version 0.104.0 is about one week from today (given that the weather conditions over here don't change much).
Definitely works for us, in the meantime we'll try to make progress w/ the earlier version and downgrade sklearn. Thanks @vruusmann!
Closes https://github.com/jpmml/jpmml-sklearn/issues/195
Bumps the max supported scikit-learn range to 1.4.1.post1