openml / automlbenchmark

OpenML AutoML Benchmarking Framework
https://openml.github.io/automlbenchmark

Upgrade Python version and dependencies #520

Closed PGijsbers closed 1 year ago

PGijsbers commented 1 year ago

Updates the core dependencies of the benchmark with the move to Python 3.9.

At this point I simply moved everything forward to keep things going. In the future, it would be good to parametrize some of these settings so that it is easy to, e.g., build docker containers on older Ubuntu versions, or have the framework Python version be independent of the amlb Python version.

The following table reports which framework integrations have been tested, checking only whether they produce a result file without errors:

| Framework | Ubuntu | MacOS (arm) | Windows (Docker) |
| --- | --- | --- | --- |
| constantpredictor | | | |
| randomforest | | | |
| tunedrandomforest | | | |
| DecisionTree | | | |
| AutoGluon | ✅ * | | |
| AutoWEKA | ☑️ | x | ☑️ |
| GAMA | | | |
| H2OAutoML | x | | |
| MLNet | ☑️ | x | [2] |
| MLPlan | ✅/❌ | x | |
| TPOT | | | |
| autosklearn | [1] | | |
| flaml | ✅ * | | |
| lightautoml | ✅ * | | |
| mljar | ✅ * | | |
| mlr3automl | x | x | |
| oboe | ☑️ | x | x |
| ranger | x | | |

Legend:

Verified the basic framework works with AWS (`python runbenchmark.py constantpredictor -m aws` with `aws.docker: false` runs fine).

Failures

It is not my intention to make sure all integrations work before the move, as some integrations were already broken. That said, here is more information on the failures:

PGijsbers commented 1 year ago

@LittleLittleCloud The MLNet integration does not work for binary classification (see this run). I have tried to debug this, but unfortunately could not get a local or containerized setup to work (libssl issues). Would you be able to have a look at whether the integration script needs updating to support binary classification, or whether other issues cause the crash? If you also have time to figure out why MLNet does not install correctly in the docker image, that would be even better (perhaps the `frameworks/mlnet/setup.sh` script needs changing).

@chengrunyang The oboe integration currently requires some monkey patching, based on the changes proposed by alanwilter in #496. As far as I can tell, these monkey patches are needed because of oboe itself (perhaps the most recent versions of its dependencies behave differently than they did at the release of `oboe==0.2.0`). Can you have a look at his suggestions, and either release a new version with fixes, or suggest how to use oboe in the integration script without monkey patches? This is not a blocking issue for this PR, but if the integration script needs to keep relying on monkey patching, we will eventually remove oboe from the benchmark, as that goes against our design principles.
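For context, a minimal sketch of what such a monkey patch looks like inside an integration script; the patch body (the shape fix) is an illustrative assumption, not the exact change from #496:

```python
import numpy as np
from oboe.auto_learner import AutoLearner  # package-qualified import, as proposed in #496

# Keep a reference to the implementation shipped in the oboe release.
_original_predict = AutoLearner.predict

def _patched_predict(self, x_test):
    # Illustrative workaround only: coerce a single 1-D sample to 2-D
    # before delegating to the original method.
    return _original_predict(self, np.atleast_2d(np.asarray(x_test)))

# Replace the method on the class before the integration script uses it.
AutoLearner.predict = _patched_predict
```

Patches like this live in the benchmark's `exec.py` rather than upstream, which is why a fixed oboe release would let them be deleted.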

LittleLittleCloud commented 1 year ago

@PGijsbers Sure, I can take a look this weekend

PGijsbers commented 1 year ago

I will hold off on merging this for now, as I will be out of office (and should there be an oversight, I won't be able to fix it on short notice). The planned merge date is April 24th. Even after the merge, updates to oboe and MLNet to fix the known issues would be very welcome 🙏

LittleLittleCloud commented 1 year ago

@PGijsbers I'll make a note of it. Last week was too busy to fix the ML.NET issue. Please ping me if I haven't fixed it by the end of next week.

chengrunyang commented 1 year ago

@PGijsbers Sorry for my late reply. I've been busy but will take a look at this issue this weekend.

PGijsbers commented 1 year ago

Thanks to both of you! :)

chengrunyang commented 1 year ago

@PGijsbers I looked into @alanwilter's edit of https://github.com/openml/automlbenchmark/blob/master/frameworks/oboe/exec.py at https://github.com/openml/automlbenchmark/issues/496#issuecomment-1364488930. May I ask what the shape of `X_test` is here (I'm not sure how to run this `exec.py`)?

I'm asking because I do see in Oboe 0.2.0 that `AutoLearner.predict()` fails when `X_test` is a 1-D numpy array corresponding to a single data point. But in that case, I had to do something like `predictions = aml.predict(X_test.reshape(1, -1))` to get the code to work, which seems to be the opposite of `predictions = aml.predict(X_test.squeeze())`.
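For reference, the shape semantics at issue in plain NumPy (independent of oboe):

```python
import numpy as np

x_test = np.array([0.1, 0.2, 0.3])  # a single data point as a 1-D array, shape (3,)

x_test.reshape(1, -1)  # shape (1, 3): the point as one row of a 2-D matrix
x_test.squeeze()       # shape (3,): removes size-1 axes, so a 1-D array stays 1-D
```

So the two calls do push in opposite directions: `reshape(1, -1)` guarantees 2-D input, while `squeeze()` strips dimensions away.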

And is the change from `from auto_learner import AutoLearner` to `from oboe.auto_learner import AutoLearner` related to something I'd better fix in Oboe (i.e., would you suggest I use a more elegant import path)?
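For context, the difference between the two import forms is standard Python path resolution (the commented-out form is the one the integration script used before the edit):

```python
# Resolves only if oboe's inner package directory is itself on sys.path,
# e.g. when running from inside a source checkout:
# from auto_learner import AutoLearner

# Resolves whenever the oboe package is installed normally (e.g. pip install oboe):
from oboe.auto_learner import AutoLearner
```

The package-qualified form is the one that works for a regular pip install, regardless of how the script is launched.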

Btw, the first fix in https://github.com/udellgroup/oboe/compare/master...alanwilter:oboe:master is indeed needed. Thanks @alanwilter for catching the bug :)

PGijsbers commented 1 year ago

@chengrunyang

I don't know which dataset Alan was testing with that required the squeeze. None of our datasets have only one feature or one row, and in the currently working monkey-patched version the tests run fine without it. I think it is safe to ignore that. As for the import statement, we also don't care which one it should be.

> Btw, the first fix in https://github.com/udellgroup/oboe/compare/master...alanwilter:oboe:master is indeed needed. Thanks @alanwilter for catching the bug :)

As far as I can tell, if you incorporate this fix in a new release, we can remove the monkey patch from the integration script and everything should be OK.

PGijsbers commented 1 year ago

Merging this now. When a new release of oboe is available, we'll patch the integration script in a separate PR. Similarly, MLNet updates will come in a separate PR.

alanwilter commented 1 year ago

> @chengrunyang
>
> I don't know which dataset Alan was testing with that required the squeeze. None of our datasets have only one feature or one row, and in the currently working monkey-patched version the tests run fine without it. I think it is safe to ignore that. As for the import statement, we also don't care which one it should be.
>
> > Btw, the first fix in udellgroup/oboe@master...alanwilter:oboe:master is indeed needed. Thanks @alanwilter for catching the bug :)
>
> As far as I can tell, if you incorporate this fix in a new release, we can remove the monkey patch from the integration script and everything should be OK.

I was working with a big private dataset for testing. As far as I remember, my solution solved my issues and didn't break the automlbenchmark tests.