openml / automlbenchmark

OpenML AutoML Benchmarking Framework
https://openml.github.io/automlbenchmark
MIT License

https://download.mlplan.org/ does not have mlplan.zip #497

Closed. alanwilter closed this issue 1 year ago

alanwilter commented 1 year ago

The discussion started in #463, but ML-Plan is still failing.

Should it be this file (see https://download.mlplan.org/)?

GitHub Actions is failing at this mlplanweka task.

PGijsbers commented 1 year ago

@fmohr is Alan correct here? Has the file changed? Which ML-Plan version is inside? More generally, do the AILibs versions reflect the ML-Plan version, or can different AILibs versions contain the same ML-Plan version? From a usability/reproducibility standpoint we need to know what we are installing and from where. It would also be great if the old files remained downloadable even when new versions are released.

fmohr commented 1 year ago

Hi @alanwilter and @PGijsbers, sorry for the delayed reply. There was in fact an issue on our server caused by an Apache update. The mlplan.zip is now there, and the build should work. Could you please confirm this?

@PGijsbers : MLPlan is not part of AILibs but just uses it as a dependency. Or did I miss your question?

The old files remain on our server. mlplan.zip always points to the newest version, which is currently mlplan-0.2.3.zip.

alanwilter commented 1 year ago

Thanks @fmohr, the download now works after I added this change in frameworks/MLPlan/setup.sh:

- wget -q https://download.mlplan.org/version/$VERSION -O $DOWNLOAD_DIR/$MLPLAN_ARC
+ wget -q --no-check-certificate https://download.mlplan.org/version/$VERSION -O 

Now @PGijsbers, it's the java jar that's failing with (see full log here):

...
ERROR [main] [ai.libs.mlplan.cli.MLPlanCLI.generateCommandLine (ai.libs.mlplan.cli.MLPlanCLI:174)] - ERROR: Unable to parse command-line arguments [-f, /home/runner/.cache/openml/org/openml/www/datasets/61/dataset_train_0.arff, -p, /home/runner/.cache/openml/org/openml/www/datasets/61/dataset_test_0.arff, -t, 60, -ncpus, 2, -l, LOGLOSS, -m, weka, -s, 985826994, -ooab, /home/runner/work/automlbenchmark/automlbenchmark/results/mlplanweka.test.test.local.20220926T202513/mlplan_out/iris/0/predictions.csv, -os, /home/runner/work/automlbenchmark/automlbenchmark/results/mlplanweka.test.test.local.20220926T202513/mlplan_out/iris/0/statistics.json, -tmp, /tmp/tmpzrfbcjbk] due to exception.
org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -os
...

fmohr commented 1 year ago

Hi, looking at the detailed logs, I believe that's still something on our side. Apparently an unsupported option in the main routine of the CLI. @mwever could you confirm on this one pls?

I would imagine that this has already been solved in a newer version of the MLPlan CLI, and we just need to update this on the download page.

alanwilter commented 1 year ago

The file downloaded is mlplan-0.2.3.zip and its content is mlplan-cli-0.0.1.jar. Is that expected?

fmohr commented 1 year ago

Yes that's expected. 0.2.3 is the version of MLPlan itself, 0.0.1 is the version of the CLI (a bit confusing, admittedly).
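To keep the two version numbers apart when recording what was installed, something like the following sketch could be used. It assumes that both the archive and the jar follow the `name-<version>.<ext>` pattern seen in this thread (`mlplan-0.2.3.zip`, `mlplan-cli-0.0.1.jar`); the helper names `version_of` and `describe_archive` are hypothetical and not part of AMLB or ML-Plan.

```python
import io
import re
import zipfile

# Assumption: file names end in "-<dotted version>.zip" or "-<dotted version>.jar",
# as in the two names quoted in this thread.
VERSION_SUFFIX = re.compile(r"-(\d+(?:\.\d+)*)\.(?:zip|jar)$")


def version_of(filename):
    """Return the dotted version embedded in a file name, or None if there is none."""
    match = VERSION_SUFFIX.search(filename)
    return match.group(1) if match else None


def describe_archive(archive_name, archive_bytes):
    """Report the ML-Plan version (from the archive name) and the version of each bundled jar."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        jars = [name for name in zf.namelist() if name.endswith(".jar")]
    return {
        "mlplan_version": version_of(archive_name),
        "jar_versions": {jar: version_of(jar) for jar in jars},
    }


if __name__ == "__main__":
    # Build a tiny stand-in archive in memory instead of downloading anything.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("mlplan-cli-0.0.1.jar", b"")
    print(describe_archive("mlplan-0.2.3.zip", buffer.getvalue()))
```

An unversioned name such as `mlplan.zip` yields `None` for the ML-Plan version, which is one reason a pinned file name is easier to audit than the "newest" alias.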

alanwilter commented 1 year ago

$ java -jar mlplan-cli-0.0.1.jar -h
Called ML-Plan CLI with the following params: >[-h]<
usage: java -jar <mlplan.jar>
ML-Plan CLI v0.0.1-alpha
================================
 -f,--datasetFit <arg>                             The dataset file in ARFF format used for searching an appropriate pipeline.
 -h,--help                                         Provides an overview of available parameters and their description.
 -l,--evaluationMeasure <arg>                      The loss function to be used for internally assessing a candidate's performance.
                                                   Note that loss functions are problem specific, i.e. for single-label classification, regression or multi-label classification allow for different loss functions respectively. Score functions are automatically transformed into a loss function.
                                                   Depending on the chosen module, the following options are available to be configured.
                                                   weka: AUC, F1, PRECISION, RECALL, ERRORRATE, LOGLOSS
                                                   sklearn-rul: ASYMMETRIC_LOSS, MEAN_ABSOLUTE_ERROR, MEAN_ABSOLUTE_PERCENTAGE_ERROR, MEAN_SQUARED_ERROR, ROOT_MEAN_SQUARED_ERROR, R2, ROOT_MEAN_SQUARED_LOGARITHM_ERROR
                                                   weka-regression: ASYMMETRIC_LOSS, MEAN_ABSOLUTE_ERROR, MEAN_ABSOLUTE_PERCENTAGE_ERROR, MEAN_SQUARED_ERROR, ROOT_MEAN_SQUARED_ERROR, R2, ROOT_MEAN_SQUARED_LOGARITHM_ERROR
                                                   sklearn-regression: ASYMMETRIC_LOSS, MEAN_ABSOLUTE_ERROR, MEAN_ABSOLUTE_PERCENTAGE_ERROR, MEAN_SQUARED_ERROR, ROOT_MEAN_SQUARED_ERROR, R2, ROOT_MEAN_SQUARED_LOGARITHM_ERROR
                                                   sklearn: AUC, F1, PRECISION, RECALL, ERRORRATE, LOGLOSS
                                                   sklearn-unlimited: AUC, F1, PRECISION, RECALL, ERRORRATE, LOGLOSS
                                                   weka-tiny: AUC, F1, PRECISION, RECALL, ERRORRATE, LOGLOSS
 -m,--module <arg>                                 The ML-Plan module to be used.(Default: weka)
                                                   weka, sklearn-rul, weka-regression, sklearn-regression, sklearn, sklearn-unlimited, weka-tiny
 -ncpus,--numCPUs <arg>                            The number of CPU cores to be used by ML-Plan.(Default: 4)
 -ooab,--outputOpenmlAutomlBenchmarkResult <arg>   Output the result of the AutoML process according to the OpenML AutoML Benchmark suite. Enabling this option requires a dataset for predicting to be provided.
 -openMLTask,--openMLTask <arg>                    The OpenML task and the fold to run ML-Plan on.
 -p,--datasetPredict <arg>                         The dataset file in arff format used for applying the found pipeline to.
 -pci,--positiveClassIndex <arg>                   The index of the class to be considered as positive (for asymmetric evaluation measures).(Default: 0)
 -pcn,--positiveClassName <arg>                    Path to a custom search space configuration file.
 -s,--seed <arg>                                   The randomness seed used for the pseudo random number generator.(Default: 42)
 -ssc,--searchSpaceConfiguraitonFile <arg>         Path to a custom search space configuration file.
 -t,--timeout <arg>                                The timeout for the entire ML-Plan run in seconds.(Default: 3600)
 -tc,--candidateTimeout <arg>                      The timeout for evaluating a single candidate in seconds.(Default: 300)
 -tn,--nodeEvaluationTimeout <arg>                 The timeout for evaluating a node in the search tree (in seconds), i.e., the timeout for all the random completions drawn below the current node. This timeout is usually set to the candidate timeout times the number of random completions. *Note*: The default is automatically adapted if this option is not set but the candidate evaluation timeout is configured.(Default: 900)
 -v,--visualize                                    Enable a visualization of the search tree of ML-Plan together with a live stream of observations.
===============================
Visit us at: https://mlplan.org

Yet AMLB passes the -os and -tmp parameters, which I don't see in the help output above. Clearly frameworks/MLPlan/exec.py needs -os (statistics_file = os.path.join(mlplan_output_dir, 'statistics.json')).
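The mismatch can be checked mechanically. The sketch below is only an illustration: it extracts the short option names from the option lines of the help output above (descriptions dropped) and compares them with the flags from the failing command line in the error log earlier in this thread. The names `supported_short_options` and `unsupported_flags` are hypothetical helpers, not part of AMLB or the ML-Plan CLI.

```python
import re

# Option lines copied from the `-h` output above, with the descriptions dropped.
HELP_EXCERPT = """
 -f,--datasetFit <arg>
 -h,--help
 -l,--evaluationMeasure <arg>
 -m,--module <arg>
 -ncpus,--numCPUs <arg>
 -ooab,--outputOpenmlAutomlBenchmarkResult <arg>
 -openMLTask,--openMLTask <arg>
 -p,--datasetPredict <arg>
 -pci,--positiveClassIndex <arg>
 -pcn,--positiveClassName <arg>
 -s,--seed <arg>
 -ssc,--searchSpaceConfiguraitonFile <arg>
 -t,--timeout <arg>
 -tc,--candidateTimeout <arg>
 -tn,--nodeEvaluationTimeout <arg>
 -v,--visualize
"""

# Flags from the failing command line in the error log (their values are omitted here).
AMLB_FLAGS = ["-f", "-p", "-t", "-ncpus", "-l", "-m", "-s", "-ooab", "-os", "-tmp"]

# An option line starts with "-short,--long"; continuation lines do not start with "-".
OPTION_LINE = re.compile(r"^\s*(-[A-Za-z]+),--[A-Za-z]+", re.MULTILINE)


def supported_short_options(help_text):
    """Collect the short option names listed in a commons-cli style help text."""
    return set(OPTION_LINE.findall(help_text))


def unsupported_flags(flags, supported):
    """Return the flags, in their original order, that the CLI does not list."""
    return [flag for flag in flags if flag not in supported]


if __name__ == "__main__":
    print(unsupported_flags(AMLB_FLAGS, supported_short_options(HELP_EXCERPT)))
    # prints ['-os', '-tmp']
```

Only -os and -tmp are reported, which matches the UnrecognizedOptionException for -os in the log: the parser stops at the first option it does not know.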

PGijsbers commented 1 year ago

> @PGijsbers : MLPlan is not part of AILibs but just uses it as a dependency. Or did I miss your question?

I believe that when I checked, the only available download was AILibs, which made me think that perhaps ML-Plan had been absorbed into that package. It's clear now that they remain separate and it was just an issue with the available files 👍 thanks for sorting this out. (I'll stay on the sidelines for this issue as long as it looks like the problem is with ML-Plan itself.)

fmohr commented 1 year ago

@alanwilter, @PGijsbers , we now have the updated version in. At least on our end it also worked. Could you please confirm this?

alanwilter commented 1 year ago

It did work for AMLB's GitHub Actions, despite some error messages; see the full log here: https://github.com/openml/automlbenchmark/actions/runs/3145334048/jobs/5112508133

My own dataset showed similar errors but failed to give a result. I will investigate my case, but I think the java jar is working now.

alanwilter commented 1 year ago

I reran my case and this time it worked. I think this ticket can be closed now. Many thanks!

fmohr commented 1 year ago

Great, thanks @alanwilter for the constructive feedback!

PGijsbers commented 1 year ago

Thanks @alanwilter and @fmohr!