About all the dataset used in AMLB

openml / automlbenchmark

OpenML AutoML Benchmarking Framework

https://openml.github.io/automlbenchmark

MIT License

About all the dataset used in AMLB #509

Closed xieleo5 closed 1 year ago

xieleo5 commented 1 year ago

Hi, the abstract of the paper "AMLB: an AutoML Benchmark" says that there are 71 classification and 33 regression tasks in total. But I can only find about 50 task_ids in total in resources/benchmarks. Where can I find the other tasks? Are those tasks also OpenML tasks?

PGijsbers commented 1 year ago

This is the list of classification tasks, and this is the list of regression tasks. With runbenchmark.py you can refer to them as openml/s/271 and openml/s/269 respectively. The files in resources/benchmarks are ways to define custom benchmark sets that are not (necessarily) on OpenML. We also use these to easily execute some subsets that may identify specific compatibility or integration issues.
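
As a rough illustration of the suite references mentioned above, the sketch below builds the runbenchmark.py command line for each suite. The helper names `suite_reference` and `benchmark_command` are hypothetical, and `randomforest` is only an assumed framework name; substitute whichever framework you want to run.

```python
from typing import List

# Suite IDs taken from the comment above.
CLASSIFICATION_SUITE = 271
REGRESSION_SUITE = 269


def suite_reference(suite_id: int) -> str:
    """Format an OpenML suite ID the way runbenchmark.py refers to it."""
    return f"openml/s/{suite_id}"


def benchmark_command(framework: str, suite_id: int) -> List[str]:
    """Build the argument list for one benchmark run (hypothetical helper)."""
    return ["python", "runbenchmark.py", framework, suite_reference(suite_id)]


if __name__ == "__main__":
    # "randomforest" is an assumed framework name used only for illustration.
    for suite in (CLASSIFICATION_SUITE, REGRESSION_SUITE):
        print(" ".join(benchmark_command("randomforest", suite)))
```

The argument list can be passed to `subprocess.run` from the repository root, or the printed line can be pasted into a shell.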

tim20120526 commented 7 months ago

Thanks a lot. The links you provided show only 66 classification tasks and 33 regression tasks. Why don't they add up to 104 tasks?

PGijsbers commented 7 months ago

It's a bug in the front-end. If you access the benchmarking suites programmatically, you will get the correct number of tasks. Using openml-python:

>>> import openml
>>> len(openml.study.get_suite(271).tasks)
71
>>> len(openml.study.get_suite(269).tasks)
33
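
To check the total against the 104 tasks stated in the paper, the same call can be wrapped in a small helper. This is only a sketch: `count_suite_tasks` and its `fetch_suite` parameter are hypothetical names introduced here so the counting logic can be exercised without network access. By default it falls back to `openml.study.get_suite`, as used above.

```python
from typing import Any, Callable, Dict, Iterable, Optional


def count_suite_tasks(
    suite_ids: Iterable[int],
    fetch_suite: Optional[Callable[[int], Any]] = None,
) -> Dict[int, int]:
    """Return the number of tasks in each suite, keyed by suite ID.

    `fetch_suite` must return an object with a `tasks` sequence.
    When omitted, openml-python is imported and queried (needs network access).
    """
    if fetch_suite is None:
        import openml  # imported lazily so the helper can be tested offline

        fetch_suite = openml.study.get_suite
    return {suite_id: len(fetch_suite(suite_id).tasks) for suite_id in suite_ids}
```

With network access, `sum(count_suite_tasks([271, 269]).values())` should give 104, given the counts shown above (71 + 33).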