Information on how these datasets were chosen

mljar / automl_comparison

Comparison of automatic machine learning libraries

Apache License 2.0


Information on how these datasets were chosen #2

Open ledell opened 3 years ago

ledell commented 3 years ago

Hi there,

I realize this benchmark is a few years old now, but can you explain how these datasets from OpenML were selected for this benchmark? If they were not randomly selected (using a seed, sampling from OpenML ids), then it would be good to know how/why each dataset was chosen to be included in the benchmark. Thanks!
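For reference, the kind of seeded random selection asked about above could look like the minimal sketch below. It is an illustration only, not how this benchmark's datasets were chosen. The candidate ID list, the function name `sample_dataset_ids`, and the seed value are all hypothetical placeholders; in practice the candidates would be the OpenML dataset ids that meet whatever inclusion criteria the benchmark sets.

```python
import random


def sample_dataset_ids(candidate_ids, k, seed):
    """Draw k dataset ids reproducibly from a pool of candidates.

    Hypothetical helper for illustration; not part of any library.
    """
    # Sort and de-duplicate first so the draw does not depend on input order.
    pool = sorted(set(candidate_ids))
    # A dedicated Random instance keeps the draw independent of global state.
    rng = random.Random(seed)
    return sorted(rng.sample(pool, k))


# Placeholder ids standing in for real OpenML dataset ids (assumption).
candidate_ids = list(range(1, 101))
selected = sample_dataset_ids(candidate_ids, k=10, seed=42)
print(selected)
```

Publishing the seed and the candidate pool alongside the benchmark lets anyone re-run the draw and confirm that the same datasets come out.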

pplonski commented 3 years ago

Hey @ledell! Good question. I probably took them from one of Frank Hutter's papers (I don't remember which one right now, but most likely the auto-sklearn one).

ledell commented 3 years ago

Ok, thanks for the info! I'll take a look at the paper and see if they match up.