openml / automlbenchmark

OpenML AutoML Benchmarking Framework
https://openml.github.io/automlbenchmark
MIT License

Only list bool as numerical if OpenML has it listed as numerical #556

Closed. PGijsbers closed this 1 year ago

PGijsbers commented 1 year ago

If the ARFF header contains a "categorical boolean", i.e., a nominal attribute with possible values {true, false} (or similar), then openml-python converts it to bool when loading the data into a dataframe. This in turn made AMLB write the attribute to the split ARFF files as numeric, which could cause issues for frameworks that rely on the split ARFF files produced by the benchmark (e.g., h2oautoml), especially when the attribute was the target column. In the benchmark, the affected tasks are: kc1 (openml/t/3917), pc4 (openml/t/359958), and miniboone (openml/t/359990).
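To illustrate the decision described above, here is a minimal sketch in plain Python. It is not the AMLB implementation: the function name `arff_type` and the `openml_data_type` argument (the type OpenML lists for the feature, assumed to be a string such as "nominal" or "numeric") are hypothetical and exist only for this example.

```python
from typing import Sequence


def arff_type(values: Sequence[object], openml_data_type: str) -> str:
    """Pick an ARFF attribute type for one column (hypothetical helper).

    `openml_data_type` is assumed to be the type OpenML lists for the
    feature, e.g. "nominal" or "numeric".
    """
    non_missing = [v for v in values if v is not None]

    # Check for bool first: in Python, bool is a subclass of int, so a
    # plain numeric check would also match True/False.
    is_bool = bool(non_missing) and all(isinstance(v, bool) for v in non_missing)
    if is_bool:
        if openml_data_type == "numeric":
            # Only write bool as numeric if OpenML lists it as numeric.
            return "NUMERIC"
        # Otherwise keep it nominal. A real implementation would likely
        # take the labels from the original ARFF header instead of the data.
        labels = sorted({str(v).lower() for v in non_missing})
        return "{" + ",".join(labels) + "}"

    if non_missing and all(isinstance(v, (int, float)) for v in non_missing):
        return "NUMERIC"
    return "STRING"


print(arff_type([True, False, True], "nominal"))  # {false,true}
print(arff_type([True, False, True], "numeric"))  # NUMERIC
```

With this rule, a boolean target column such as the one in kc1 would stay nominal in the split file unless OpenML itself lists it as numeric.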