openml-labs / gama

An automated machine learning tool aimed to facilitate AutoML research.
https://openml-labs.github.io/gama/master/
Apache License 2.0
96 stars, 31 forks

Using `numpy` arrays as data source may lead to errors if inferred encoding is used #193

Open PGijsbers opened 1 year ago

PGijsbers commented 1 year ago
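from gama import GamaClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Assumed: the original snippet does not show how `gama` was constructed.
gama = GamaClassifier()
# Add checks on individuals (reproducibility)
gama.fit(X_train, y_train)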

GAMA infers that some features are categorical (which is expected behavior, though the inference is incorrect for this data). The inferred encoding in turn creates new feature names, so the names are now a mix of `int` and `str`, e.g.: ['1_1', '1_2', 2, 3, ...]. This results in an error during evaluation: <class 'TypeError'> Feature names are only supported if all input features have string name.
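The mix of name types can be illustrated without GAMA. The sketch below is not GAMA's or scikit-learn's code; `feature_name_types` is a hypothetical helper and the name list simply mirrors the example above. It shows that the names carry two types, and that casting every name to `str` removes the mix.

```python
def feature_name_types(names):
    """Return the sorted type names found among the feature names."""
    return sorted({type(name).__name__ for name in names})


# Hypothetical names mirroring the example above: two encoded columns
# (strings) followed by untouched columns that kept integer names.
mixed_names = ["1_1", "1_2", 2, 3]
print(feature_name_types(mixed_names))    # ['int', 'str']

# Casting every name to str leaves a single name type.
uniform_names = [str(name) for name in mixed_names]
print(feature_name_types(uniform_names))  # ['str']
```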

Postponing the fix until #169 is merged.

If you run into this issue, please use pandas DataFrames instead of `numpy` arrays for now.
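A minimal sketch of the suggested workaround is to wrap the array in a DataFrame before fitting. The helper name `to_named_frame` and the `feature_` prefix are hypothetical. Giving every column a string name is an assumption on my part; the issue only says to use DataFrames.

```python
import numpy as np
import pandas as pd


def to_named_frame(X, prefix="feature_"):
    """Wrap a 2D array in a DataFrame whose column names are all strings."""
    X = np.asarray(X)
    columns = [f"{prefix}{i}" for i in range(X.shape[1])]
    return pd.DataFrame(X, columns=columns)


# Small stand-in array; in the issue's example this would be X_train.
X = np.arange(6).reshape(3, 2)
X_df = to_named_frame(X)
print(list(X_df.columns))

# Assumed usage with the example above (not verified here):
# gama.fit(to_named_frame(X_train), y_train)
```

Remember to convert `X_test` the same way so that the column names match at prediction time.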