Open marcoslbueno opened 2 years ago
Thanks for raising the issue! This error stems from an assumption: since DataFrames provide type annotations (their `dtype`), GAMA expects these to be correct (use unannotated `numpy` arrays otherwise). By providing an explicitly non-categorical feature (technically `object`), you go against this assumption. This raises an error (although a bad and late one (#132)) because GAMA can't work with an `object`-type series.
If you want feature type inference, consider passing the data in NumPy format:
- clf.fit(X_train, y_train)
+ clf.fit(X_train.values, y_train.values)
- proba_predictions = clf.predict_proba(X_test)
+ proba_predictions = clf.predict_proba(X_test.values)
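To illustrate why the workaround above changes the behaviour, here is a minimal pandas-only sketch (the column names and values are made up for illustration, and GAMA itself is not called). It shows that `.values` on a mixed-type DataFrame yields a single `object` array, so the per-column `dtype` annotations are no longer available to whatever consumes the data:

```python
import pandas as pd

# Hypothetical toy data: one annotated categorical column, one plain
# object (string) column, and one numeric column.
X_train = pd.DataFrame({
    "color": pd.Series(["red", "blue", "red"], dtype="category"),
    "city": pd.Series(["Paris", "Lima", "Oslo"], dtype=object),
    "size": [1.0, 2.5, 3.0],
})

# The DataFrame carries one dtype per column.
print(X_train.dtypes)

# .values (or the equivalent .to_numpy()) collapses everything into a
# single array with one common dtype, so the per-column annotation is lost.
X_array = X_train.values
print(X_array.dtype, X_array.shape)
```

Because the array carries no column-level types, the receiving library has to infer them itself, which is the inference behaviour described above.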
By design, I think it is good to assume that the user is an expert on the data: they can help the AutoML system with data type annotation. However, expanding the interface to allow for inferring pandas `object` series if explicitly set (e.g. `infer_objects=True`) sounds reasonable to me. What do you think?
Thanks for replying! Indeed by using your suggestion GAMA was able to finish without errors.
I think that adding a parameter like `infer_objects=True` makes a lot of sense, since the user might be unsure about the column types of the dataset (even when using DataFrames) and/or might not want to check this.
I am using a classification dataset with a mixture of string and category features in a pandas DataFrame, and this breaks GAMA (see MRE below).
The error I get is
The problem is solved when I convert the string features (in this case, 0 and 22) to category. I think it would be best if GAMA did this automatically, since the conversion appears to be simple.
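For reference, a minimal sketch of that manual conversion using only pandas. The helper name `object_columns_to_category` and the toy data are hypothetical, and this is not GAMA's own code; it only shows how `object` columns can be cast to `category` before the DataFrame is passed on:

```python
import pandas as pd

def object_columns_to_category(df):
    """Return a copy of df with every object-dtype column cast to category."""
    converted = df.copy()
    object_cols = converted.select_dtypes(include=["object"]).columns
    for col in object_cols:
        converted[col] = converted[col].astype("category")
    return converted

# Hypothetical toy data: column 0 holds strings stored as object,
# column 1 is already categorical, column 2 is numeric.
X = pd.DataFrame({
    0: pd.Series(["a", "b", "a"], dtype=object),
    1: pd.Series(["x", "y", "x"], dtype="category"),
    2: [0.1, 0.2, 0.3],
})

X_fixed = object_columns_to_category(X)
print(X_fixed.dtypes)
```

Only the `object` columns are touched; numeric and already-categorical columns keep their dtype, and the original DataFrame is left unchanged.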