sebastianpinedaar closed this issue 2 years ago.
Sorry for my late reply! Currently, `AutoLearner` with `method="Oboe"` fails if the dataset is not preprocessed, because Oboe was originally designed to select ML estimators only for preprocessed datasets. As stated at the beginning of Section 5 of the Oboe paper (https://arxiv.org/pdf/1808.03233.pdf):
> Since data pre-processing is not our focus, we preprocess all datasets in the same way: one-hot encode categorical features and then standardize all features to have zero mean and unit variance. These pre-processed datasets are used in all the experiments.
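The recipe quoted above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not Oboe's own `pre_process` code: the helper name `one_hot_then_standardize` is hypothetical, the categorical columns are assumed to be given as a boolean mask, and the input is assumed to be numeric with no missing values.

```python
import numpy as np


def one_hot_then_standardize(X, categorical):
    """Hypothetical helper: one-hot encode categorical columns, then
    standardize every resulting column to zero mean and unit variance.

    X           -- 2D numeric array, one data point per row, no NaNs (assumed)
    categorical -- 1D boolean mask, True where a column is categorical (assumed)
    """
    X = np.asarray(X, dtype=float)
    categorical = np.asarray(categorical, dtype=bool)
    blocks = []
    for j in range(X.shape[1]):
        col = X[:, j]
        if categorical[j]:
            # One indicator column per distinct category value.
            levels = np.unique(col)
            blocks.append((col[:, None] == levels[None, :]).astype(float))
        else:
            blocks.append(col[:, None])
    Z = np.hstack(blocks)
    mean = Z.mean(axis=0)
    std = Z.std(axis=0)
    # Constant columns have zero std; divide by 1 instead to avoid 0/0.
    std[std == 0] = 1.0
    return (Z - mean) / std


X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 2.0], [4.0, 1.0]])
Z = one_hot_then_standardize(X, categorical=np.array([False, True]))
print(Z.shape)  # (4, 4): one numeric column plus three one-hot columns
```

Encoding before standardizing means the indicator columns are also rescaled, which matches the paper's wording that all features end up with zero mean and unit variance.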
We use the `pre_process` function at https://github.com/udellgroup/oboe/blob/9c5b47d890bfa88ce4c67ee1450d89961d55fa9f/oboe/preprocessing.py#L12 to preprocess datasets. For a 2D `numpy.ndarray` feature array `X` with one data point per row, we run `X_preprocessed, categorical = pre_process(X, categorical, impute=True, standardize=True, one_hot_encode=True)` to handle a general dataset with missing entries and mixed-type, non-standardized features.
Could you try preprocessing the dataset with this function and then passing `X_preprocessed`, together with the label array `y`, to the `AutoLearner`?
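Since the reported failure comes from NaN values, the relevant step is imputation of missing entries, which `impute=True` is meant to cover. The sketch below illustrates that general technique under stated assumptions; it is not a copy of Oboe's implementation. The helper name `impute_missing` is hypothetical, and the fill rule (column mean for numeric columns, most frequent value for categorical columns) is a common choice assumed here, not necessarily the one `pre_process` uses.

```python
import numpy as np


def impute_missing(X, categorical):
    """Hypothetical helper: fill NaNs column by column.

    Numeric columns get the mean of their observed values; categorical
    columns get their most frequent observed value. Assumes every column
    has at least one non-NaN entry.
    """
    X = np.array(X, dtype=float)  # copy, so the caller's array is untouched
    for j in range(X.shape[1]):
        col = X[:, j]  # a view: writing to it updates X in place
        missing = np.isnan(col)
        if not missing.any():
            continue
        observed = col[~missing]
        if categorical[j]:
            values, counts = np.unique(observed, return_counts=True)
            fill = values[np.argmax(counts)]
        else:
            fill = observed.mean()
        col[missing] = fill
    return X


X = np.array([[1.0, 0.0], [np.nan, 1.0], [3.0, np.nan], [5.0, 1.0]])
X_filled = impute_missing(X, categorical=[False, True])
print(np.isnan(X_filled).any())  # False
```

A quick `np.isnan(X).any()` check before calling `fit` is an easy way to confirm whether a dataset still contains missing values.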
Closing this issue for now. Feel free to reopen.
Hi,
I wanted to use `AutoLearner` with `method="Oboe"` on OpenML dataset 168868, but it fails because the dataset contains NaN values. Do you know why this happens?