Closed (lucazav closed this issue 1 year ago)
Could you try setting model_history=True in AutoML.fit()? Otherwise, only the best model across all trials is kept, for space efficiency.
@sonichi I'll try your hint. Anyway, I'm trying to get the best estimator as a scikit-learn type, so I assumed no history would be needed.
Oh right. Then you shouldn't need that hint. Could you share a minimal code example to reproduce this issue?
Here is a repro:
# %%
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from flaml import AutoML
import pickle
import os
# %%
main_path = r'C:\<your-path>'
# %%
dataset = pd.read_csv(os.path.join(main_path, 'titanic-imputed.csv'))
dataset
# %%
# Split the dataframe into a small part kept for testing and
# a large part for training.
X = dataset.drop('Survived', axis=1)
y = dataset[['Survived']]
# Force the float values of Pclass to integer, as Power BI imports it as an int column
X['Pclass'] = X['Pclass'].astype('int')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05)
# %%
# Setup the FLAML AutoML experiment properly
automl = AutoML()
settings = {
    "time_budget": 600,  # total running time in seconds
    "metric": 'roc_auc',  # check the documentation for options of metrics (https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#optimization-metric)
    "task": 'classification',  # task type
    "log_file_name": 'titanic.log',  # flaml log file
    "seed": 7654321,  # random seed
}
# Get a pandas Series from the single-column y_train dataframe,
# as automl.fit requires a Series for its y_train parameter
y_train_series = y_train.squeeze()
automl.fit(X_train=X_train, y_train=y_train_series, **settings)
# %%
# Retrieve the best config and best learner
print('Best ML learner:', automl.best_estimator)
print('Best AUC on validation data: {0:.4g}'.format(1-automl.best_loss))
# %%
best_estimator = automl.best_model_for_estimator(automl.best_estimator).estimator
type(best_estimator)
You can find the CSV file used as the training dataset here:
I can't download the CSV file, but I think I know the issue. Please use automl.model.estimator to get the best model's estimator. The other way you are using requires model_history=True.
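For anyone landing here later, the two access paths can be sketched roughly as follows. This is a minimal illustration, not FLAML's own documentation: `unwrap_best_estimator` and `demo` are hypothetical helper names, the breast cancer dataset from scikit-learn stands in for the Titanic CSV, and the 10-second budget is an arbitrary choice. It assumes `automl.model` is unset (None) until `fit()` has completed.

```python
def unwrap_best_estimator(automl):
    """Return the underlying trained estimator of the overall best model.

    `automl.model` is FLAML's wrapper around the best model; its
    `.estimator` attribute holds the fitted underlying object.
    (Hypothetical helper, not part of FLAML.)
    """
    wrapper = automl.model
    if wrapper is None:
        # Assumption: no trained model is available before fit() finishes.
        raise RuntimeError("No trained model found; call automl.fit() first.")
    return wrapper.estimator


def demo():
    """Illustrative run; requires flaml and scikit-learn to be installed."""
    from flaml import AutoML
    from sklearn.datasets import load_breast_cancer

    # Stand-in dataset (assumption): any tabular classification data works.
    X, y = load_breast_cancer(return_X_y=True, as_frame=True)

    automl = AutoML()
    automl.fit(
        X_train=X,
        y_train=y,
        task="classification",
        metric="roc_auc",
        time_budget=10,      # seconds; arbitrary small budget for the sketch
        model_history=True,  # needed only for best_model_for_estimator()
    )

    # Path 1: the overall best model, available without model_history.
    best = unwrap_best_estimator(automl)
    print(type(best))

    # Path 2: the best model for a named learner; needs model_history=True.
    per_learner = automl.best_model_for_estimator(automl.best_estimator)
    print(type(per_learner.estimator))
```

Calling `demo()` runs a short search and prints the type of the estimator from each path. If only the single best model is needed, the first path is enough and `model_history` can be left at its default.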
Thank you @sonichi, automl.model.estimator is what I was looking for. Clearer documentation on how to retrieve the trained estimator would be really useful to users.
I trained a classifier using AutoML. Then I ran this code to get the best estimator:
best_estimator = model.best_model_for_estimator(model.best_estimator)
I noticed that this estimator is of type flaml.automl.model.LGBMEstimator. I expected a scikit-learn custom estimator. Since I need a scikit-learn estimator as output, I tried best_estimator.estimator, but I get a NoneType object. Any hint, please? I'm using FLAML 2.0.0.