keras-team / autokeras

AutoML library for deep learning
http://autokeras.com/
Apache License 2.0

model = clf.export_model() is not best model #1747

Closed xuean009 closed 2 years ago

xuean009 commented 2 years ago
import numpy as np
import pandas as pd
import tensorflow as tf
import autokeras as ak

x_train = pd.read_csv("train.csv")
print(type(x_train))  # pandas.DataFrame
y_train = x_train.pop("result")
print(type(y_train))  # pandas.Series

y_train = pd.DataFrame(y_train)
print(type(y_train))  # pandas.DataFrame

x_train = x_train.to_numpy()
y_train = y_train.to_numpy()
print(type(x_train))  # numpy.ndarray
print(type(y_train))  # numpy.ndarray

x_test = pd.read_csv("test.csv")
y_test = x_test.pop("result")

clf = ak.StructuredDataClassifier(overwrite=True, max_trials=100)

clf.fit(x_train, y_train, epochs=20)

predicted_y = clf.predict(x_test)

print(clf.evaluate(x_test, y_test))

model = clf.export_model()
model.save("model_autokeras.h5")
from tensorflow.keras.models import load_model
loaded_model = load_model("model_autokeras.h5", custom_objects=ak.CUSTOM_OBJECTS)

print(type(x_train))  # numpy.ndarray
print(type(x_test))  # pandas.DataFrame (x_test was never converted with to_numpy())
result = loaded_model.predict(x_test)
result = np.argmax(result, axis=1)
print(result)
-----------------------------------------------
Best val_accuracy So Far: 0.9523809552192688
But the model returned by clf.export_model() reaches a test accuracy of only 0.800000011920929.
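When comparing the two numbers, it may help to compute the test accuracy of the reloaded model by hand. The sketch below is only an illustration, not AutoKeras code: `predicted_classes` and `accuracy` are hypothetical helpers, and the labels are assumed to be integer class indices in the same order the model uses. It also covers the case where a binary classifier emits a single sigmoid column, for which `np.argmax(result, axis=1)` returns 0 for every row.

```python
def predicted_classes(outputs, threshold=0.5):
    """Turn per-row model outputs into class indices.

    A row with one value is treated as a sigmoid probability of class 1;
    a row with several values is treated as per-class scores (argmax).
    Works on nested lists as well as 2-D NumPy arrays.
    """
    classes = []
    for row in outputs:
        if len(row) == 1:
            # Single column: threshold the probability instead of using argmax.
            classes.append(int(row[0] >= threshold))
        else:
            # Several columns: pick the index of the largest score.
            classes.append(max(range(len(row)), key=lambda i: row[i]))
    return classes


def accuracy(outputs, labels, threshold=0.5):
    """Fraction of rows whose predicted class index equals the label."""
    predictions = predicted_classes(outputs, threshold)
    matches = sum(int(p == int(t)) for p, t in zip(predictions, labels))
    return matches / len(predictions)
```

With the variables from the script above, this could be called as `accuracy(loaded_model.predict(x_test), y_test)`, provided `y_test` already holds class indices rather than the original label values.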
xuean009 commented 2 years ago

The model returned by model = clf.export_model() seems to be the last trained model, not the best model. Please help me. Thank you.

AngelaChang119 commented 2 years ago

I have the same problem.

AngelaChang119 commented 2 years ago

In my case it simply stops once the maximum number of epochs is reached, and it outputs the last model from the last epoch.

xuean009 commented 2 years ago

> I have the same problem.

Have you solved it?

haifeng-jin commented 2 years ago

Fit consists of two steps: search and final fit. The "last trained model" is the best model found during search, after a final fit on your entire training set. During the search it was trained on only a split of the data you provided.

So the difference in accuracy arises mainly because the two numbers are computed on different datasets.

If you use clf.fit(x_train, y_train, validation_data=(x_test, y_test)) and clf.evaluate(x_test, y_test), the accuracy should be the same.
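The suggestion above can be sketched roughly as follows, assuming the same AutoKeras version as in the original report (one where `ak.StructuredDataClassifier` is available). The helper names `hold_out` and `search_and_evaluate` are hypothetical, and the default values are illustrative only. Passing `validation_data` pins the examples used to score each trial, so the score reported during search and the one returned by `evaluate` refer to the same data.

```python
def hold_out(x, y, fraction=0.2):
    """Split off the last `fraction` of the rows as a fixed validation set.

    Works on lists and NumPy arrays; no shuffling is done here.
    """
    n_val = max(1, int(len(x) * fraction))
    return (x[:-n_val], y[:-n_val]), (x[-n_val:], y[-n_val:])


def search_and_evaluate(x_train, y_train, x_val, y_val, max_trials=10, epochs=20):
    """Run the search with an explicit validation set, then score on that same set."""
    import autokeras as ak  # imported lazily; requires AutoKeras to be installed

    clf = ak.StructuredDataClassifier(overwrite=True, max_trials=max_trials)
    # The same (x_val, y_val) pair is used for trial scoring and for evaluate().
    clf.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=epochs)
    return clf.evaluate(x_val, y_val)  # loss followed by the metric values
```

To mirror the comment exactly, `x_test` and `y_test` would be passed as `x_val` and `y_val`; a separate split from `hold_out` keeps the test set untouched during the search.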

AngelaChang119 commented 2 years ago

> I have the same problem.
>
> Have you solved it?

Solved by manually adding keras.callbacks.EarlyStopping to clf.fit.
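The comment does not show the exact call, so the following is only a guess at what that workaround might look like. `early_stopping_options` and `fit_with_early_stopping` are hypothetical helpers, and the monitored quantity, the patience, and `restore_best_weights=True` are assumptions rather than values taken from this thread.

```python
def early_stopping_options(patience=5):
    """Keyword arguments for keras.callbacks.EarlyStopping (illustrative values)."""
    return {
        "monitor": "val_loss",         # assumed; the thread does not name a metric
        "patience": patience,          # epochs without improvement before stopping
        "restore_best_weights": True,  # assumed; keeps the best epoch's weights
    }


def fit_with_early_stopping(clf, x_train, y_train, epochs=20, patience=5):
    """Call clf.fit with a user-supplied EarlyStopping callback."""
    from tensorflow import keras  # imported lazily; requires TensorFlow

    stopper = keras.callbacks.EarlyStopping(**early_stopping_options(patience))
    clf.fit(x_train, y_train, epochs=epochs, callbacks=[stopper])
    return clf
```

Here `clf` would be the `ak.StructuredDataClassifier` from the original script.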

qrdlgit commented 1 year ago

> Fit consists of two steps: search and final fit. The "last trained model" is the best model found during search, after a final fit on your entire training set. During the search it was trained on only a split of the data you provided.
>
> So the difference in accuracy arises mainly because the two numbers are computed on different datasets.
>
> If you use clf.fit(x_train, y_train, validation_data=(x_test, y_test)) and clf.evaluate(x_test, y_test), the accuracy should be the same.

Are you sure? I think it's using best loss rather than best val_ metric.