Closed strelzoff-personal closed 1 month ago
These are toy datasets. Try with a real dataset.
The algorithm stops early, before it overfits.
We are working on the AutoML benchmark: https://github.com/openml/automlbenchmark
This benchmark also contains small datasets. We will also improve the algorithm for small / toy datasets with some minor tweaks.
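For readers unfamiliar with the term, here is a generic, patience-based early stopping sketch. It is only an illustration of the general idea mentioned above, not Perpetual's internal stopping rule; the function name `best_round_with_patience` and the loss values are made up for the example.

```python
def best_round_with_patience(val_losses, patience=3):
    """Return the index of the best round seen before stopping.

    Stops scanning once the loss has failed to improve for
    `patience` consecutive rounds after the best one.
    """
    best_idx, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best round: remember it and keep going.
            best_idx, best_loss = i, loss
        elif i - best_idx >= patience:
            # No improvement for `patience` rounds: stop here.
            break
    return best_idx


# Hypothetical validation-loss curve (illustrative numbers only).
losses = [0.60, 0.45, 0.38, 0.35, 0.36, 0.37, 0.39, 0.30]
stop = best_round_with_patience(losses, patience=3)
print(f"best round: {stop}, loss: {losses[stop]}")
```

Note that the late improvement at the end of the list is never reached, which is the usual trade-off of a patience rule: less wasted training at the cost of possibly missing a later gain.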
import time
import pandas as pd
from sklearn.datasets import load_breast_cancer, load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, log_loss
from perpetual import PerpetualBooster


def evaluate(model, X_train, y_train, X_test, y_test, budget=None):
    # Time only the fit. RandomForestClassifier.fit does not accept a
    # budget argument, so pass it only when one is given.
    start = time.time()
    if budget is not None:
        model.fit(X_train, y_train, budget=budget)
    else:
        model.fit(X_train, y_train)
    duration = time.time() - start
    return (
        duration,
        accuracy_score(y_test, model.predict(X_test)),
        log_loss(y_test, model.predict_proba(X_test)),
    )


# Keep only classes 0 and 1 of Iris to get a binary problem.
X_iris, y_iris = load_iris(return_X_y=True)
datasets = {
    "Breast Cancer": load_breast_cancer(return_X_y=True),
    "Binary Iris": (X_iris[y_iris != 2], y_iris[y_iris != 2]),
}

results = pd.DataFrame(columns=["Dataset", "Model", "Budget", "Time", "Accuracy", "Log Loss"])
for name, (X, y) in datasets.items():
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    pb = PerpetualBooster(objective="LogLoss")
    rf = RandomForestClassifier()
    # One row per (model, budget); the same booster instance is refit for each budget.
    results = pd.concat([results,
        pd.DataFrame([[name, "Perpetual", "0.1", *evaluate(pb, X_train, y_train, X_test, y_test, budget=0.1)]], columns=results.columns),
        pd.DataFrame([[name, "Perpetual", "1.0", *evaluate(pb, X_train, y_train, X_test, y_test, budget=1.0)]], columns=results.columns),
        pd.DataFrame([[name, "Perpetual", "2.0", *evaluate(pb, X_train, y_train, X_test, y_test, budget=2.0)]], columns=results.columns),
        pd.DataFrame([[name, "RF", "-", *evaluate(rf, X_train, y_train, X_test, y_test)]], columns=results.columns),
        ],
        ignore_index=True)
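A single `time.time()` measurement can be noisy, especially for fits that take well under a second. As a possible refinement (not part of the original script), a small helper could repeat the call and report the median using `time.perf_counter`. The helper name `time_call` and the dummy workload below are illustrative only; in the script above one would pass something like `lambda: model.fit(X_train, y_train)`.

```python
import statistics
import time


def time_call(fn, repeats=5):
    """Run `fn` several times and return the median wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


# Dummy workload standing in for a model fit.
median_s = time_call(lambda: sum(i * i for i in range(100_000)), repeats=5)
print(f"median duration: {median_s:.6f} s")
```

The median is less sensitive than the mean to a single slow run caused by, for example, a cold cache on the first call.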
v0.5.0 is released with the fix.
Results:
| | Dataset | Model | Budget | Time (s) | Accuracy | Log Loss |
|---|---|---|---|---|---|---|
| 0 | Breast Cancer | Perpetual | 0.1 | 149.592 | 0.973684 | 0.158678 |
| 1 | Breast Cancer | Perpetual | 1.0 | 129.906 | 0.973684 | 0.123220 |
| 2 | Breast Cancer | Perpetual | 2.0 | 155.879 | 0.973684 | 0.099885 |
| 3 | Breast Cancer | RF | - | 0.522181 | 0.964912 | 0.103776 |
| 4 | Binary Iris | Perpetual | 0.1 | 0.335295 | 1.000000 | 0.000032 |
| 5 | Binary Iris | Perpetual | 1.0 | 0.378495 | 1.000000 | 0.000273 |
| 6 | Binary Iris | Perpetual | 2.0 | 0.334572 | 1.000000 | 0.004814 |
| 7 | Binary Iris | RF | - | 0.305424 | 1.000000 | 0.002518 |
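
To put the timings in perspective, the ratio of each Perpetual fit time to the RF fit time on the same dataset can be computed directly from the table. A minimal sketch: the numbers are copied from the table above, and the helper name `slowdown_vs_rf` is just illustrative.

```python
# Fit times in seconds, copied from the results table above.
times = {
    "Breast Cancer": {
        "Perpetual 0.1": 149.592,
        "Perpetual 1.0": 129.906,
        "Perpetual 2.0": 155.879,
        "RF": 0.522181,
    },
    "Binary Iris": {
        "Perpetual 0.1": 0.335295,
        "Perpetual 1.0": 0.378495,
        "Perpetual 2.0": 0.334572,
        "RF": 0.305424,
    },
}


def slowdown_vs_rf(timings):
    """Return each non-RF model's fit time divided by the RF fit time."""
    rf = timings["RF"]
    return {model: t / rf for model, t in timings.items() if model != "RF"}


ratios = {name: slowdown_vs_rf(t) for name, t in times.items()}
for name, per_model in ratios.items():
    for model, ratio in per_model.items():
        print(f"{name:14s} {model:14s} {ratio:8.1f}x the RF fit time")
```

On Breast Cancer this comes out at roughly 250x to 300x the RF fit time, while on Binary Iris the two models are within about 25% of each other.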