Open pplonski opened 3 months ago
Related to #722
Data set used: https://www.kaggle.com/competitions/playground-series-s4e8
Code:
import pandas as pd
from supervised import AutoML

# Load a 10% sample of the training data to speed up the run
train = pd.read_csv("playground-series-s4e8/train.csv").sample(frac=0.1, random_state=123)
print(train.head())

# Column 0 is skipped, column 1 is the target, the remaining columns are features
x_cols = train.columns[2:]
y_col = train.columns[1]
print(x_cols, y_col)

# Train in Compete mode with a 1-hour time limit and k-means features disabled
model = AutoML(eval_metric="accuracy", total_time_limit=3600, mode="Compete", kmeans_features=False)
model.fit(train[x_cols], train[y_col])

# Predict on the test set using the same feature columns
test = pd.read_csv("playground-series-s4e8/test.csv")
y_predicted = model.predict(test[x_cols])

# Write the predictions into the sample submission file
submission = pd.read_csv("playground-series-s4e8/sample_submission.csv")
submission["class"] = y_predicted
submission.to_csv("baseline_m_2.csv", index=False)
Here is the log from training: