openml / benchmark-suites

Too easy? MiceProtein #26

Closed janvanrijn closed 6 years ago

janvanrijn commented 6 years ago

As raised by @joaquinvanschoren

https://www.openml.org/t/146800

mfeurer commented 6 years ago

Here's some Python code:

import sklearn.ensemble
import sklearn.impute
import sklearn.metrics
import sklearn.pipeline
import openml

# Task 146800 is the MiceProtein task linked above.
task = openml.tasks.get_task(146800)
X, y = task.get_X_and_y()

# Evaluate on the task's predefined folds.
for i in range(10):
    train_indices, test_indices = task.get_train_test_split_indices(fold=i)
    X_train = X[train_indices]
    y_train = y[train_indices]
    X_test = X[test_indices]
    y_test = y[test_indices]
    # sklearn.preprocessing.Imputer was removed in scikit-learn 0.22;
    # SimpleImputer has the same default strategy (mean imputation).
    imputer = sklearn.impute.SimpleImputer()
    forest = sklearn.ensemble.RandomForestClassifier(n_estimators=512)
    pipeline = sklearn.pipeline.Pipeline([
        ('imputer', imputer), ('forest', forest),
    ])
    pipeline.fit(X_train, y_train)

    # Accuracy on the held-out part of this fold.
    print(sklearn.metrics.accuracy_score(y_test, pipeline.predict(X_test)))

However, at the moment I cannot find a classifier that reaches 100% accuracy. Too bad that William does not provide hyperparameters.
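
To judge how close a classifier gets to 100%, the per-fold accuracies printed by the loop could be collected in a list and summarized. A minimal sketch, assuming exactly that; the helper name `summarize_fold_accuracies` is hypothetical and the example values are made up for illustration, not measured on this task:

```python
import statistics


def summarize_fold_accuracies(scores):
    """Return (mean, sample std, number of folds with accuracy 1.0)."""
    if not scores:
        raise ValueError("scores must contain at least one fold accuracy")
    mean = statistics.mean(scores)
    # stdev needs at least two values; report 0.0 for a single fold.
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    perfect = sum(1 for s in scores if s == 1.0)
    return mean, std, perfect


# Made-up accuracies for illustration only, not results from task 146800.
example_scores = [1.0, 0.5, 1.0, 0.5]
mean, std, perfect = summarize_fold_accuracies(example_scores)
print(f"mean={mean:.3f} std={std:.3f} "
      f"perfect_folds={perfect}/{len(example_scores)}")
```

In the loop above, one would append each `accuracy_score` result to a list instead of only printing it, then pass that list to the helper once all folds have run.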

janvanrijn commented 6 years ago

Judging by flow names, @WilliamRaynaut does not use the vanilla Weka converter.

janvanrijn commented 6 years ago

Closed as agreed in the Skype call on 15/3/18 (@frank-hutter @giuseppec @mfeurer @janvanrijn)