scikit-learn-contrib / scikit-learn-extra

scikit-learn contrib estimators
https://scikit-learn-extra.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Fast Kernel Classifier not always fast #34

Open amueller opened 5 years ago

amueller commented 5 years ago

I'm looking at the electricity dataset (https://www.openml.org/d/151) as a benchmark for Fast Kernel Classifier. X.shape = (45312, 8), which I feel should make it a good candidate for this model to be fast.

However, SVC is about 10% to 20% faster.

import numpy as np
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn_extra.kernel_methods import EigenProClassifier as FKCEigenPro

# Electricity dataset from OpenML (https://www.openml.org/d/151)
data = fetch_openml(data_id=151)
X = pd.DataFrame(data.data)
y = pd.Series(data.target)

# Baseline: RBF-kernel SVC, 10-fold cross-validation
svc = SVC(kernel='rbf', gamma='scale', random_state=1)
svc_cv = cross_validate(make_pipeline(StandardScaler(), svc), X, y, cv=10)

print('Fit time \t', np.mean(svc_cv['fit_time']))
print('Score \t\t', np.mean(svc_cv['test_score']))

# Same setup with the Fast Kernel Classifier (EigenPro)
epc = FKCEigenPro(kernel='rbf', gamma='scale', random_state=1)
epc_cv = cross_validate(make_pipeline(StandardScaler(), epc), X, y, cv=10)

print('Fit time \t', np.mean(epc_cv['fit_time']))
print('Score \t\t', np.mean(epc_cv['test_score']))

Any idea what's going on here? The time is spent almost exclusively in the _kernel method.
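For anyone who wants to check where the time goes, here is a minimal sketch using the standard-library profiler. The helper `profile_call` and the toy functions `toy_fit` and `_busy_kernel` are hypothetical names made up for illustration; they only stand in for a real `fit` call and its kernel evaluation.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, text report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def _busy_kernel(n):
    # Hypothetical stand-in for an expensive kernel evaluation.
    return sum(i * i for i in range(n))


def toy_fit(n):
    # Hypothetical stand-in for estimator.fit; all of its work is in _busy_kernel.
    return _busy_kernel(n)


result, report = profile_call(toy_fit, 100_000)
print(report)
```

With the script above, the same helper could presumably be applied as `profile_call(make_pipeline(StandardScaler(), epc).fit, X, y)`, and the rows with the largest cumulative time would show whether `_kernel` dominates.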

cc @Alex7Li

amueller commented 5 years ago

gamma='scale' is ignored; see #35.
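Until that is sorted out, one possible workaround is to compute the value by hand and pass the same float to both models. The sketch below uses the formula scikit-learn documents for SVC's gamma='scale', 1 / (n_features * X.var()). The function name `scale_gamma` is hypothetical, the data is random rather than the electricity dataset, and it assumes both estimators accept a float gamma.

```python
import numpy as np


def scale_gamma(X):
    """Return 1 / (n_features * X.var()), the value SVC uses for gamma='scale'."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    return 1.0 / (n_features * X.var())


# Synthetic data with 8 features, only to illustrate the effect of scaling.
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=5.0, scale=3.0, size=(1000, 8))

# Standardize each column to zero mean and unit variance, as StandardScaler does.
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

gamma_raw = scale_gamma(X_raw)
gamma_std = scale_gamma(X_std)
print(gamma_raw, gamma_std)
```

Since the pipeline standardizes the features first, X.var() on the training data is 1 (as long as no column is constant), so the value comes out at roughly 1 / n_features, i.e. about 1 / 8 for the electricity data. Passing that float as gamma to both SVC and FKCEigenPro should make the timing comparison independent of how 'scale' is handled.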

amueller commented 5 years ago

Running the same script on the "PhishingWebsites" dataset (https://www.openml.org/d/4534) shows an even bigger difference, it seems: FKC takes 10 seconds, SVC takes 2.