No speed up when using patch_sklearn with PyCaret

moezali1 commented 2 years ago

Hi,

We are planning to integrate the scikit-learn-intelex project with PyCaret.

The issue is as follows:

from pycaret.datasets import get_data
data = get_data('poker')

from pycaret.classification import *
s = setup(data, target = 'CLASS', session_id = 123)

from sklearnex import patch_sklearn
patch_sklearn()

%%time
knn = create_model('knn')

This took 10 minutes despite the patch_sklearn call.

However, when I explicitly import the model, the acceleration gives great results:

from pycaret.datasets import get_data
data = get_data('poker')

from pycaret.classification import *
s = setup(data, target = 'CLASS', session_id = 123)

from sklearnex import patch_sklearn
patch_sklearn()

from sklearn.neighbors import KNeighborsClassifier
knn_intel = KNeighborsClassifier()

%%time
knn_intel = create_model(knn_intel)

Expected Action:

What can we do so that the first attempt is accelerated as well, so users won't have to import the estimator explicitly? I am thinking we can add a parameter to the setup function called use_intel_acceleration. When the user sets it to True, we would run patch_sklearn inside our code base, so users won't have to do anything explicitly outside of the PyCaret code base.

The files that create the model containers in the PyCaret repo are located here.
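A minimal sketch of what such a flag could look like. The helper name maybe_patch_sklearn is hypothetical and is not part of PyCaret; only patch_sklearn comes from sklearnex. It would presumably have to run before PyCaret binds any scikit-learn estimator classes for the patch to take effect.

```python
import importlib.util


def maybe_patch_sklearn(use_intel_acceleration: bool = False) -> bool:
    """Hypothetical helper: patch scikit-learn only when the user opts in.

    Returns True if patch_sklearn() was called, False otherwise.
    """
    if not use_intel_acceleration:
        return False
    # Skip quietly when scikit-learn-intelex is not installed.
    if importlib.util.find_spec("sklearnex") is None:
        return False
    from sklearnex import patch_sklearn

    patch_sklearn()
    return True
```

A setup function could call maybe_patch_sklearn(use_intel_acceleration) as its first step and log the returned value, so users can see whether acceleration was actually enabled.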

PivovarA commented 2 years ago

Hi @moezali1, thanks for reporting the issue. The problem is that patching must be applied before sklearn is imported. Patching replaces some sklearn methods with their optimized versions from the Intel Extension for Scikit-learn. In your case, you import pycaret first, and pycaret has already imported sklearn. However, I found that if you call the patch before all other code, the code you attached does not work. This is not expected behavior. The reason is a bug on our side related to train_test_split. I have already created an issue and have a fix, and I will open a PR with it in the near future. After the fix everything seems to work as expected.
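The import-order effect can be illustrated with a small stand-in example. This is a sketch using a made-up module named lib and made-up classes, not the actual sklearn or sklearnex internals: a name bound before the patch keeps pointing at the original class.

```python
import types

# Stand-in for a library module such as sklearn.neighbors (not the real one).
lib = types.ModuleType("lib")


class SlowKNN:
    name = "stock"


class FastKNN:
    name = "accelerated"


lib.KNN = SlowKNN

# A downstream package binds the class at import time,
# which is what `from lib import KNN` does.
early_reference = lib.KNN

# Patching swaps the attribute on the module afterwards.
lib.KNN = FastKNN

# Lookups made after the patch see the replacement,
# but the name bound earlier still points at the original class.
late_reference = lib.KNN
print(early_reference.name, late_reference.name)
```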

We also think that the patching option may not be the easiest for integration and perhaps the best option would be to import the required methods directly from sklearnex. The autogluon team integrated sklearnex in a similar way.
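A rough sketch of the direct-import approach, assuming the accelerated estimator lives under the same submodule path in sklearnex as in sklearn (for example sklearnex.neighbors.KNeighborsClassifier). The helper resolve_estimator is hypothetical and only shows one way a library could fall back to stock scikit-learn.

```python
import importlib


def resolve_estimator(module: str, name: str):
    """Hypothetical helper: return (class, accelerated), preferring sklearnex."""
    for package, accelerated in (("sklearnex", True), ("sklearn", False)):
        try:
            mod = importlib.import_module(f"{package}.{module}")
            return getattr(mod, name), accelerated
        except (ImportError, AttributeError):
            # Package missing, or it does not provide this estimator.
            continue
    raise ImportError(f"{name} not found in sklearnex or sklearn")
```

With this shape no global patching is needed: a call such as resolve_estimator("neighbors", "KNeighborsClassifier") returns the class to instantiate plus a flag saying whether the accelerated version was picked.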

PivovarA commented 2 years ago

@moezali1 As another option, you can patch only selected algorithms. For example, this will work for your code:

from sklearnex import patch_sklearn
patch_sklearn("knn_classifier")

from pycaret.datasets import get_data
data = get_data('poker')

from pycaret.classification import *
s = setup(data, target = 'CLASS', session_id = 126)

To get the map of algorithms:

import daal4py as d4p
d4p.sklearn.monkeypatch.dispatcher._get_map_of_algorithms().keys()

>>
dict_keys(['pca', 'kmeans', 'dbscan', 'distances', 'linear', 'ridge', 'elasticnet', 'lasso', 'svm', 'logistic', 'log_reg', 'knn_classifier', 'nearest_neighbors', 'knn_regressor', 'random_forest_classifier', 'random_forest_regressor', 'train_test_split', 'fin_check', 'roc_auc_score', 'tsne', 'svc', 'logisticregression', 'kneighborsclassifier', 'nearestneighbors', 'kneighborsregressor', 'randomrorestclassifier', 'randomforestregressor'])

I can also help you create a PR for integrating sklearnex into pycaret.

moezali1 commented 2 years ago

@PivovarA Thanks for your detailed response. I look forward to receiving your PR on pycaret. Excited for this integration.

moezali1 commented 2 years ago

@PivovarA We are releasing 3.0-rc in 2 weeks. Do you think it's possible to send a PR our way on the develop branch before May 15th?

Thanks.

intel / scikit-learn-intelex

No speed up when using patch_sklearn with PyCaret #996
