intel / scikit-learn-intelex

Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
https://intel.github.io/scikit-learn-intelex/
Apache License 2.0

python -m sklearnex my_application.py VERSUS patch_sklearn() #842

Closed muhlbach closed 3 years ago

muhlbach commented 3 years ago

Hi developers and maintainers! First of all, I love this improvement and it really speeds up my programs by 100x! It's amazing.

To my question (not a bug): Say I have a program (my_application.py) that imports packages from Scikit-learn, e.g., from sklearn.ensemble import RandomForestRegressor and runs some subsequent code.

Question: What is the difference between executing the program via the Terminal by calling python -m sklearnex my_application.py OR editing the program itself, e.g. by

from sklearnex import patch_sklearn
patch_sklearn()
from sklearn.ensemble import RandomForestRegressor

Looking forward to hearing from you!

napetrov commented 3 years ago

@muhlbach - Thanks for your feedback =) The short answer: they are identical.

We are currently updating the docs to explain this more clearly (https://github.com/intel/scikit-learn-intelex/pull/838) and will add an explicit statement that the two variants are interchangeable.

Your assessment of this doc would be valuable: is it clear enough, assuming the addition above? https://outoftardis.github.io/daal4py/what-is-patching.html
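To illustrate why the snippet in the question calls patch_sklearn() before importing from sklearn, here is a minimal sketch of attribute patching in general. It uses only the standard library; the module `lib` and the classes `SlowModel` and `FastModel` are hypothetical stand-ins, not part of scikit-learn or sklearnex, and the real extension may differ in its details.

```python
import types

# Hypothetical stand-in for a library module (not scikit-learn itself).
lib = types.ModuleType("lib")


class SlowModel:
    backend = "stock"


class FastModel:
    backend = "accelerated"


lib.Model = SlowModel


def patch(module):
    """Swap the module attribute, as a monkey-patching helper would."""
    module.Model = FastModel


early = lib.Model  # like `from lib import Model` executed before patching
patch(lib)
late = lib.Model   # like `from lib import Model` executed after patching

print(early.backend)  # stock
print(late.backend)   # accelerated
```

A name bound before the patch keeps pointing at the original class, so in this sketch only imports that run after the patch see the replacement.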

muhlbach commented 3 years ago

Yes, it's absolutely clear. Had I found that document on Google, I would not have raised the issue. Perhaps my Google-skills need a tune-up.

Follow-up question: are there any disadvantages to global patching? It seems like a convenient thing to do, so that I don't have to worry about it anymore.

napetrov commented 3 years ago

@muhlbach - I would assume not, for your case =) But in some cases you can't patch your sklearn installation, for example in a remote Jupyter environment. Or you may want more granular tuning within your script.

muhlbach commented 3 years ago

@napetrov, I hope it's okay if I ask a quick follow-up question.

I'm writing a package in which I use sklearn quite intensively. I have a base class and some child classes. I could add

from sklearnex import patch_sklearn
patch_sklearn()

to the top of my base class module, but that could lead to the patching being done several times when I instantiate multiple child classes. What would you recommend doing here? If patching multiple times doesn't hurt performance in any way, I don't mind it, especially if there's a way to silence the message "Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)".

Looking forward to your answer!

napetrov commented 3 years ago

@muhlbach - there is no harm in patching multiple times.
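For context, Python caches imported modules, so code at the top of a module runs once per process when the module is first imported, not each time a class defined in it is instantiated. If an explicit guard is still wanted, a minimal sketch might look like the following. The helper `ensure_patched` and its `patch_fn` parameter are hypothetical names introduced here for illustration. Hiding the banner with `contextlib.redirect_stderr` assumes the message is written to `sys.stderr`, which this thread does not confirm.

```python
import contextlib
import io

_patched = False  # module-level flag, shared by every caller in this process


def ensure_patched(patch_fn=None):
    """Apply the patch at most once; return True only if this call applied it."""
    global _patched
    if _patched:
        return False
    if patch_fn is None:
        # Requires scikit-learn-intelex to be installed.
        from sklearnex import patch_sklearn as patch_fn
    # Assumption: the "enabled" banner goes to stderr, so it is captured here.
    # Exceptions raised by patch_fn still propagate to the caller.
    with contextlib.redirect_stderr(io.StringIO()):
        patch_fn()
    _patched = True
    return True
```

Passing a custom `patch_fn` is only there so the guard can be exercised without the extension installed; in a package, calling `ensure_patched()` once at the top of the base module would be the intended use.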

muhlbach commented 3 years ago

Thanks for the prompt reply!

napetrov commented 3 years ago

@muhlbach - I'm looking for users for a short interview, to understand what problems you are currently solving and how our product helps you there. If you are interested, please contact me via nikolay.a.petrov@intel.com. Thanks!

muhlbach commented 3 years ago

I tried adding

from sklearnex import patch_sklearn
patch_sklearn()

to the top of my base class, but it gives the following error. For reproduction, run script "tests.py" from here

Traceback (most recent call last):
  File "/Users/muhlbach/opt/anaconda3/envs/main/lib/python3.8/site-packages/onedal/svm/svm.py", line 35, in <module>
    from _onedal4py_dpc import (
ModuleNotFoundError: No module named '_onedal4py_dpc'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tests.py", line 34, in <module>
    from mlregression.mlreg import MLRegressor
  File "/Users/muhlbach/Repositories/mlregression/src/mlregression/mlreg.py", line 8, in <module>
    from .base.base_mlreg import BaseMLRegressor
  File "/Users/muhlbach/Repositories/mlregression/src/mlregression/base/base_mlreg.py", line 5, in <module>
    from sklearnex import patch_sklearn
  File "/Users/muhlbach/opt/anaconda3/envs/main/lib/python3.8/site-packages/sklearnex/__init__.py", line 18, in <module>
    from .dispatcher import patch_sklearn
  File "/Users/muhlbach/opt/anaconda3/envs/main/lib/python3.8/site-packages/sklearnex/dispatcher.py", line 26, in <module>
    from .svm import SVR as SVR_sklearnex
  File "/Users/muhlbach/opt/anaconda3/envs/main/lib/python3.8/site-packages/sklearnex/svm/__init__.py", line 21, in <module>
    from .svr import SVR
  File "/Users/muhlbach/opt/anaconda3/envs/main/lib/python3.8/site-packages/sklearnex/svm/svr.py", line 25, in <module>
    from onedal.svm import SVR as onedal_SVR
  File "/Users/muhlbach/opt/anaconda3/envs/main/lib/python3.8/site-packages/onedal/svm/__init__.py", line 17, in <module>
    from .svm import SVC, SVR, NuSVC, NuSVR, SVMtype
  File "/Users/muhlbach/opt/anaconda3/envs/main/lib/python3.8/site-packages/onedal/svm/svm.py", line 47, in <module>
    from _onedal4py_host import (
ImportError: dlopen(/Users/muhlbach/opt/anaconda3/envs/main/lib/python3.8/site-packages/_onedal4py_host.cpython-38-darwin.so, 2): Symbol not found: __ZN6oneapi3dal17polynomial_kernel2v113compute_inputINS1_4task2v17computeEEC1ERKNS0_2v15tableESB_
  Referenced from: /Users/muhlbach/opt/anaconda3/envs/main/lib/python3.8/site-packages/_onedal4py_host.cpython-38-darwin.so
  Expected in: /Users/muhlbach/opt/anaconda3/envs/main/lib/python3.8/site-packages/../../libonedal.dylib
 in /Users/muhlbach/opt/anaconda3/envs/main/lib/python3.8/site-packages/_onedal4py_host.cpython-38-darwin.so
(main) muhlbach@Muhlbach-MacBook-Pro src %
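The traceback above ends in an ImportError raised while importing sklearnex, because the compiled module could not find a symbol it expected in libonedal.dylib. The sketch below does not repair that installation. It only shows one way a package could fall back to stock scikit-learn when the extension cannot be imported. The helper name `patch_if_available` is hypothetical.

```python
import importlib


def patch_if_available(module_name="sklearnex"):
    """Try to enable the extension; return True on success, False otherwise."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers a missing package (ModuleNotFoundError is a subclass of
        # ImportError) as well as a binary that fails to load.
        return False
    module.patch_sklearn()
    return True


# Call this before importing estimators from sklearn.
accelerated = patch_if_available()
```

The returned flag lets calling code log or report whether the accelerated path is active, rather than failing at import time.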