@napetrov I am currently making the release compatible with the latest change in scikit-learn 1.3.
My vision here would be to allow the use of scikit-learn-intelex. If one explicitly activates the package, then imbalanced-learn can use it internally.
From what I recall, if a user just patches scikit-learn:
from sklearnex import patch_sklearn
patch_sklearn()
Then all subsequent scikit-learn imports will use the Intel versions.
So in the end, I don't think that we need to change anything in the codebase, do we?
What we would need is some documentation in the installation guide and potentially a CI run to make sure that the tests pass with the latest scikit-learn-intelex.
@glemaitre - yes, correct. Documentation and CI would be good base steps. patch_sklearn() is the most basic option - an alternative would be to pass estimator objects into imblearn:
from imblearn.under_sampling import EditedNearestNeighbours
from sklearnex.neighbors import NearestNeighbors
...
nn = NearestNeighbors(n_neighbors=4, n_jobs=-1)
X_resampled, y_resampled = EditedNearestNeighbours(n_neighbors=nn).fit_resample(X, y)
It is worth mentioning both in the documentation and explaining the difference: with the patch_sklearn() call, the acceleration applies to all scikit-learn calls in the script, while with direct imports from sklearnex you can apply it to imblearn only.
I can start an initial documentation draft if this would help.
Other recommendations or suggestions are welcome.
As for code changes - this is an option for more granular control within imbalanced-learn itself. For example, we have this with PyCaret and AutoGluon: the frameworks themselves are aware of the scikit-learn-intelex package and use from sklearnex imports instead of from sklearn imports when they detect the sklearnex package in the environment. The dependency is not enforced in the base install, only in the full optional dependencies. This gives the ability to use the relevant pieces more deliberately.
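To make the detect-and-fall-back idea concrete, here is a rough sketch using only the standard library. The helper names resolve_backend and import_neighbors are hypothetical, and this is not the actual code of PyCaret or AutoGluon, only an illustration of the pattern.

```python
import importlib
import importlib.util


def resolve_backend(preferred="sklearnex", fallback="sklearn"):
    """Return `preferred` if that top-level package is installed, else `fallback`.

    Hypothetical helper: it only checks availability and does not import
    or patch anything by itself.
    """
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


def import_neighbors(backend):
    """Import the `neighbors` submodule from the chosen backend package."""
    return importlib.import_module(f"{backend}.neighbors")


backend = resolve_backend()
print("selected backend:", backend)
```

A library would then call something like import_neighbors(backend).NearestNeighbors wherever it builds its internal estimators, while listing scikit-learn-intelex only among its optional dependencies.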
I prefer to have an explicit way of indicating that you want to use sklearnex. Internally, I don't think that there is a huge drawback to not having granular control. I am more worried about making a magical choice for the user. While working on the scikit-learn issue tracker, we have already seen a couple of bug reports where the user, even after patching explicitly, did not identify the source of the bug.
Since I am already struggling to maintain this package, I would not go down the road of making an automatic backend switch.
I added a section to the installation guide in the documentation.
There were recent experiments on accelerating the imbalanced-learn library with scikit-learn-intelex estimators, and the results are quite promising - https://medium.com/intel-analytics-software/why-pay-more-for-machine-learning-893683bd78e4
So we are talking about pretty noticeable speedups, up to 140 times, that would benefit imbalanced-learn users. What are your thoughts on providing tighter integration?
There are multiple options that can be used here:
Very open to discussing integration options and would be happy to address questions, concerns, or suggestions here.