Open lesteve opened 2 weeks ago
Please check my pull request; I have resolved this bug.
So I debugged this a bit more and the summary is:
- The outer Parallel uses loky as backend (as expected) and the inner Parallel uses threading, which I did not expect. I thought (maybe naively) that it would always be loky.
- RFECV(n_jobs=2).fit with the threading backend gives the same kind of errors, see the snippet below. I guess this is also why I have seen the same errors when switching the default joblib backend to threading.

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from joblib import parallel_config
X, y = make_classification()
clf = LogisticRegression()
rfecv = RFECV(
    estimator=clf,
    min_features_to_select=1,
    n_jobs=2,
)
with parallel_config(backend="threading"):
    rfecv.fit(X, y)
You can get two types of errors:
or
I don't quite understand what is happening yet, but it seems like there is a side effect somewhere. I would have thought that the inner parallelism would make a copy, but apparently not. Using clone in https://github.com/scikit-learn/scikit-learn/blob/e04142cbe0f4f854272f877eb9692053b0a6bcf8/sklearn/feature_selection/_rfe.py#L886-L889 seems to fix it.
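To illustrate the kind of side effect I suspect, here is a minimal standard-library sketch. It is not the RFECV code: `ToyEstimator`, `fit_shared` and `fit_private` are hypothetical names, and `copy.deepcopy` stands in for scikit-learn's `clone` (which, unlike `deepcopy`, returns an unfitted estimator with the same parameters). The point is only that thread-based workers receive the very same Python object, whereas a process-based backend pickles its arguments and so works on a copy.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class ToyEstimator:
    """Hypothetical stand-in for an estimator whose fit() mutates itself."""

    def __init__(self):
        self.support_ = None

    def fit(self, n_features):
        # Fitting stores state on the instance, as scikit-learn estimators do.
        self.support_ = list(range(n_features))
        return self


def fit_shared(estimator, n_features):
    # Fits the object it is handed, so every caller sees the mutation.
    return estimator.fit(n_features)


def fit_private(estimator, n_features):
    # Copies first (stand-in for clone), so the caller's object is untouched.
    return copy.deepcopy(estimator).fit(n_features)


shared = ToyEstimator()
template = ToyEstimator()
sizes = [3, 5]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Both tasks mutate the same instance; which fit "wins" depends on timing.
    shared_results = list(pool.map(fit_shared, [shared, shared], sizes))
    # Each task fits its own copy; the results are independent of each other.
    private_results = list(pool.map(fit_private, [template, template], sizes))

print(shared_results[0] is shared_results[1])  # same object returned twice
print([len(r.support_) for r in private_results])
print(template.support_)  # the template was never fitted
```

With the shared instance, the two results are one and the same object, so whatever one task stored can be overwritten by the other. With one copy per task, each result keeps its own state and the template stays unfitted, which is consistent with clone making the errors go away.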
This was seen in https://github.com/scikit-learn/scikit-learn/pull/29614 (and also in private testing of free-threaded CPython 3.13 with the default joblib backend set to threading, but I thought it was threading-related).