ping @glemaitre what do you think could be the issue here? sklearn's pipeline works just fine without the error (if I drop the oversampler instance).
I cannot reproduce the error in a clean conda environment with the version specified. Could you reinstall imbalanced-learn? I don't see why the error should occur.
@zoj613 are you sure that the code you are executing is the one you provided? It works on my machine. A similar test case already exists in our test suite.
@glemaitre @chkoar I use poetry for package management and have set imbalanced-learn to the version on the current master branch. I just updated the packages and reran the code, and I get the same error in my pyenv environment. This is the input/output in IPython:
In [3]: from sklearn.datasets import make_classification
In [4]: from imblearn.pipeline import Pipeline
...: from imblearn.over_sampling import RandomOverSampler
...: from sklearn.preprocessing import StandardScaler
...: from sklearn.linear_model import SGDClassifier
...: X, y = make_classification()
...: steps = [
...: ('scaler', StandardScaler()),
...: ('sampler', RandomOverSampler()),
...: ('clf', SGDClassifier())
...: ]
...: p = Pipeline(steps, memory='./data/')
...: p.fit(X, y)
UnboundLocalError Traceback (most recent call last)
<ipython-input-4-5effb3d2fca4> in <module>
11 ]
12 p = Pipeline(steps, memory='./data/')
---> 13 p.fit(X, y)
~/.pyenv/versions/3.6.8/envs/absa-py36/lib/python3.6/site-packages/imblearn/ in fit(self, X, y, **fit_params)
286 """
--> 287 Xt, yt, fit_params = self._fit(X, y, **fit_params)
288 with _print_elapsed_time('Pipeline',
289 self._log_message(len(self.steps) - 1)):
~/.pyenv/versions/3.6.8/envs/absa-py36/lib/python3.6/site-packages/imblearn/ in _fit(self, X, y, **fit_params)
233 cloned_transformer = clone(transformer)
234 # Fit or load from cache the current transfomer
--> 235 if hasattr(cloned_transformer, "transform") or hasattr(
236 cloned_transformer, "fit_transform"
237 ):
UnboundLocalError: local variable 'cloned_transformer' referenced before assignment
I really have no idea what could be causing this. It has been happening ever since I started using imbalanced-learn's Pipeline object.
Here is a list of packages and their versions:
@zoj613 just in case, can you please upgrade your joblib version? If I am not wrong it seems fairly old.
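A quick way to see which joblib version an environment actually resolved to is sketched below. It uses only the standard library's `importlib.metadata`, which needs Python 3.8 or newer (on the Python 3.6 environment in this thread, `pip show joblib` would be the equivalent check). The helper name `installed_versions` is made up for this example.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names as published on PyPI.
    packages = ["joblib", "scikit-learn", "imbalanced-learn"]
    for name, version in installed_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the same virtual environment as the failing code shows whether another dependency has pinned joblib to an older release.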
@chkoar updating joblib to the latest version worked like a charm, thank you. It was pinned to the old version because the version of skater I was using required joblib 0.11. After downgrading skater to 1.0.4 I was able to upgrade joblib to 0.14.1.
In [1]: from sklearn.datasets import make_classification
In [2]: from imblearn.pipeline import Pipeline
...: from imblearn.over_sampling import RandomOverSampler
...: from sklearn.preprocessing import StandardScaler
...: from sklearn.linear_model import SGDClassifier
...: X, y = make_classification()
...: steps = [
...: ('scaler', StandardScaler()),
...: ('sampler', RandomOverSampler()),
...: ('clf', SGDClassifier())
...: ]
...: p = Pipeline(steps, memory='./data/')
...: p.fit(X, y)
StandardScaler(copy=True, with_mean=True, with_std=True)),
SGDClassifier(alpha=0.0001, average=False, class_weight=None,
early_stopping=False, epsilon=0.1, eta0=0.0,
fit_intercept=True, l1_ratio=0.15,
learning_rate='optimal', loss='hinge',
max_iter=1000, n_iter_no_change=5, n_jobs=None,
penalty='l2', power_t=0.5, random_state=None,
shuffle=True, tol=0.001, validation_fraction=0.1,
verbose=0, warm_start=False))],
In [3]:
Oh yes, this should be the bug
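For readers unfamiliar with this class of error, here is a generic sketch of how an `UnboundLocalError` can appear when code branches on which attribute a cache object exposes and one branch forgets to assign. This is a hypothetical illustration, not the imbalanced-learn source: the classes `NewStyleCache` and `OldStyleCache` and the function `pick_transformer` are invented for the example.

```python
class NewStyleCache:
    """Hypothetical cache object exposing a `location` attribute."""

    def __init__(self, location):
        self.location = location


class OldStyleCache:
    """Hypothetical cache object exposing only a `cachedir` attribute."""

    def __init__(self, cachedir):
        self.cachedir = cachedir


def pick_transformer(cache, transformer):
    """Decide whether to use the transformer as-is or a clone of it."""
    if hasattr(cache, "location"):
        if cache.location is None:
            chosen = ("as-is", transformer)
        else:
            chosen = ("clone", transformer)
    elif hasattr(cache, "cachedir"):
        if cache.cachedir is None:
            chosen = ("as-is", transformer)
        # Deliberate bug for illustration: nothing is assigned when
        # `cachedir` is set, so `chosen` stays unbound on this path.
    return chosen


if __name__ == "__main__":
    print(pick_transformer(NewStyleCache("./data/"), "scaler"))
    try:
        pick_transformer(OldStyleCache("./data/"), "scaler")
    except UnboundLocalError as exc:
        print("UnboundLocalError:", exc)
```

The error only surfaces on the one code path that skips the assignment, which would be consistent with a failure that shows up with one dependency version and not another.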
Calling the fit method of a Pipeline object throws an exception:
UnboundLocalError: local variable 'cloned_transformer' referenced before assignment
when the memory argument is passed a value. Therefore I am unable to cache any transformers (especially during hyperparameter tuning using a Pipeline object).
Steps/Code to Reproduce
Expected Results
For this to run successfully.
Actual Results
Linux-4.15.0-1058-aws-x86_64-with-debian-buster-sid
Python 3.6.8 (default, Nov 18 2019, 13:36:54) [GCC 6.5.0 20181026]
NumPy 1.18.1
SciPy 1.4.1
Scikit-Learn 0.22.1
Imbalanced-Learn 0.6.1