scikit-learn-contrib / imbalanced-learn

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
https://imbalanced-learn.org
MIT License

[BUG] Getting TypeError when running SMOTE() oversampler #1042

Closed malmashhadani-88 closed 1 year ago

malmashhadani-88 commented 1 year ago

Describe the bug

When I run SMOTE() on a dataset, I get a TypeError raised by NumPy's subtraction (-) operator.

Steps/Code to Reproduce

from imblearn.over_sampling import SMOTE
x_train_os, y_train_os = SMOTE().fit_resample(x_train, y_train)
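The snippet above does not show how x_train was built. The error message reported under Actual Results points at a boolean array, so a possible workaround is to cast boolean features to a numeric dtype before resampling. The sketch below is only an illustration under that assumption: `as_float_if_boolean` and `x_train_demo` are hypothetical names, not part of imbalanced-learn, and the helper assumes an array-like input with a single dtype.

```python
import numpy as np


def as_float_if_boolean(X):
    """Return X as a float64 array when its dtype is boolean; otherwise return it as an array with its dtype unchanged."""
    X = np.asarray(X)
    if X.dtype == np.bool_:
        return X.astype(np.float64)
    return X


# Hypothetical stand-in for x_train: three samples with two boolean features.
x_train_demo = np.array([[True, False], [False, True], [True, True]])

# True/False become 1.0/0.0, so arithmetic such as subtraction is defined.
x_train_numeric = as_float_if_boolean(x_train_demo)
```

The converted array could then be passed to SMOTE().fit_resample in place of x_train. Because SMOTE interpolates between samples, the generated rows would likely contain fractional values between 0 and 1 instead of booleans.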

Expected Results

No error is thrown

Actual Results

TypeError                                 Traceback (most recent call last)
Cell In[139], line 2
      1 from imblearn.over_sampling import SMOTE
----> 2 x_train_os, y_train_os = SMOTE().fit_resample(x_train, y_train)

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/imblearn/base.py:208, in BaseSampler.fit_resample(self, X, y)
    187 """Resample the dataset.
    188
    189 Parameters
   (...)
    205     The corresponding label of X_resampled.
    206 """
    207 self._validate_params()
--> 208 return super().fit_resample(X, y)

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/imblearn/base.py:112, in SamplerMixin.fit_resample(self, X, y)
    106 X, y, binarize_y = self._check_X_y(X, y)
    108 self.sampling_strategy_ = check_sampling_strategy(
    109     self.sampling_strategy, y, self._sampling_type
    110 )
--> 112 output = self._fit_resample(X, y)
    114 y_ = (
    115     label_binarize(output[1], classes=np.unique(y)) if binarize_y else output[1]
    116 )
    118 X_, y_ = arrays_transformer.transform(output[0], y_)

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/imblearn/over_sampling/_smote/base.py:365, in SMOTE._fit_resample(self, X, y)
    363 self.nn_k_.fit(X_class)
    364 nns = self.nn_k_.kneighbors(X_class, return_distance=False)[:, 1:]
--> 365 X_new, y_new = self._make_samples(
    366     X_class, y.dtype, class_sample, X_class, nns, n_samples, 1.0
    367 )
    368 X_resampled.append(X_new)
    369 y_resampled.append(y_new)

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/imblearn/over_sampling/_smote/base.py:119, in BaseSMOTE._make_samples(self, X, y_dtype, y_type, nn_data, nn_num, n_samples, step_size)
    116 rows = np.floor_divide(samples_indices, nn_num.shape[1])
    117 cols = np.mod(samples_indices, nn_num.shape[1])
--> 119 X_new = self._generate_samples(X, nn_data, nn_num, rows, cols, steps)
    120 y_new = np.full(n_samples, fill_value=y_type, dtype=y_dtype)
    121 return X_new, y_new

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/imblearn/over_sampling/_smote/base.py:163, in BaseSMOTE._generate_samples(self, X, nn_data, nn_num, rows, cols, steps)
    123 def _generate_samples(self, X, nn_data, nn_num, rows, cols, steps):
    124     r"""Generate a synthetic sample.
    125
    126     The rule for the generation is:
   (...)
    161         Synthetically generated samples.
    162     """
--> 163     diffs = nn_data[nn_num[rows, cols]] - X[rows]
    165     if sparse.issparse(X):
    166         sparse_func = type(X).__name__

TypeError: numpy boolean subtract, the - operator, is not supported, use the bitwise_xor, the ^ operator, or the logical_xor function instead.
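The message comes from NumPy itself, not from imbalanced-learn: subtracting two boolean arrays is not defined. The sketch below reproduces that behaviour in isolation, using two small hypothetical arrays (`sample` and `neighbour`) in place of the rows SMOTE subtracts at line 163 of the traceback. It also shows why the `^` operator suggested by the message is not a drop-in replacement when a signed difference is needed.

```python
import numpy as np

# Hypothetical boolean rows standing in for X[rows] and its nearest neighbour.
sample = np.array([True, False])
neighbour = np.array([False, True])

# Boolean subtraction raises TypeError in NumPy.
try:
    neighbour - sample
    raised = False
    message = ""
except TypeError as exc:
    raised = True
    message = str(exc)

# XOR works on booleans but only says where the rows differ, not in which direction.
xor_result = neighbour ^ sample

# Casting to float gives the signed difference that interpolation relies on.
float_diff = neighbour.astype(np.float64) - sample.astype(np.float64)
```

Here `xor_result` is a boolean array, while `float_diff` keeps the sign of each difference, which is why casting the input to a numeric dtype is the more likely fix for the call in this report.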

Versions

System:
    python: 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0]
    executable: /anaconda/envs/azureml_py310_sdkv2/bin/python
    machine: Linux-5.15.0-1038-azure-x86_64-with-glibc2.31

Python dependencies:
    sklearn: 1.3.0
    pip: 23.2.1
    setuptools: 68.1.2
    numpy: 1.24.4
    scipy: 1.10.1
    Cython: 0.29.35
    pandas: 2.1.0
    matplotlib: 3.7.3
    joblib: 1.2.0
    threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
    user_api: openmp
    internal_api: openmp
    prefix: libgomp
    filepath: /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
    version: None
    num_threads: 4

    user_api: blas
    internal_api: openblas
    prefix: libopenblas
    filepath: /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/scipy.libs/libopenblasp-r0-41284840.3.18.so
    version: 0.3.18
    threading_layer: pthreads
    architecture: SkylakeX
    num_threads: 4