scikit-learn-contrib / imbalanced-learn

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
https://imbalanced-learn.org
MIT License

[BUG] ImportError: cannot import name '_check_X' from 'imblearn.utils._validation' (/usr/local/lib/python3.10/dist-packages/imblearn/utils/_validation.py) #1069

Closed apavlo89 closed 7 months ago

apavlo89 commented 8 months ago

I'm trying to run ADASYN in a Google Colab notebook (Python 3.10.12) and I get this error: ImportError: cannot import name '_check_X' from 'imblearn.utils._validation' (/usr/local/lib/python3.10/dist-packages/imblearn/utils/_validation.py)

Weird, right? I've tried uninstalling and reinstalling all packages, but I can't make this error disappear. Any suggestions? This is the code I am running:


import pandas as pd
from imblearn.over_sampling import ADASYN

# Assuming X_train is your training features as a DataFrame and y_train as your labels

# Apply ADASYN
adasyn = ADASYN(random_state=42)
X_train_resampled, y_train_resampled = adasyn.fit_resample(X_train, y_train)

# Convert resampled data back to DataFrame to retain column names
X_train_resampled_df = pd.DataFrame(X_train_resampled, columns=X_train.columns)
y_train_resampled_df = pd.Series(y_train_resampled)

# Now, X_train_resampled_df and y_train_resampled_df are ready for use in training

Here is the full error log:



---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
[<ipython-input-43-4e29c49f6d1f>](https://localhost:8080/#) in <cell line: 2>()
      1 import pandas as pd
----> 2 from imblearn.over_sampling import ADASYN
      3 
      4 # Assuming X_train is your training features as a DataFrame and y_train as your labels
      5 

5 frames
[/usr/local/lib/python3.10/dist-packages/imblearn/__init__.py](https://localhost:8080/#) in <module>
     50     # process, as it may not be compiled yet
     51 else:
---> 52     from . import (
     53         combine,
     54         ensemble,

[/usr/local/lib/python3.10/dist-packages/imblearn/combine/__init__.py](https://localhost:8080/#) in <module>
      3 """
      4 
----> 5 from ._smote_enn import SMOTEENN
      6 from ._smote_tomek import SMOTETomek
      7 

[/usr/local/lib/python3.10/dist-packages/imblearn/combine/_smote_enn.py](https://localhost:8080/#) in <module>
     11 
     12 from ..base import BaseSampler
---> 13 from ..over_sampling import SMOTE
     14 from ..over_sampling.base import BaseOverSampler
     15 from ..under_sampling import EditedNearestNeighbours

[/usr/local/lib/python3.10/dist-packages/imblearn/over_sampling/__init__.py](https://localhost:8080/#) in <module>
      6 from ._adasyn import ADASYN
      7 from ._random_over_sampler import RandomOverSampler
----> 8 from ._smote import SMOTE, SMOTEN, SMOTENC, SVMSMOTE, BorderlineSMOTE, KMeansSMOTE
      9 
     10 __all__ = [

[/usr/local/lib/python3.10/dist-packages/imblearn/over_sampling/_smote/__init__.py](https://localhost:8080/#) in <module>
----> 1 from .base import SMOTE, SMOTEN, SMOTENC
      2 from .cluster import KMeansSMOTE
      3 from .filter import SVMSMOTE, BorderlineSMOTE
      4 
      5 __all__ = [

[/usr/local/lib/python3.10/dist-packages/imblearn/over_sampling/_smote/base.py](https://localhost:8080/#) in <module>
     31 from ...utils._docstring import _n_jobs_docstring, _random_state_docstring
     32 from ...utils._param_validation import HasMethods, Interval, StrOptions
---> 33 from ...utils._validation import _check_X
     34 from ...utils.fixes import _is_pandas_df, _mode
     35 from ..base import BaseOverSampler

ImportError: cannot import name '_check_X' from 'imblearn.utils._validation' (/usr/local/lib/python3.10/dist-packages/imblearn/utils/_validation.py)
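
One way to narrow down an error like the one above is to check, without importing the package, whether the file named in the traceback actually binds the missing name. The following is only a minimal sketch: `top_level_names` and `file_defines` are hypothetical helpers written for this illustration, not part of imbalanced-learn, and the path is copied from the traceback. It only sees names bound directly at module top level (not inside `if` or `try` blocks).

```python
import ast
from pathlib import Path


def top_level_names(source):
    """Collect names bound directly at module top level.

    Covers function/class definitions, simple assignments, and imports.
    Names bound inside top-level if/try blocks are not detected.
    """
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as b" binds "b"
                names.add(alias.asname or alias.name.split(".")[0])
    return names


def file_defines(path, name):
    """Return True if the Python file at `path` binds `name` at top level."""
    return name in top_level_names(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Path taken from the traceback; it will only exist in that environment.
    path = Path("/usr/local/lib/python3.10/dist-packages/imblearn/utils/_validation.py")
    if path.exists():
        print(file_defines(path, "_check_X"))
```

If this reports False while `_smote/base.py` still imports `_check_X`, the files on disk may come from different releases. That is an assumption to test, not something this thread confirms.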

jawadkazi commented 8 months ago

This is odd, because imblearn comes preinstalled on Google Colab as a project related to sklearn. When I run the code below, it runs without error. Further, Google Colab has the newest version of imblearn installed (check this with import imblearn and then imblearn.__version__; you should see version 0.10.1). Instead of uninstalling and reinstalling packages, consider creating a new notebook on Google Colab and seeing whether the error persists. A similar error has been reported where the installed scikit-learn version caused issues with imblearn (see this issue), but that is unlikely to be the case here.

# code I ran from a new notebook on Google Colab that ran without ImportError
import numpy as np
import pandas as pd
from imblearn.over_sampling import ADASYN

X_train = pd.DataFrame(np.random.rand(100, 3), columns=['f1', 'f2', 'f3'])
y_train = pd.Series(np.random.randint(2, size=100))

adasyn = ADASYN(random_state=42)
X_train_resampled, y_train_resampled = adasyn.fit_resample(X_train, y_train)

X_train_resampled_df = pd.DataFrame(X_train_resampled, columns=X_train.columns)
y_train_resampled_df = pd.Series(y_train_resampled)
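
Since `import imblearn` itself fails in the broken environment, the version check suggested above cannot be run there through the package. A minimal sketch of an alternative, assuming only the standard library's `importlib.metadata` and the PyPI distribution names `imbalanced-learn` and `scikit-learn`, reads the installed versions from package metadata without importing either package. The helper name `installed_versions` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names as published on PyPI, not the import names.
    for name, ver in installed_versions(["imbalanced-learn", "scikit-learn"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Posting both version numbers alongside the traceback makes it easier for others to try to reproduce the problem.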

glemaitre commented 7 months ago

I also cannot reproduce even after updating imbalanced-learn using pip.

stevenfernando commented 5 months ago

Hello @glemaitre, could you solve this issue? I have the same problem and I haven't found any solution.