scikit-learn-contrib / imbalanced-learn

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
https://imbalanced-learn.org
MIT License

Dead dataset used in documentation #1040

Closed by eroell 1 year ago

eroell commented 1 year ago

Describe the bug

The documentation example for CondensedNearestNeighbour loads a dataset that is no longer available.

Steps/Code to Reproduce

from collections import Counter  
from sklearn.datasets import fetch_mldata  
from imblearn.under_sampling import CondensedNearestNeighbour  
pima = fetch_mldata('diabetes_scale')  
X, y = pima['data'], pima['target']  
print('Original dataset shape %s' % Counter(y))  
cnn = CondensedNearestNeighbour(random_state=42)  
X_res, y_res = cnn.fit_resample(X, y)  
print('Resampled dataset shape %s' % Counter(y_res)) 

Expected Results

No error is thrown. The output should be:

Original dataset shape Counter({1: 500, -1: 268})  
Resampled dataset shape Counter({-1: 268, 1: 227})  

Actual Results

ImportError: cannot import name 'fetch_mldata' from 'sklearn.datasets' (/Users/eljas.roellin/Documents/imbalance/imbalance_venv/lib/python3.11/site-packages/sklearn/datasets/__init__.py)

Explanation

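fetch_mldata has been removed from sklearn.datasets (it was deprecated in scikit-learn 0.20 and removed in 0.22), so this dataset can no longer be loaded this way; see here.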

Suggested Solution

I think any other small, imbalanced two-class dataset could be used in the example instead.

Versions

System:
    python: 3.11.4 (main, Jul  5 2023, 08:40:20) [Clang 14.0.6 ]
executable: /Users/USER/Documents/imbalance/imbalance_venv/bin/python
   machine: macOS-13.5.1-arm64-arm-64bit

Python dependencies:
      sklearn: 1.3.0
          pip: 23.2.1
   setuptools: 65.5.0
        numpy: 1.25.2
        scipy: 1.11.2
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.3.2
threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libomp
       filepath: /Users/USER/Documents/imbalance/imbalance_venv/lib/python3.11/site-packages/sklearn/.dylibs/libomp.dylib
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/USER/Documents/imbalance/imbalance_venv/lib/python3.11/site-packages/numpy/.dylibs/libopenblas64_.0.dylib
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: armv8

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/USER/Documents/imbalance/imbalance_venv/lib/python3.11/site-packages/scipy/.dylibs/libopenblas.0.dylib
        version: 0.3.21.dev
threading_layer: pthreads
   architecture: armv8