rasbt / mlxtend

A library of extension and helper modules for Python's data analysis and machine learning libraries.
https://rasbt.github.io/mlxtend/
Other
4.83k stars 855 forks source link

ExhaustiveFeatureSelector(print_progress=True) doesn't do anything #949

Open FlorinAndrei opened 2 years ago

FlorinAndrei commented 2 years ago

Describe the bug

Waiting through a long EFS run, hoping to get some progress feedback, I don't see any progress shown anywhere.

Steps/Code to Reproduce

from sklearn.linear_model import LinearRegression
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS
lr = LinearRegression()
efs_def = EFS(
    lr,
    scoring="neg_mean_squared_error",
    print_progress=True,
    min_features=1,
    max_features=n_features,
    n_jobs=os.cpu_count(),
)
efsl = efs_def.fit(X.values, y.values)

X is a Pandas dataframe with 19 columns and 90 rows. y is a single-column dataframe.

This runs in Jupyter Notebook on Windows 10.

Expected Results

Some messages somewhere?

Actual Results

Nothing is printed in the notebook.

Nothing is printed in the Command Prompt where I run jupyter notebook.

Versions

MLxtend 0.20.0 Windows-10-10.0.19044-SP0 Python 3.9.9 (tags/v3.9.9:ccb0e6a, Nov 15 2021, 18:08:50) [MSC v.1929 64 bit (AMD64)] Scikit-learn 1.1.1 NumPy 1.22.4 SciPy 1.8.1

FlorinAndrei commented 2 years ago

Right after I wrote the bug report, it started to print messages in the notebook:

Features: 3000/524287IOPub message rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_msg_rate_limit`.

Current values:
NotebookApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
NotebookApp.rate_limit_window=3.0 (secs)

It printed a bunch of those, and then the incrementing Features counter at the bottom. It finished in 15 minutes.

I think the printing of these messages could be improved. Maybe rate limit them? Maybe switch to TQDM instead? And the long initial silence is confusing.

rasbt commented 2 years ago

Thanks for posting. This sounds annoying indeed and needs to be improved. Maybe a refresh_rate parameter could help here, or just moving to tqdm like you suggested. It's commonly used in other projects as well, and I wouldn't mind introducing it as a dependency