rasbt / mlxtend

A library of extension and helper modules for Python's data analysis and machine learning libraries.
https://rasbt.github.io/mlxtend/

Fixed #971 turned off joblib when `n_jobs == 1` #985

Closed NimaSarajpoor closed 1 year ago

NimaSarajpoor commented 1 year ago

This PR fixes issue #971

Performance Code

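import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS

seed = 0
np.random.seed(seed)  # seed NumPy so the random data is reproducible
X = np.random.rand(10000, 10)  # 10k samples, with 10 features
y = np.random.choice([0, 1], size=10000)

lst = []
for i in range(5):  # repeat the fit 5 times
    tic = time.time()
    efs = EFS(RandomForestClassifier()).fit(X, y)  # EFS: ExhaustiveFeatureSelector
    toc = time.time()
    lst.append(toc - tic)

np.mean(lst)  # mean wall-clock time per fit, in seconds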

Computing Time

codecov[bot] commented 1 year ago

Codecov Report

Base: 77.43% // Head: 77.43% // No change to project coverage :thumbsup:

Coverage data is based on head (7599ebf) compared to base (423d217). Patch coverage: 100.00% of modified lines in pull request are covered.

:exclamation: Current head 7599ebf differs from pull request most recent head e912885. Consider uploading reports for the commit e912885 to get more accurate results

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master     #985   +/-   ##
=======================================
  Coverage   77.43%   77.43%
=======================================
  Files         198      198
  Lines       11165    11165
  Branches     1406     1406
=======================================
  Hits         8646     8646
  Misses       2305     2305
  Partials      214      214
```

| [Impacted Files](https://codecov.io/gh/rasbt/mlxtend/pull/985?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Sebastian+Raschka) | Coverage Δ | |
|---|---|---|
| [mlxtend/\_\_init\_\_.py](https://codecov.io/gh/rasbt/mlxtend/pull/985/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Sebastian+Raschka#diff-bWx4dGVuZC9fX2luaXRfXy5weQ==) | `100.00% <100.00%> (ø)` | |


NimaSarajpoor commented 1 year ago

I will fix this in the upcoming days.

NimaSarajpoor commented 1 year ago

@rasbt I think it is ready. If there is something that I missed, please let me know.

rasbt commented 1 year ago

I was just testing the code, and it definitely improved the startup time. However, when I try an example like

import numpy as np
from sklearn.linear_model import LogisticRegression
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS

seed = 0
np.random.seed(seed)  # seed NumPy so X and y are reproducible
X = np.random.rand(10000, 10) # 10k samples, with 10 features
y = np.random.choice([0, 1], size=10000)

model = LogisticRegression()

efs1 = EFS(model, 
           min_features=1,
           max_features=10,
           scoring='accuracy',
           print_progress=True,
           n_jobs=1,
           cv=5)

efs1 = efs1.fit(X, y)

print('Best accuracy score: %.2f' % efs1.best_score_)
print('Best subset (indices):', efs1.best_idx_)
print('Best subset (corresponding names):', efs1.best_feature_names_)

it still seems to get stuck for a while, though: it shows no output for about 2-3 minutes and then iterates through the ~1k feature combinations in about 1 second.
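For reference, the figure of roughly 1k possibilities follows from the settings in the example: with 10 features, `min_features=1`, and `max_features=10`, every non-empty feature subset is a candidate. The snippet below is only a counting sketch using the standard library; it does not call mlxtend, and the variable names are chosen here for illustration.

```python
from itertools import combinations
from math import comb

n_features = 10
min_features, max_features = 1, 10
cv = 5

# Closed form: sum of C(n, k) over the allowed subset sizes.
n_subsets = sum(comb(n_features, k) for k in range(min_features, max_features + 1))

# Cross-check by enumerating the index tuples explicitly.
enumerated = sum(
    1
    for k in range(min_features, max_features + 1)
    for _ in combinations(range(n_features), k)
)

# Each candidate subset is cross-validated with cv folds.
n_fits = n_subsets * cv

print(n_subsets, enumerated, n_fits)  # 1023 1023 5115
```

So the progress counter has 1023 steps, and with `cv=5` that corresponds to 5115 model fits.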

I wonder if that's an issue with the verbose display functionality though 🤔

EDIT: No worries, it was a computer issue. It works perfectly now. Actually, it solves the problem. Before, a user could not see the progress printed to the command line until all combinations were evaluated. Now, you get the feedback immediately if `n_jobs == 1`.
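The behaviour described here can be illustrated with a minimal sketch of the general pattern: take a plain loop when `n_jobs == 1` and only hand work to a pool otherwise. This is not the code from this PR. `score_subset`, `evaluate_subsets`, and the `report` callback are hypothetical names, the progress message format is illustrative, and a standard-library thread pool stands in for joblib so the example runs without extra dependencies.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def score_subset(subset):
    """Hypothetical stand-in for cross-validating one feature subset."""
    return subset, sum(subset) / len(subset)


def evaluate_subsets(candidates, n_jobs=1, report=print):
    """Score every candidate, reporting progress after each one."""
    results = []
    if n_jobs == 1:
        # Serial path: no worker pool is created, so there is no pool
        # startup cost and progress is reported right after each subset.
        for i, subset in enumerate(candidates, start=1):
            results.append(score_subset(subset))
            report(f"Features: {i}/{len(candidates)}")
    else:
        # Parallel path: a stdlib thread pool stands in for joblib here.
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            scored = pool.map(score_subset, candidates)
            for i, result in enumerate(scored, start=1):
                results.append(result)
                report(f"Features: {i}/{len(candidates)}")
    return results


# All non-empty subsets of 3 features (7 candidates).
candidates = [c for k in (1, 2, 3) for c in combinations(range(3), k)]

messages = []
serial = evaluate_subsets(candidates, n_jobs=1, report=messages.append)
parallel = evaluate_subsets(candidates, n_jobs=2, report=lambda msg: None)
best_subset, best_score = max(serial, key=lambda item: item[1])
```

Both paths return the same results in the same order, so the serial branch changes only when work starts and when progress becomes visible, not what is computed.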

NimaSarajpoor commented 1 year ago

> EDIT: No worries, it was a computer issue. It works perfectly now. Actually, it solves the problem. Before, a user could not see the progress printed to the command line until all combinations were evaluated. Now, you get the feedback immediately if `n_jobs == 1`.

Thanks for the info :)