rasbt / mlxtend

A library of extension and helper modules for Python's data analysis and machine learning libraries.
https://rasbt.github.io/mlxtend/

Access intermediate feature subset select #1043

Closed arilwan closed 1 year ago

arilwan commented 1 year ago

Owing to the fact that Sequential Feature Selection is a really time-consuming preprocessing task, wouldn't it be nice to have some way to access the intermediate features selected while the algorithm keeps running? For example, when using SFFS to select the best of, say, 100 features, it would be nice to be able to retrieve, at round N, the feature subset selected at the end of that round.

rasbt commented 1 year ago

Thanks for the suggestions! Actually, the good news is that this is already possible via Example 11 here: https://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/#example-11-interrupting-long-runs-for-intermediate-results

But please feel free to reopen this in case it doesn't work or doesn't fully solve the problem.
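
For readers who want to keep whatever an interrupted run has produced, here is a minimal sketch of saving it to disk. It assumes the fitted selector exposes a `subsets_` dictionary keyed by subset size, with `feature_idx`, `cv_scores`, and `avg_score` entries, as described in the user guide. The helper name `dump_subsets`, the file name, and the example data are hypothetical; nothing here is part of mlxtend itself.

```python
import json
import os
import tempfile


def dump_subsets(subsets, path):
    """Write a subsets_-style dict to a JSON file, ordered by subset size."""
    rows = []
    for k in sorted(subsets):
        entry = subsets[k]
        rows.append({
            "n_features": int(k),
            # Convert to plain Python types so NumPy values serialize cleanly.
            "feature_idx": [int(i) for i in entry["feature_idx"]],
            "avg_score": float(entry["avg_score"]),
            "cv_scores": [float(s) for s in entry["cv_scores"]],
        })
    with open(path, "w") as fh:
        json.dump(rows, fh, indent=2)
    return rows


# Hypothetical stand-in for `sfs.subsets_` after an interrupted run.
example_subsets = {
    1: {"feature_idx": (3,), "cv_scores": [0.90, 0.92], "avg_score": 0.91},
    2: {"feature_idx": (0, 3), "cv_scores": [0.94, 0.96], "avg_score": 0.95},
}

out_path = os.path.join(tempfile.mkdtemp(), "sfs_subsets.json")
rows = dump_subsets(example_subsets, out_path)
print(rows[-1]["n_features"], rows[-1]["avg_score"])
```

With a real selector, the call would presumably be `dump_subsets(sfs.subsets_, "sfs_subsets.json")` once control returns after the interrupt.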

arilwan commented 1 year ago

@rasbt

Very sorry to reopen this issue again. I understand from the example you mentioned that intermediate results are accessible upon process interruption.

What I hope to do is retrieve those attributes (number of features selected and metric score), saved to a variable or written to a file, after every feature is added in an SFFS run, without interrupting it.

For example, I started running the selection process below on 2 June 2023.

[2023-06-14 08:01:25] Features: 149/240 -- score: 0.8947770129386831[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed: 21.0min
[Parallel(n_jobs=-1)]: Done  91 out of  91 | elapsed: 60.4min finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed: 20.8min
[Parallel(n_jobs=-1)]: Done 149 out of 149 | elapsed: 100.3min finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed: 21.4min
[Parallel(n_jobs=-1)]: Done 148 out of 148 | elapsed: 89.4min finished

[2023-06-14 12:11:29] Features: 149/240 -- score: 0.8952890526770254[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed: 18.2min
[Parallel(n_jobs=-1)]: Done  91 out of  91 | elapsed: 53.0min finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed: 18.5min

It has been running for 2 weeks now, and there is still a long way to go, maybe another 2 weeks.

Suppose those attributes were accessible and, say, saved to a file: I could then do some analysis of the results after Features: 50/240, Features: 100/240, Features: 148/240, etc., without actually interrupting the running process.

Isn't there any way to write those to a file?

arilwan commented 1 year ago

@rasbt Can you please guide me on which section of the code I should change to continuously write the attribute values to a txt file that keeps getting updated after every feature is added?
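
One possible workaround that needs no change to the library is to read the progress lines back out of the console output, assuming that output is redirected to a file while the selection runs. The sketch below parses lines of the form shown in the log above (`[timestamp] Features: k/total -- score: value`). The names `PROGRESS`, `parse_progress`, and `latest_by_size` are hypothetical helpers, not mlxtend functions, and only the subset size and score are recovered this way, not the selected feature indices.

```python
import re

# Matches e.g. "[2023-06-14 08:01:25] Features: 149/240 -- score: 0.8947..."
# The score may be followed directly by joblib output, so stop at the digits.
PROGRESS = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"Features: (?P<k>\d+)/(?P<total>\d+) -- score: (?P<score>-?\d+(?:\.\d+)?)"
)


def parse_progress(text):
    """Return one record per progress line found in the captured output."""
    return [
        {
            "timestamp": m.group("ts"),
            "n_features": int(m.group("k")),
            "total": int(m.group("total")),
            "score": float(m.group("score")),
        }
        for m in PROGRESS.finditer(text)
    ]


def latest_by_size(records):
    """Keep the most recent record per subset size.

    In floating mode the same size can be reported more than once.
    """
    latest = {}
    for rec in records:
        latest[rec["n_features"]] = rec
    return latest


# Two progress lines copied from the log in this thread.
sample = (
    "[2023-06-14 08:01:25] Features: 149/240 -- score: 0.8947770129386831"
    "[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.\n"
    "[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed: 21.0min\n"
    "[2023-06-14 12:11:29] Features: 149/240 -- score: 0.8952890526770254"
    "[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.\n"
)

records = parse_progress(sample)
latest = latest_by_size(records)
print(len(records), latest[149]["score"])
```

Running `parse_progress` periodically on the growing log file would give a score-versus-size history without touching the running process.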

rasbt commented 1 year ago

For future reference, linking the discussion here: #1051