optuna / optuna-examples

Examples for https://github.com/optuna/optuna
MIT License
692 stars 178 forks

Add Feature Selection Example with Pruning and ETA Prediction #263

Closed elifarley closed 5 months ago

elifarley commented 6 months ago

Motivation

Feature selection is a critical step in machine learning: it can improve model performance and reduce overfitting. However, a dataset with n features has 2^n - 1 non-empty feature subsets, so evaluating every possible combination quickly becomes computationally infeasible. This example introduces an approach to feature selection using Optuna that samples feature subsets instead of enumerating them and skips subsets that do not need to be evaluated, which reduces the search space and computation time.

Changes

This PR adds a new example class that demonstrates how to use Optuna for feature selection. The class conducts a study where each trial attempts a different subset of features from the input dataset. The key highlights of this example include:

Pruning Strategy

Trials are pruned using optuna.exceptions.TrialPruned in three scenarios:

  1. The number of features exceeds a user-defined maximum.
  2. No features are selected for the trial.
  3. The feature set has been previously evaluated in another trial.

ETA Prediction

After each successful trial, an estimated time of arrival (ETA) is printed, providing users with a prediction of when the study is likely to complete. This feature is particularly useful for long-running studies, allowing users to manage their time efficiently.
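
One simple way to produce such an estimate is linear extrapolation from the average time per finished trial. The sketch below assumes that approach; the PR may compute its ETA differently. The names remaining_seconds and EtaPrinter are hypothetical. The class uses the (study, trial) call signature that Optuna accepts for entries in the callbacks argument of study.optimize.

```python
import time
from datetime import datetime, timedelta


def remaining_seconds(elapsed, finished, total):
    """Average time per finished trial multiplied by the trials still left."""
    if finished <= 0:
        raise ValueError("need at least one finished trial")
    return elapsed / finished * max(total - finished, 0)


class EtaPrinter:
    """Callable with the (study, trial) signature used by Optuna callbacks."""

    def __init__(self, total_trials):
        self.total_trials = total_trials
        self.start = time.monotonic()
        self.finished = 0

    def __call__(self, study, trial):
        # Every finished trial, pruned or not, uses up one slot of the budget.
        self.finished += 1
        if trial.state.name != "COMPLETE":
            return  # print only after successful trials
        left = remaining_seconds(
            time.monotonic() - self.start, self.finished, self.total_trials
        )
        eta = datetime.now() + timedelta(seconds=left)
        print(f"trial {self.finished}/{self.total_trials}, ETA {eta:%H:%M:%S}")
```

It could be attached with study.optimize(objective, n_trials=800, callbacks=[EtaPrinter(800)]). Because pruned trials finish almost instantly, a study that prunes heavily early on will produce an optimistic estimate at first.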

Benefits

Efficiency: By pruning trials that exceed the feature limit, have no selected features, or repeat previous trials, we save significant computation time.

User Experience: The ETA prediction enhances the user experience by setting expectations for study completion, enabling better planning and time management.

Example Usage

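from sklearn.datasets import fetch_california_housing

from feature_selection import feature_removal_cv

# Assumption: df is the California housing dataset, whose target column is MedHouseVal.
df = fetch_california_housing(as_frame=True).frame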

feature_removal_cv(
    model_params={
        "objective": "regression",
        "metric": "rmse",
        "data_random_seed": 42,
        "num_boost_round": 1000,
        "early_stopping_rounds": 10,
        "learning_rate": 0.12599281729053988,
        "force_row_wise": True,
        "verbose": -1,
        "verbose_eval": False,
        "num_leaves": 631,
        "max_depth": 7,
        "min_child_samples": 65,
        "colsample_bytree": 0.8430078242019065,
        "reg_alpha": 0.06636017620531826,
        "reg_lambda": 0.057077523364489346,
    },
    X=df.drop(columns=["MedHouseVal"]),
    y=df.MedHouseVal,
    split_count=5,
    trial_count=800,
)

not522 commented 6 months ago

Thanks for the PR. I checked the contents, but it seems to be a package that uses Optuna rather than an example. The optuna-examples repository is a collection of short snippets, so it is not a suitable place to publish your package. How about publishing it in a repository you own?

github-actions[bot] commented 5 months ago

This pull request has not seen any recent activity.

not522 commented 5 months ago

Let me close this PR. If you have any opinions, please feel free to reopen it.