mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License

Adding another hyperparameter setting for models #513

Closed · ooghry closed this issue 2 years ago

ooghry commented 2 years ago

Hello, I ran Optuna for XGBoost, CatBoost, and LightGBM separately and got hyperparameters, and I also have the JSON file that mljar creates for the same models. How can I run mljar again with all six sets of hyperparameters?

pplonski commented 2 years ago

Hi @ooghry,

Yes, you can (you mean with three sets of hyperparameters?). You should join them into one JSON. If you paste them in this issue, I can help you. After you have created the single JSON file with parameters, you need to pass it to the AutoML constructor:

import json
from supervised.automl import AutoML

# load the joined hyperparameters saved from the previous runs
with open('joined_params_from_previous_training.json') as f:
    optuna_init = json.load(f)

automl = AutoML(
    mode='Optuna',
    optuna_init_params=optuna_init
)
automl.fit(X, y)  # X, y - your training data
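
For the joining step itself, a minimal sketch that merges the two sources and saves the result (the input file names here are hypothetical placeholders; use whatever paths your previous runs produced):

import json

# hypothetical paths: the params file written by the previous mljar run
# and the hyperparameters found by your standalone Optuna runs
with open('mljar_params.json') as f:
    mljar_params = json.load(f)

with open('my_optuna_params.json') as f:
    my_params = json.load(f)

# merge the dictionaries; on duplicate keys the second source wins
joined = {**mljar_params, **my_params}

with open('joined_params_from_previous_training.json', 'w') as f:
    json.dump(joined, f, indent=4)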

Please let me know if it works for you.

ooghry commented 2 years ago

I want to use hyperparameters like this (I don't know whether it's beneficial or not):

{
    "original_LightGBM": {values},
    "original_Xgboost": {values},
    "original_CatBoost": {values},
    "custom_LightGBM": {values},
    "custom_Xgboost": {values},
    "custom_CatBoost": {values}
}

This is the content of the JSON file that mljar created:

{
    "original_LightGBM": {
        "learning_rate": 0.1,
        "num_leaves": 848,
        "lambda_l1": 0.3424711290818368,
        "lambda_l2": 0.7352459290455047,
        "feature_fraction": 0.9460038268782655,
        "bagging_fraction": 0.5073597983245327,
        "bagging_freq": 6,
        "min_data_in_leaf": 98,
        "extra_trees": false,
        "metric": "multi_logloss",
        "custom_eval_metric_name": null,
        "num_boost_round": 1000,
        "early_stopping_rounds": 50,
        "cat_feature": [],
        "feature_pre_filter": false,
        "seed": 1234
    },
    "original_Xgboost": {
        "eta": 0.1,
        "max_depth": 8,
        "lambda": 4.7917648191346e-06,
        "alpha": 4.880244400077018,
        "colsample_bytree": 0.36955545794024963,
        "subsample": 0.9998681868468945,
        "min_child_weight": 94,
        "objective": "multi:softprob",
        "eval_metric": "mlogloss",
        "max_rounds": 1000,
        "early_stopping_rounds": 50,
        "seed": 1234
    },
    "original_CatBoost": {
        "learning_rate": 0.1,
        "depth": 4,
        "l2_leaf_reg": 5.457877926652389,
        "random_strength": 2.6233274115429235,
        "rsm": 0.36816993523851294,
        "min_data_in_leaf": 83,
        "eval_metric": "MultiClass",
        "num_boost_round": 1000,
        "early_stopping_rounds": 50,
        "seed": 1234
    }
}

My CatBoost params:

{
    "objective": "MultiClassOneVsAll",
    "depth": 5,
    "boosting_type": "Plain",
    "bootstrap_type": "Bayesian",
    "classes_count": 3,
    "learning_rate": 0.09238255723788956,
    "l2_leaf_reg": 17.862345076360473,
    "min_data_in_leaf": 13,
    "one_hot_max_size": 3,
    "od_wait": 27,
    "random_strength": 3.76480495124388,
    "n_estimators": 363,
    "bagging_temperature": 7.995126853948798
}

My LightGBM params:

{
    "num_class": 3,
    "bagging_fraction": 0.5946813709083774,
    "bagging_freq": 2,
    "feature_fraction": 0.7713920460305471,
    "lambda_l1": 2.6014640621969187,
    "lambda_l2": 0.025132095337979454,
    "learning_rate": 0.00010805660156197516,
    "min_child_samples": 78,
    "num_leaves": 122,
    "tree_learner": "serial",
    "objective": "multiclassova",
    "boosting": "gbdt",
    "metric": "multi_logloss"
}
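
A minimal sketch of assembling the combined file in the custom_* / original_* layout proposed earlier in this thread (the input file names are hypothetical, and whether AutoML will actually consume the extra custom_* entries is exactly the open question here, not something this sketch settles):

import json

# hypothetical path: the JSON file mljar created, holding the original_* entries
with open('mljar_params.json') as f:
    joined = json.load(f)

# hypothetical paths: the dictionaries listed above, saved by the standalone
# Optuna runs; an analogous XGBoost file would complete the six entries
with open('my_catboost_params.json') as f:
    joined['custom_CatBoost'] = json.load(f)

with open('my_lightgbm_params.json') as f:
    joined['custom_LightGBM'] = json.load(f)

with open('my_xgboost_params.json') as f:
    joined['custom_Xgboost'] = json.load(f)

# save the combined file used in the AutoML snippet above
with open('joined_params_from_previous_training.json', 'w') as f:
    json.dump(joined, f, indent=4)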
pplonski commented 2 years ago

@ooghry a few comments:

ooghry commented 2 years ago

Thank you @pplonski for your time and clarification.