mattharrison / effective_xgboost_book


Chapter 21 #use hyperopt to train the model and extend to log metrics in MLFlow while training. #4

Closed ben-mcdaniel-tfs closed 11 months ago

ben-mcdaniel-tfs commented 1 year ago

When running the block of code below:

ex_id = mlflow.create_experiment(name='ex3', artifact_location='ex2path')
mlflow.set_experiment(experiment_name='ex3')
with mlflow.start_run():
    params = {'random_state': 42}
    rounds = [{'max_depth': hp.quniform('max_depth', 1, 12, 1),  # tree
               'min_child_weight': hp.loguniform('min_child_weight', -2, 3)},
              {'subsample': hp.uniform('subsample', 0.5, 1),  # stochastic
               'colsample_bytree': hp.uniform('colsample_bytree', 0.5, 1)},
              {'gamma': hp.loguniform('gamma', -10, 10)},  # regularization
              {'learning_rate': hp.loguniform('learning_rate', -7, 0)}  # boosting
             ]

    for round in rounds:
        params = {**params, **round}
        trials = Trials()
        best = fmin(fn=lambda space: hyperparameter_tuning(
                space, X_train, y_train, X_test, y_test),
            space=params,
            algo=tpe.suggest,
            max_evals=10,
            trials=trials,
            timeout=60*5  # 5 minutes
        )
        params = {**params, **best}
        for param, val in params.items():
            mlflow.log_param(param, val)
        params['max_depth'] = int(params['max_depth'])
        xg = xgb.XGBClassifier(eval_metric='logloss', early_stopping_rounds=50, **params)
        xg.fit(X_train, y_train,
               eval_set=[(X_train, y_train),
                         (X_test, y_test)
                        ]
              )
        for metric in [metrics.accuracy_score, metrics.precision_score, metrics.recall_score,
                       metrics.f1_score]:
            mlflow.log_metric(metric.__name__, metric(y_test, xg.predict(X_test)))
    model_info = mlflow.xgboost.log_model(xg, artifact_path='model')

The following Exception is raised:

MlflowException: Changing param values is not allowed. Param with key='max_depth' was already logged with value='8.0' for run ID=''. Attempted logging new value '8'.

The cause of this error is typically due to repeated calls to an individual run_id event logging.

Incorrect Example:

with mlflow.start_run():
    mlflow.log_param("depth", 3)
    mlflow.log_param("depth", 5)

Which will throw an MlflowException for overwriting a logged parameter.

Correct Example:

with mlflow.start_run():
    with mlflow.start_run(nested=True):
        mlflow.log_param("depth", 3)
    with mlflow.start_run(nested=True):
        mlflow.log_param("depth", 5)

Which will create a new nested run for each individual model and prevent parameter key collisions within the tracking store.
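
The values in the message ('8.0' and then '8') suggest a likely cause in the loop: hp.quniform yields a float, the loop logs it, and only afterwards casts max_depth to int, so the next round logs a different string under the same key. The sketch below illustrates that mismatch and one possible workaround, which is to cast before logging. ParamLog and cast_int_params are hypothetical names used only for illustration. ParamLog is a plain-Python stand-in that assumes values are compared as strings, as the message suggests; it is not MLflow's implementation, and this is not necessarily the fix the thread settled on.

```python
# Hypothetical stand-in for a "logged params cannot change" rule.
# NOT MLflow code: it only assumes values are compared as strings.
class ParamLog:
    def __init__(self):
        self._params = {}

    def log_param(self, key, value):
        new = str(value)
        # setdefault stores `new` on first use, else returns the old value.
        old = self._params.setdefault(key, new)
        if old != new:
            raise ValueError(
                f"param {key!r} already logged as {old!r}, got {new!r}")


def cast_int_params(params, int_keys=("max_depth",)):
    """Return a copy of params with the named keys converted to int.

    The default int_keys is an assumption based on the code in this thread.
    """
    return {k: int(v) if k in int_keys else v for k, v in params.items()}


# Reproduce the collision: the float is logged first, the int later.
log = ParamLog()
log.log_param("max_depth", 8.0)      # stored as '8.0'
try:
    log.log_param("max_depth", 8)    # '8' differs from '8.0'
    collided = False
except ValueError:
    collided = True

# Casting before logging keeps the logged string stable across rounds.
fixed_log = ParamLog()
round_one = cast_int_params({"max_depth": 8.0, "gamma": 0.5})
round_two = cast_int_params({"max_depth": 8, "gamma": 0.5, "subsample": 0.9})
for params in (round_one, round_two):
    for key, value in params.items():
        fixed_log.log_param(key, value)
```

Applied to the loop in the report, this would amount to casting max_depth before the mlflow.log_param calls rather than after them. The nested-run pattern from the error message is the alternative when each round should be tracked as its own run.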

chigili commented 11 months ago

I had the same issue and solved it. Please refer to this issue for the solution: https://github.com/mattharrison/effective_xgboost_book/issues/8

ben-mcdaniel-tfs commented 11 months ago

This did indeed resolve the issue I was having. Thank you.