uclamii / model_tuner

A library to tune the hyperparameters of common ML models. Supports calibration and custom pipelines.
Apache License 2.0

Fix pipeline steps initialization and estimator setup for XGBoost compatibility #40

Closed lshpaner closed 4 weeks ago

lshpaner commented 1 month ago

Background

This change was prompted by an issue found when running the XGBoost model in our pipeline. XGBoost does not require preprocessing steps such as imputation or scaling, so in some cases it should run without any pipeline steps. However, pipeline setup failed when no preprocessing steps were provided, which led to incorrect estimator initialization. This fix handles the case where the pipeline is empty, so the estimator is configured correctly whether or not preprocessing steps are supplied.

Description

This PR fixes the initialization of pipeline_steps and the assignment of the estimator, both when pipeline steps are provided and when they are not.

If pipeline_steps are provided (self.pipeline == True), a deep copy of the original estimator is appended to the existing steps. If no pipeline steps are provided (self.pipeline == False), the estimator is initialized as a pipeline whose only step is a deep copy of the original estimator.

Changes:

Code Changes:

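self.pipeline_steps = pipeline_steps
if self.pipeline:
    # Preprocessing steps were supplied: append the estimator as the final step.
    self.estimator = self.PipelineClass(
        self.pipeline_steps
        + [(self.estimator_name, copy.deepcopy(self.original_estimator))]
    )
else:
    # No preprocessing steps: build a pipeline that holds only the estimator.
    self.estimator = self.PipelineClass(
        [(self.estimator_name, copy.deepcopy(self.original_estimator))]
    )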
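
To illustrate the branching above in isolation, here is a minimal standalone sketch of the same idea. It is not the library's code: `build_steps` and `DummyEstimator` are hypothetical names introduced only for this example, and the sketch stops at building the list of `(name, step)` tuples that would then be passed to a pipeline class such as scikit-learn's `Pipeline`.

```python
import copy


class DummyEstimator:
    """Hypothetical stand-in for a real estimator such as an XGBoost model."""

    def __init__(self, max_depth=3):
        self.max_depth = max_depth


def build_steps(pipeline_steps, estimator_name, original_estimator):
    """Return the full list of (name, step) tuples for a pipeline.

    An absent or empty `pipeline_steps` yields a single-step list holding
    only the estimator, so the same constructor call works in both cases.
    """
    steps = list(pipeline_steps or [])
    # Deep-copy so later changes never mutate the caller's estimator.
    steps.append((estimator_name, copy.deepcopy(original_estimator)))
    return steps


est = DummyEstimator()
no_preprocessing = build_steps(None, "xgb", est)
with_preprocessing = build_steps([("scaler", object())], "xgb", est)

print([name for name, _ in no_preprocessing])    # ['xgb']
print([name for name, _ in with_preprocessing])  # ['scaler', 'xgb']
```

Treating a missing step list as an empty list folds both branches into one code path, which may be worth considering if the two branches in the PR are meant to stay in sync.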

Reasoning

panas89 commented 4 weeks ago

Cannot approve: there is a bug at line 798 of model_tuner_utils.py. The error is raised when testing with the following file:

notebooks/xgb_early_test.py