linkedin / greykite

A flexible, intuitive and fast forecasting library
BSD 2-Clause "Simplified" License

Question: How to use sample_weight in grid searching a model #127

Open msat59 opened 1 year ago

msat59 commented 1 year ago

Hi there.

I have seen the sample_weight parameter in the docs here, but I have no idea how to use it.

I want to use sample_weight because my series has some missing values. I don't want to rely on backward/forward-filled data, as it may change the model performance. Instead, I want to use sample_weight so that those points have no effect on the grid search results.
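For illustration, this is roughly the kind of weight column I have in mind. It is a minimal pandas-only sketch: the data is made up, the column names ts, y and sample_weight simply match my dataframe, and nothing here is Greykite API.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with two missing observations.
df = pd.DataFrame({
    "ts": pd.date_range("2023-01-01", periods=6, freq="D"),
    "y": [10.0, np.nan, 12.0, 13.0, np.nan, 15.0],
})

# Weight 0 for rows whose original value was missing, 1 for observed rows.
# This must be computed before any filling, while the NaNs are still visible.
df["sample_weight"] = df["y"].notna().astype(float)

# Fill the gaps only so the series is complete; the zero weights are meant
# to keep the filled rows from influencing the fit or the scoring.
df["y"] = df["y"].ffill().bfill()

print(df)
```

The open question is how to hand the sample_weight column to the model so that it is actually used.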

I would appreciate it if someone could advise how to use the sample weight in the model.

EDITED: I found the regression_weight_col keyword in the code, for instance in the SilverkiteEstimator code, but I couldn't find how to define and use it.

msat59 commented 1 year ago

Has this feature been implemented yet?

According to simple_silverkite_template.py, regression_weight_col should be set in the custom dictionary of ModelComponentsParam:

custom={
    "feature_sets_enabled": self.constants.COMMON_MODELCOMPONENTPARAM_PARAMETERS["FEASET"][components[components.index("FEASET")+1]],
    "fit_algorithm_dict": self.constants.COMMON_MODELCOMPONENTPARAM_PARAMETERS["ALGO"][components[components.index("ALGO")+1]],
    "max_daily_seas_interaction_order": self.constants.COMMON_MODELCOMPONENTPARAM_PARAMETERS["DSI"][freq][components[components.index("DSI")+1]],
    "max_weekly_seas_interaction_order": self.constants.COMMON_MODELCOMPONENTPARAM_PARAMETERS["WSI"][freq][components[components.index("WSI")+1]],
    "extra_pred_cols": [],
    "drop_pred_cols": None,
    "explicit_pred_cols": None,
    "min_admissible_value": None,
    "max_admissible_value": None,
    "regression_weight_col": None,
    "normalize_method": "zero_to_one"
},

However, it seems that it hasn't been implemented yet, because I get the error below when I set it there. Note that my dataframe has all three columns: ['ts', 'y', 'sample_weight']. When I debugged the fit, the internally created df had only the ts and y columns.

ValueError: 
All the 12 fits failed.
It is very likely that your model is misconfigured.
You can try to debug the error by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
12 fits failed with the following error:
Traceback (most recent call last):
  File "C:\Users\user\miniconda3\envs\py38\lib\site-packages\pandas\core\indexes\base.py", line 3081, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'sample_weight'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\user\miniconda3\envs\py38\lib\site-packages\sklearn\model_selection\_validation.py", line 686, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\user\miniconda3\envs\py38\lib\site-packages\sklearn\pipeline.py", line 405, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "C:\Users\user\miniconda3\envs\py38\lib\site-packages\greykite\sklearn\estimator\simple_silverkite_estimator.py", line 271, in fit
    self.model_dict = self.silverkite.forecast_simple(
  File "C:\Users\user\miniconda3\envs\py38\lib\site-packages\greykite\algo\forecast\silverkite\forecast_simple_silverkite.py", line 836, in forecast_simple
    trained_model = super().forecast(**parameters)
  File "C:\Users\user\miniconda3\envs\py38\lib\site-packages\greykite\algo\forecast\silverkite\forecast_silverkite.py", line 956, in forecast
    trained_model = fit_ml_model_with_evaluation(
  File "C:\Users\user\miniconda3\envs\py38\lib\site-packages\greykite\algo\common\ml_models.py", line 704, in fit_ml_model_with_evaluation
    trained_model = fit_ml_model(
  File "C:\Users\user\miniconda3\envs\py38\lib\site-packages\greykite\algo\common\ml_models.py", line 384, in fit_ml_model
    if df[regression_weight_col].min() < 0:
  File "C:\Users\user\miniconda3\envs\py38\lib\site-packages\pandas\core\frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\user\miniconda3\envs\py38\lib\site-packages\pandas\core\indexes\base.py", line 3083, in get_loc
    raise KeyError(key) from err
KeyError: 'sample_weight'
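
The same KeyError can be reproduced outside Greykite with plain pandas. This is a minimal sketch, assuming the frame that reaches fit_ml_model contains only ts and y, as I saw while debugging. The helper check_weights is hypothetical and only mirrors the `df[regression_weight_col].min() < 0` line from the traceback. It is not the library's code.

```python
import pandas as pd


def check_weights(df, regression_weight_col):
    """Hypothetical stand-in for the weight check seen in the traceback."""
    # Column lookup happens first, so a missing column raises KeyError
    # before the negativity test is ever evaluated.
    if df[regression_weight_col].min() < 0:
        raise ValueError("weights must be non-negative")
    return True


# The dataframe as I pass it in: all three columns are present.
full_df = pd.DataFrame({
    "ts": pd.date_range("2023-01-01", periods=3, freq="D"),
    "y": [1.0, 2.0, 3.0],
    "sample_weight": [1.0, 0.0, 1.0],
})

# What the internally created df looked like: only ts and y survive.
internal_df = full_df[["ts", "y"]]

ok_on_full = check_weights(full_df, "sample_weight")

try:
    check_weights(internal_df, "sample_weight")
    error = None
except KeyError as err:
    error = err

print(ok_on_full, repr(error))
```

So the check itself is fine when the column is present. The problem appears to be that the weight column is dropped somewhere before the fit.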

msat59 commented 1 year ago

@al-bert, is this project dead?