unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

Cannot get optuna gridSearch to work. TypeError: Unknown type of parameter:series, got:TimeSeries #2121

Closed Allena101 closed 6 months ago

Allena101 commented 9 months ago

Note: I am reposting since my issue got the triage label and, as I understand it, that means my post/issue is not viewable. I apologize if I misunderstood what triage meant/did 🙏

I am following your guides on Optuna and Ray Tune. With Ray Tune I keep getting a timeout error and don't know why, but I will start by asking about Optuna, since I did manage to get it to work some time ago with TensorFlow. I want to use LightGBM (as I understand it, any model in Darts should be usable).

I am testing as simple a model as possible just to see if it works, and then I can make it more complex. That seems like a good approach, but I just get a constant torrent of errors either way.

The error i am getting is: TypeError: Unknown type of parameter:series, got:TimeSeries

Here is the code (again following your guide)

ts = TimeSeries.from_dataframe(edfs, 'dt', ['Interval_Sum'])

ts = ts.drop_before(pd.Timestamp("2023-08-30"))

ts_train, ts_val = ts.split_after(pd.Timestamp("2023-10-01"))

def objective(trial):

  max_depth = trial.suggest_categorical("max_depth", [2, 3])
  num_leaves = trial.suggest_categorical("num_leaves", [2, 3])
  lags = trial.suggest_categorical("lags", [3])

  pruner = PyTorchLightningPruningCallback(trial, monitor="val_loss")
  early_stopper = EarlyStopping("val_loss", min_delta=0.001, patience=3, verbose=True)
  callbacks = [pruner, early_stopper]

  pl_trainer_kwargs = {
      "accelerator": "auto",
      "callbacks": callbacks,
  }

  torch.manual_seed(42)

  # build the TCN model
  model = LightGBMModel(
      series=ts_train,
      # metric = rmse,
      forecast_horizon = 3,
      max_depth = max_depth,
      num_leaves = num_leaves,
      lags = lags

  )

  # train the model
  model.fit(
      series=ts_train,
      val_series=ts_val,
      # num_loader_workers=num_workers,
  )

  # reload best model over course of training
  model = TCNModel.load_from_checkpoint("tcn_model")

  # Evaluate how good it is on the validation set, using sMAPE
  preds = model.predict(series=train, n=ts_val)
  smapes = smape(ts_val, preds, n_jobs=-1, verbose=True)

# for convenience, print some optimization trials information
def print_callback(study, trial):
    print(f"Current value: {trial.value}, Current params: {trial.params}")
    print(f"Best value: {study.best_value}, Best params: {study.best_trial.params}")

# optimize hyperparameters by minimizing the sMAPE on the validation set
if __name__ == "__main__":
    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=100, callbacks=[print_callback])

Here is more of the error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
[~\AppData\Local\Temp\ipykernel_19460\55056152.py](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/Magnus/Desktop/code/timeSeries/~/AppData/Local/Temp/ipykernel_19460/55056152.py) in <cell line: 64>()
     64 if __name__ == "__main__":
     65     study = optuna.create_study(direction="minimize")
---> 66     study.optimize(objective, n_trials=100, callbacks=[print_callback])

[c:\Users\Magnus\Desktop\code\timeSeries\venvTS\lib\site-packages\optuna\study\study.py](file:///C:/Users/Magnus/Desktop/code/timeSeries/venvTS/lib/site-packages/optuna/study/study.py) in optimize(self, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
    441         """
    442 
--> 443         _optimize(
    444             study=self,
    445             func=func,

[c:\Users\Magnus\Desktop\code\timeSeries\venvTS\lib\site-packages\optuna\study\_optimize.py](file:///C:/Users/Magnus/Desktop/code/timeSeries/venvTS/lib/site-packages/optuna/study/_optimize.py) in _optimize(study, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
     64     try:
     65         if n_jobs == 1:
---> 66             _optimize_sequential(
     67                 study,
     68                 func,

[c:\Users\Magnus\Desktop\code\timeSeries\venvTS\lib\site-packages\optuna\study\_optimize.py](file:///C:/Users/Magnus/Desktop/code/timeSeries/venvTS/lib/site-packages/optuna/study/_optimize.py) in _optimize_sequential(study, func, n_trials, timeout, catch, callbacks, gc_after_trial, reseed_sampler_rng, time_start, progress_bar)
    161 
    162         try:
--> 163             frozen_trial = _run_trial(study, func, catch)
    164         finally:
    165             # The following line mitigates memory problems that can be occurred in some

[c:\Users\Magnus\Desktop\code\timeSeries\venvTS\lib\site-packages\optuna\study\_optimize.py](file:///C:/Users/Magnus/Desktop/code/timeSeries/venvTS/lib/site-packages/optuna/study/_optimize.py) in _run_trial(study, func, catch)
    249         and not isinstance(func_err, catch)
    250     ):
--> 251         raise func_err
    252     return frozen_trial
    253 

[c:\Users\Magnus\Desktop\code\timeSeries\venvTS\lib\site-packages\optuna\study\_optimize.py](file:///C:/Users/Magnus/Desktop/code/timeSeries/venvTS/lib/site-packages/optuna/study/_optimize.py) in _run_trial(study, func, catch)
    198     with get_heartbeat_thread(trial._trial_id, study._storage):
    199         try:
--> 200             value_or_values = func(trial)
    201         except exceptions.TrialPruned as e:
    202             # TODO(mamu): Handle multi-objective cases.

[~\AppData\Local\Temp\ipykernel_19460\55056152.py](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/Magnus/Desktop/code/timeSeries/~/AppData/Local/Temp/ipykernel_19460/55056152.py) in objective(trial)
     38 
     39     # train the model
---> 40     model.fit(
     41         series=ts_train,
     42         val_series=ts_val,

[c:\Users\Magnus\Desktop\code\timeSeries\venvTS\lib\site-packages\darts\models\forecasting\lgbm.py](file:///C:/Users/Magnus/Desktop/code/timeSeries/venvTS/lib/site-packages/darts/models/forecasting/lgbm.py) in fit(self, series, past_covariates, future_covariates, val_series, val_past_covariates, val_future_covariates, max_samples_per_ts, **kwargs)
    265             return self
    266 
--> 267         super().fit(
    268             series=series,
    269             past_covariates=past_covariates,

[c:\Users\Magnus\Desktop\code\timeSeries\venvTS\lib\site-packages\darts\models\forecasting\regression_model.py](file:///C:/Users/Magnus/Desktop/code/timeSeries/venvTS/lib/site-packages/darts/models/forecasting/regression_model.py) in fit(self, series, past_covariates, future_covariates, max_samples_per_ts, n_jobs_multioutput_wrapper, **kwargs)
   1615             future_covariates=future_covariates,
   1616         )
-> 1617         super().fit(
   1618             series=series,
   1619             past_covariates=past_covariates,

[c:\Users\Magnus\Desktop\code\timeSeries\venvTS\lib\site-packages\darts\models\forecasting\regression_model.py](file:///C:/Users/Magnus/Desktop/code/timeSeries/venvTS/lib/site-packages/darts/models/forecasting/regression_model.py) in fit(self, series, past_covariates, future_covariates, max_samples_per_ts, n_jobs_multioutput_wrapper, **kwargs)
    720             raise_log(ValueError("\n".join(component_lags_error_msg)), logger)
    721 
--> 722         self._fit_model(
    723             series, past_covariates, future_covariates, max_samples_per_ts, **kwargs
    724         )

[c:\Users\Magnus\Desktop\code\timeSeries\venvTS\lib\site-packages\darts\models\forecasting\regression_model.py](file:///C:/Users/Magnus/Desktop/code/timeSeries/venvTS/lib/site-packages/darts/models/forecasting/regression_model.py) in _fit_model(self, target_series, past_covariates, future_covariates, max_samples_per_ts, **kwargs)
   1793             cat_col_indices if cat_col_indices else cat_param_default
   1794         )
-> 1795         super()._fit_model(
   1796             target_series=target_series,
   1797             past_covariates=past_covariates,

[c:\Users\Magnus\Desktop\code\timeSeries\venvTS\lib\site-packages\darts\models\forecasting\regression_model.py](file:///C:/Users/Magnus/Desktop/code/timeSeries/venvTS/lib/site-packages/darts/models/forecasting/regression_model.py) in _fit_model(self, target_series, past_covariates, future_covariates, max_samples_per_ts, **kwargs)
    542         if len(training_labels.shape) == 2 and training_labels.shape[1] == 1:
    543             training_labels = training_labels.ravel()
--> 544         self.model.fit(training_samples, training_labels, **kwargs)
    545 
    546         # generate and store the lagged components names (for feature importance analysis)

[c:\Users\Magnus\Desktop\code\timeSeries\venvTS\lib\site-packages\lightgbm\sklearn.py](file:///C:/Users/Magnus/Desktop/code/timeSeries/venvTS/lib/site-packages/lightgbm/sklearn.py) in fit(self, X, y, sample_weight, init_score, eval_set, eval_names, eval_sample_weight, eval_init_score, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks, init_model)
    893             callbacks=None, init_model=None):
    894         """Docstring is inherited from the LGBMModel."""
--> 895         super().fit(X, y, sample_weight=sample_weight, init_score=init_score,
    896                     eval_set=eval_set, eval_names=eval_names, eval_sample_weight=eval_sample_weight,
    897                     eval_init_score=eval_init_score, eval_metric=eval_metric,

[c:\Users\Magnus\Desktop\code\timeSeries\venvTS\lib\site-packages\lightgbm\sklearn.py](file:///C:/Users/Magnus/Desktop/code/timeSeries/venvTS/lib/site-packages/lightgbm/sklearn.py) in fit(self, X, y, sample_weight, init_score, group, eval_set, eval_names, eval_sample_weight, eval_class_weight, eval_init_score, eval_group, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks, init_model)
    746         callbacks.append(record_evaluation(evals_result))
    747 
--> 748         self._Booster = train(
    749             params=params,
    750             train_set=train_set,

[c:\Users\Magnus\Desktop\code\timeSeries\venvTS\lib\site-packages\lightgbm\engine.py](file:///C:/Users/Magnus/Desktop/code/timeSeries/venvTS/lib/site-packages/lightgbm/engine.py) in train(params, train_set, num_boost_round, valid_sets, valid_names, fobj, feval, init_model, feature_name, categorical_feature, early_stopping_rounds, evals_result, verbose_eval, learning_rates, keep_training_booster, callbacks)
    269     # construct booster
    270     try:
--> 271         booster = Booster(params=params, train_set=train_set)
    272         if is_valid_contain_train:
    273             booster.set_train_data_name(train_data_name)

[c:\Users\Magnus\Desktop\code\timeSeries\venvTS\lib\site-packages\lightgbm\basic.py](file:///C:/Users/Magnus/Desktop/code/timeSeries/venvTS/lib/site-packages/lightgbm/basic.py) in __init__(self, params, train_set, model_file, model_str, silent)
   2603                 )
   2604             # construct booster object
-> 2605             train_set.construct()
   2606             # copy the parameters from train_set
   2607             params.update(train_set.get_params())

[c:\Users\Magnus\Desktop\code\timeSeries\venvTS\lib\site-packages\lightgbm\basic.py](file:///C:/Users/Magnus/Desktop/code/timeSeries/venvTS/lib/site-packages/lightgbm/basic.py) in construct(self)
   1813             else:
   1814                 # create train
-> 1815                 self._lazy_init(self.data, label=self.label,
   1816                                 weight=self.weight, group=self.group,
   1817                                 init_score=self.init_score, predictor=self._predictor,

[c:\Users\Magnus\Desktop\code\timeSeries\venvTS\lib\site-packages\lightgbm\basic.py](file:///C:/Users/Magnus/Desktop/code/timeSeries/venvTS/lib/site-packages/lightgbm/basic.py) in _lazy_init(self, data, label, reference, weight, group, init_score, predictor, silent, feature_name, categorical_feature, params)
   1515                 params['categorical_column'] = sorted(categorical_indices)
   1516 
-> 1517         params_str = param_dict_to_str(params)
   1518         self.params = params
   1519         # process for reference dataset

[c:\Users\Magnus\Desktop\code\timeSeries\venvTS\lib\site-packages\lightgbm\basic.py](file:///C:/Users/Magnus/Desktop/code/timeSeries/venvTS/lib/site-packages/lightgbm/basic.py) in param_dict_to_str(data)
    292             pairs.append(f"{key}={val}")
    293         elif val is not None:
--> 294             raise TypeError(f'Unknown type of parameter:{key}, got:{type(val).__name__}')
    295     return ' '.join(pairs)
    296 

TypeError: Unknown type of parameter:series, got:TimeSeries
madtoinou commented 9 months ago

Hi @Allena101,

I think you misunderstood what triage meant: it's just a way for us to classify issues and be able to filter them. The triage label just means that we have yet to assign a category to the issue.

Your code deviated from the examples in a lot of places, hence causing the error.

Also, it's generally better to have a simple functional "training" script that you then wrap in optuna to look for the best hyper-parameters, rather than start from a complex example and remove elements (go from simple to complex rather than complex to simple).

import optuna

from darts.utils.timeseries_generation import linear_timeseries
from darts.models import LightGBMModel
from darts.metrics import smape

# create a dummy series
ts = linear_timeseries(length=100)
ts_train, ts_val = ts.split_after(0.8)

def objective(trial):
  max_depth = trial.suggest_categorical("max_depth", [2, 3])
  num_leaves = trial.suggest_categorical("num_leaves", [2, 3])
  lags = trial.suggest_categorical("lags", [3])

  # model constructor does not have the `series` argument
  model = LightGBMModel(
      forecast_horizon = 3,
      max_depth = max_depth,
      num_leaves = num_leaves,
      lags = lags
  )

  # train the model
  model.fit(
      series=ts_train,
      val_series=ts_val,
      # num_loader_workers=num_workers,
  )

  # LightGBMModel cannot be loaded from a checkpoint

  # Evaluate how good it is on the validation set, using sMAPE
  # `train` was not defined, the name of the variable is `ts_train`
  # `n` should be an integer, not a series
  preds = model.predict(series=ts_train, n=len(ts_val))
  smapes = smape(ts_val, preds, n_jobs=-1, verbose=True)
  # you need to return the metric you want to optimize
  return smapes

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=3)
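
To see where the original TypeError comes from: as the traceback suggests, keyword arguments that the LightGBMModel constructor does not recognize are handed on to LightGBM as model parameters, so `series=ts_train` ends up in LightGBM's parameter dict and is rejected when the parameters are serialized. The sketch below is a simplified paraphrase of the check visible at the bottom of the traceback, not LightGBM's actual code; `param_dict_to_str_sketch` is a hypothetical name and `TimeSeries` is a dummy class used only for its type name.

```python
class TimeSeries:
    """Dummy stand-in for darts.TimeSeries, only used for its type name."""


def param_dict_to_str_sketch(params):
    """Simplified paraphrase of the parameter check seen in the traceback."""
    pairs = []
    for key, val in params.items():
        if isinstance(val, (list, tuple, set)):
            pairs.append(f"{key}={','.join(map(str, val))}")
        elif isinstance(val, (str, int, float, bool)):
            pairs.append(f"{key}={val}")
        elif val is not None:
            raise TypeError(f"Unknown type of parameter:{key}, got:{type(val).__name__}")
    return " ".join(pairs)


# Plain hyper-parameters serialize without trouble.
ok = param_dict_to_str_sketch({"max_depth": 2, "num_leaves": 3})

# A series object passed alongside them lands in the same dict and is rejected.
try:
    param_dict_to_str_sketch({"max_depth": 2, "series": TimeSeries()})
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(ok)
print(error_message)
```

This is why the series belongs in fit() and predict() only, as in the corrected snippet above.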
Allena101 commented 9 months ago

Hi, Madtoinou 👋 Thanks for taking the time to read my issue.

I get what you mean by taking a bottom-up approach rather than a top-down approach in this instance. However, I did manage to get it to work when I tried using NBEATS instead of LightGBM (so I have to try again with LightGBM later).

One important thing that I don't understand from your code example is the saving and loading of the model. You have save_checkpoints=True, and then you load the model after each trial, and the comment says:

# reload best model over course of training
model = TCNModel.load_from_checkpoint("tcn_model")

Does this mean that save_checkpoints in Darts is equivalent to save_best_only=True in TensorFlow?

Also, your example is meant to be run in one session, correct? As I understand it, the study object is not saved, only the best model. That means that if you train more later, the study risks repeating parameter combinations that it has already tried.

madtoinou commented 9 months ago

NBeats is a deep learning model whereas LightGBM is a regression model, hence the difference in available methods/approaches, especially for the saving/loading of models.

save_checkpoints in Darts saves the latest checkpoint by default. If a validation series is provided, the best one is also saved. This is what happens in the example notebook.

Since gridsearch brute-forces all the combinations, there is no point in running it again: you already covered every combination once and the results are not supposed to change.

Furthermore, the principal takeaway from such hyper-parameter optimisation is in general the parameters themselves, because ideally you should then retrain the model on both the training and validation sets.

Allena101 commented 9 months ago

Doesn't trial.suggest_int randomize from a range of values? In your example, trial.suggest_int("num_filters", 1, 5), that would mean that for each trial a value is picked between 1 and 5 (I don't remember now if it's inclusive or not). Then it would not be an exhaustive study, as I understand it.

Regarding save_checkpoints: does that mean that if you are training a Darts model normally (without using optuna) and you use a validation set, then loading the model will always give you the best-scoring epoch and not the latest (i.e. the weights of the best-scoring epoch and not the weights from the last epoch)?

madtoinou commented 9 months ago

Sorry for the confusion: the code snippet you use, based on optuna, does not perform a gridsearch but uses a more complex algorithm to sample the parameters (documentation), and it is indeed not exhaustive. A gridsearch (as mentioned in the title) is, by definition, exhaustive.
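
The difference can be sketched in plain Python, without optuna: an exhaustive grid enumerates every combination exactly once, whereas a sampler draws values per trial and may repeat or skip combinations. The search space reuses the values from the snippet earlier in this thread; the uniform random draw is only a crude stand-in for optuna's default sampler, which is more sophisticated.

```python
import itertools
import random

search_space = {"max_depth": [2, 3], "num_leaves": [2, 3], "lags": [3]}

# Exhaustive grid: every combination appears exactly once.
names = list(search_space)
grid = [
    dict(zip(names, values))
    for values in itertools.product(*search_space.values())
]

# Sampling: one independent draw per trial, so repeats and gaps are possible.
rng = random.Random(0)
sampled = [
    {name: rng.choice(choices) for name, choices in search_space.items()}
    for _ in range(4)
]

print(len(grid), "grid combinations")
for params in grid:
    print(params)
print("sampled trials:", sampled)
```

If an exhaustive search is wanted inside optuna itself, its GridSampler can be passed to create_study(); whether that is preferable to the Darts gridsearch() method depends on the use case.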

You can load either the last or the best checkpoint, depending on the value of the parameter best of the load_from_checkpoint() method.

Allena101 commented 8 months ago

Thank you madtoinou, i really appreciate the clarification!

If you set the save_checkpoints parameter to True, does that mean that every epoch is saved?

Being able to rerun a study at a later time is quite important for me since I have not managed to get GPU training to work (even though my GPU is CUDA compatible). I know TensorFlow best, and with that library you have to use CUDA. I saw in your docs that you are using the PyTorch Lightning Trainer; does that mean that you don't have to install CUDA?

madtoinou commented 7 months ago

Hi @Allena101,

Sorry for the delay, got busy with other things.

By default, the trainer generates a checkpoint at the end of each epoch but only keeps the last one, to limit the number of files (and avoid having hundreds of useless checkpoints). If a validation series is provided, the best (so far) checkpoint is also kept.
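
The retention policy described above can be illustrated with a toy loop. This is not Darts or PyTorch Lightning code: the validation losses are invented for illustration, and `kept` is a hypothetical dictionary standing in for the checkpoint files on disk.

```python
# Illustrative validation losses, one per epoch (not real training results).
val_losses = [0.90, 0.55, 0.61, 0.48, 0.52]

kept = {}  # hypothetical stand-in for the checkpoint directory
best_loss = float("inf")

for epoch, loss in enumerate(val_losses):
    # The newest checkpoint always replaces the previous "last" one.
    kept["last"] = epoch
    # With a validation series, the best-so-far checkpoint is kept as well.
    if loss < best_loss:
        best_loss = loss
        kept["best"] = epoch

print(kept)
```

At most two entries survive however many epochs are run, which matches the "last plus best so far" behaviour described in the reply.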

Similarly to TensorFlow, PyTorch Lightning also requires CUDA in order to use GPU acceleration during training. If passing pl_trainer_kwargs={"accelerator":"gpu"} does not work for you, I would recommend checking their repo/documentation to solve the GPU detection problem.

Allena101 commented 6 months ago

Hey, I totally understand not having much time to respond. I totally still appreciate you reading my issues and giving me valuable feedback!

Regarding Darts' built-in gridsearch class method: I can't seem to figure out how to use it correctly. I run the code below, and it should only take a few seconds since there are only 2 combinations ('kernel_size': [2,3]), but for some reason the grid search runs for hundreds of epochs.

ts_brief = ts_iSum.drop_before(pd.Timestamp("2024-01-01"))

parameters = {
    'kernel_size': [2, 3],
    'num_filters': [3],
    'input_chunk_length': [5],
    'output_chunk_length': [4],
    'n_epochs': [1],
}

TCN_brief = TCNModel.gridsearch(
    parameters=parameters,
    series=ts_brief['iSum4'],
    forecast_horizon=7,
    metric=rmse,
)

TCN_brief

madtoinou commented 6 months ago

Hi @Allena101,

The parameter n_epochs is correctly taken into account. However, gridsearch() calls historical_forecasts() under the hood when the "expanding window" mode is used (see detailed documentation), which trains the model iteratively using an expanding window to identify the set of parameters optimizing the metric over "historic forecast horizons" (see comment). Each parameter combination is therefore trained many times, once per window, and every one of those trainings runs n_epochs epochs.
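
A rough way to see why two parameter combinations can still mean many training runs: in an expanding-window evaluation the model is refit once per forecast start point. The sketch below only counts those refits; the series length, first training window and stride are assumptions chosen for illustration, `expanding_window_fits` is a hypothetical helper, and the actual defaults used by historical_forecasts() may differ.

```python
def expanding_window_fits(series_length, first_train_length, forecast_horizon, stride=1):
    """Count refits when the training window grows by `stride` points per step."""
    fits = 0
    train_end = first_train_length
    # Continue while a full forecast horizon still fits after the training window.
    while train_end + forecast_horizon <= series_length:
        fits += 1
        train_end += stride
    return fits


n_combinations = 2  # kernel_size in [2, 3], all other parameters fixed
fits_per_combination = expanding_window_fits(
    series_length=100,      # hypothetical number of points in the series
    first_train_length=50,  # hypothetical first training window
    forecast_horizon=7,     # value used in the gridsearch() call above
)
total_fits = n_combinations * fits_per_combination

print(fits_per_combination, "fits per combination")
print(total_fits, "fits in total, each running n_epochs epochs")
```

With n_epochs set to 1, the number of epochs shown in the logs is then roughly the total number of fits, not the number of parameter combinations.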

If you want to train the model only once and use a simpler train/valid split approach, you just need to provide the val_series argument.