unit8co / darts

A Python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

[BUG] ReferenceError crash on add_encoders #1565

Closed · adrien-gauche closed this issue 1 year ago

adrien-gauche commented 1 year ago

Describe the bug: hyperparameter optimization fails after several loops with the error "ReferenceError: weakly-referenced object no longer exists". The failure is not immediate; the call to fit() completes fine on the first loops.

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_23/2411547071.py", line 56, in objective
    log_tensorboard=True, #https://pytorch.org/tutorials/recipes/recipes/tensorboard_with_pytorch.html
  File "/opt/conda/lib/python3.7/site-packages/darts/models/forecasting/forecasting_model.py", line 99, in __call__
    return super().__call__(**all_params)
  File "/opt/conda/lib/python3.7/site-packages/darts/models/forecasting/tcn_model.py", line 440, in __init__
    super().__init__(**self._extract_torch_model_params(**self.model_params))
  File "/opt/conda/lib/python3.7/site-packages/darts/utils/torch.py", line 112, in decorator
    return decorated(self, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/darts/models/forecasting/torch_forecasting_model.py", line 256, in __init__
    super().__init__(add_encoders=add_encoders)
  File "/opt/conda/lib/python3.7/site-packages/darts/models/forecasting/forecasting_model.py", line 1665, in __init__
    super().__init__(add_encoders=add_encoders)
  File "/opt/conda/lib/python3.7/site-packages/darts/models/forecasting/forecasting_model.py", line 124, in __init__
    self._model_params = self._extract_model_creation_params()
  File "/opt/conda/lib/python3.7/site-packages/darts/models/forecasting/forecasting_model.py", line 1521, in _extract_model_creation_params
    model_params = copy.deepcopy(self._model_call)
  File "/opt/conda/lib/python3.7/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
[...]
  File "/opt/conda/lib/python3.7/copy.py", line 159, in deepcopy
    copier = getattr(x, "__deepcopy__", None)
ReferenceError: weakly-referenced object no longer exists
[W 2023-02-13 08:58:25,299] Trial 7 failed with value None.

To Reproduce

encoders = {
    "cyclic": {"past": ["hour", "weekofyear"]},
    "transformer": Scaler(),
}

Expected behavior: it should not crash after several loops; if it is going to crash, it should already do so on the first loop.

dennisbader commented 1 year ago

Can you add a minimal working example to reproduce this issue? As it stands, your "To Reproduce" code is not reproducible.

eWizardII commented 1 year ago

I have encountered a similar issue with the following:

def objective1(trial):
    # note: this pruning callback is created but never passed to the trainer below
    callback = [PyTorchLightningPruningCallback(trial, monitor="val_loss")]

    # set input_chunk_length, between 31 and 90 days
    days_in = trial.suggest_int("input_chunk_length", 31, 90)
    out_len = 15

    # Other hyperparameters
    hidden_size = trial.suggest_int("hidden_size", 1, 64)
    num_attention_heads = trial.suggest_int("num_attention_heads", 1, 10)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)

    # build and train the TFT model with these hyperparameters:
    model = TFTModel(
        input_chunk_length=days_in,
        output_chunk_length=out_len,
        hidden_size=hidden_size,
        lstm_layers=1,
        num_attention_heads=num_attention_heads,
        dropout=dropout,
        batch_size=256,
        n_epochs=200,
        add_relative_index=True,
        pl_trainer_kwargs={
            "accelerator": "gpu",
            "devices": [0],
            # early_stopper is defined outside this function
            "callbacks": [early_stopper],
        },
        add_encoders=None,
        likelihood=None,  # QuantileRegression is set per default
        # loss_fn=MSELoss(),
        random_state=42,
    )

    model.fit(series=train_transformed, val_series=val_transformed, verbose=True)  # , max_samples_per_ts=3000

    # Evaluate how good it is on the validation set
    preds = model.predict(series=train_transformed, n=1)
    smape_val = np.mean(mape(val_transformed, preds, n_jobs=-1, verbose=True))

    # `x != np.nan` is always True, so guard with np.isnan instead
    return float("inf") if np.isnan(smape_val) else smape_val

Calling the following causes the error:

def print_callback(study, trial):
    print(f"Current value: {trial.value}, Current params: {trial.params}")
    print(f"Best value: {study.best_value}, Best params: {study.best_trial.params}")

study = optuna.create_study(direction="minimize")

study.optimize(objective1, n_trials=4, show_progress_bar=True, callbacks=[print_callback])

eWizardII commented 1 year ago

This is the actual full error:

---------------------------------------------------------------------------

ReferenceError                            Traceback (most recent call last)

<ipython-input-151-315340f24666> in <module>
      6 study = optuna.create_study(direction="minimize")
      7 
----> 8 study.optimize(objective1, n_trials=4,show_progress_bar=True, callbacks=[print_callback])

29 frames

/usr/local/lib/python3.8/dist-packages/optuna/study/study.py in optimize(self, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
    423         """
    424 
--> 425         _optimize(
    426             study=self,
    427             func=func,

/usr/local/lib/python3.8/dist-packages/optuna/study/_optimize.py in _optimize(study, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
     64     try:
     65         if n_jobs == 1:
---> 66             _optimize_sequential(
     67                 study,
     68                 func,

/usr/local/lib/python3.8/dist-packages/optuna/study/_optimize.py in _optimize_sequential(study, func, n_trials, timeout, catch, callbacks, gc_after_trial, reseed_sampler_rng, time_start, progress_bar)
    161 
    162         try:
--> 163             frozen_trial = _run_trial(study, func, catch)
    164         finally:
    165             # The following line mitigates memory problems that can be occurred in some

/usr/local/lib/python3.8/dist-packages/optuna/study/_optimize.py in _run_trial(study, func, catch)
    249         and not isinstance(func_err, catch)
    250     ):
--> 251         raise func_err
    252     return frozen_trial
    253 

/usr/local/lib/python3.8/dist-packages/optuna/study/_optimize.py in _run_trial(study, func, catch)
    198     with get_heartbeat_thread(trial._trial_id, study._storage):
    199         try:
--> 200             value_or_values = func(trial)
    201         except exceptions.TrialPruned as e:
    202             # TODO(mamu): Handle multi-objective cases.

<ipython-input-149-a0adcba9ab81> in objective1(trial)
     12 
     13     # build and train the TCN model with these hyper-parameters:
---> 14     model = TFTModel(
     15     input_chunk_length=days_in,
     16     output_chunk_length=out_len,

/usr/local/lib/python3.8/dist-packages/darts/models/forecasting/forecasting_model.py in __call__(cls, *args, **kwargs)
     97 
     98         # 6) call model
---> 99         return super().__call__(**all_params)
    100 
    101 

/usr/local/lib/python3.8/dist-packages/darts/models/forecasting/tft_model.py in __init__(self, input_chunk_length, output_chunk_length, hidden_size, lstm_layers, num_attention_heads, full_attention, feed_forward, dropout, hidden_continuous_size, categorical_embedding_sizes, add_relative_index, loss_fn, likelihood, norm_type, **kwargs)
    872             model_kwargs["likelihood"] = QuantileRegression()
    873 
--> 874         super().__init__(**self._extract_torch_model_params(**model_kwargs))
    875 
    876         # extract pytorch lightning module kwargs

/usr/local/lib/python3.8/dist-packages/darts/utils/torch.py in decorator(self, *args, **kwargs)
    110         with fork_rng():
    111             manual_seed(self._random_instance.randint(0, high=MAX_TORCH_SEED_VALUE))
--> 112             return decorated(self, *args, **kwargs)
    113 
    114     return decorator

/usr/local/lib/python3.8/dist-packages/darts/models/forecasting/torch_forecasting_model.py in __init__(self, batch_size, n_epochs, model_name, work_dir, log_tensorboard, nr_epochs_val_period, force_reset, save_checkpoints, add_encoders, random_state, pl_trainer_kwargs, show_warnings)
    254             your forecasting use case. Default: ``False``.
    255         """
--> 256         super().__init__(add_encoders=add_encoders)
    257         suppress_lightning_warnings(suppress_all=not show_warnings)
    258 

/usr/local/lib/python3.8/dist-packages/darts/models/forecasting/forecasting_model.py in __init__(self, add_encoders)
   1663 
   1664     def __init__(self, add_encoders: Optional[dict] = None):
-> 1665         super().__init__(add_encoders=add_encoders)
   1666 
   1667     @abstractmethod

/usr/local/lib/python3.8/dist-packages/darts/models/forecasting/forecasting_model.py in __init__(self, *args, **kwargs)
    122 
    123         # extract and store sub class model creation parameters
--> 124         self._model_params = self._extract_model_creation_params()
    125 
    126         if "add_encoders" not in kwargs:

/usr/local/lib/python3.8/dist-packages/darts/models/forecasting/forecasting_model.py in _extract_model_creation_params(self)
   1519     def _extract_model_creation_params(self):
   1520         """extracts immutable model creation parameters from `ModelMeta` and deletes reference."""
-> 1521         model_params = copy.deepcopy(self._model_call)
   1522         del self.__class__._model_call
   1523         return model_params

/usr/lib/python3.8/copy.py in deepcopy(x, memo, _nil)
    170                     y = x
    171                 else:
--> 172                     y = _reconstruct(x, memo, *rv)
    173 
    174     # If is its own copy, don't memoize.

/usr/lib/python3.8/copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    294             for key, value in dictiter:
    295                 key = deepcopy(key, memo)
--> 296                 value = deepcopy(value, memo)
    297                 y[key] = value
    298         else:

/usr/lib/python3.8/copy.py in deepcopy(x, memo, _nil)
    144     copier = _deepcopy_dispatch.get(cls)
    145     if copier is not None:
--> 146         y = copier(x, memo)
    147     else:
    148         if issubclass(cls, type):

/usr/lib/python3.8/copy.py in _deepcopy_dict(x, memo, deepcopy)
    228     memo[id(x)] = y
    229     for key, value in x.items():
--> 230         y[deepcopy(key, memo)] = deepcopy(value, memo)
    231     return y
    232 d[dict] = _deepcopy_dict

/usr/lib/python3.8/copy.py in deepcopy(x, memo, _nil)
    144     copier = _deepcopy_dispatch.get(cls)
    145     if copier is not None:
--> 146         y = copier(x, memo)
    147     else:
    148         if issubclass(cls, type):

/usr/lib/python3.8/copy.py in _deepcopy_list(x, memo, deepcopy)
    203     append = y.append
    204     for a in x:
--> 205         append(deepcopy(a, memo))
    206     return y
    207 d[list] = _deepcopy_list

/usr/lib/python3.8/copy.py in deepcopy(x, memo, _nil)
    170                     y = x
    171                 else:
--> 172                     y = _reconstruct(x, memo, *rv)
    173 
    174     # If is its own copy, don't memoize.

/usr/lib/python3.8/copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    268     if state is not None:
    269         if deep:
--> 270             state = deepcopy(state, memo)
    271         if hasattr(y, '__setstate__'):
    272             y.__setstate__(state)

/usr/lib/python3.8/copy.py in deepcopy(x, memo, _nil)
    144     copier = _deepcopy_dispatch.get(cls)
    145     if copier is not None:
--> 146         y = copier(x, memo)
    147     else:
    148         if issubclass(cls, type):

/usr/lib/python3.8/copy.py in _deepcopy_dict(x, memo, deepcopy)
    228     memo[id(x)] = y
    229     for key, value in x.items():
--> 230         y[deepcopy(key, memo)] = deepcopy(value, memo)
    231     return y
    232 d[dict] = _deepcopy_dict

/usr/lib/python3.8/copy.py in deepcopy(x, memo, _nil)
    144     copier = _deepcopy_dispatch.get(cls)
    145     if copier is not None:
--> 146         y = copier(x, memo)
    147     else:
    148         if issubclass(cls, type):

/usr/lib/python3.8/copy.py in _deepcopy_method(x, memo)
    235 
    236 def _deepcopy_method(x, memo): # Copy instance methods
--> 237     return type(x)(x.__func__, deepcopy(x.__self__, memo))
    238 d[types.MethodType] = _deepcopy_method
    239 

/usr/lib/python3.8/copy.py in deepcopy(x, memo, _nil)
    170                     y = x
    171                 else:
--> 172                     y = _reconstruct(x, memo, *rv)
    173 
    174     # If is its own copy, don't memoize.

/usr/lib/python3.8/copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    268     if state is not None:
    269         if deep:
--> 270             state = deepcopy(state, memo)
    271         if hasattr(y, '__setstate__'):
    272             y.__setstate__(state)

/usr/lib/python3.8/copy.py in deepcopy(x, memo, _nil)
    144     copier = _deepcopy_dispatch.get(cls)
    145     if copier is not None:
--> 146         y = copier(x, memo)
    147     else:
    148         if issubclass(cls, type):

/usr/lib/python3.8/copy.py in _deepcopy_dict(x, memo, deepcopy)
    228     memo[id(x)] = y
    229     for key, value in x.items():
--> 230         y[deepcopy(key, memo)] = deepcopy(value, memo)
    231     return y
    232 d[dict] = _deepcopy_dict

/usr/lib/python3.8/copy.py in deepcopy(x, memo, _nil)
    149             y = _deepcopy_atomic(x, memo)
    150         else:
--> 151             copier = getattr(x, "__deepcopy__", None)
    152             if copier is not None:
    153                 y = copier(memo)

ReferenceError: weakly-referenced object no longer exists

dennisbader commented 1 year ago

@eWizardII, you need to redefine early_stopper inside the objective function, as shown in our optuna example (see the sketch below).

@adrien-gauche, I assume the same applies to your issue.
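
For reference, a minimal sketch of that pattern, reusing the model and series names from the snippet above (the patience value and the reduced set of model arguments are illustrative):

from optuna.integration import PyTorchLightningPruningCallback
from pytorch_lightning.callbacks import EarlyStopping

def objective1(trial):
    # Create the callbacks *inside* the objective so every trial gets fresh
    # objects; callbacks created at module scope can end up holding dead weak
    # references that break the deepcopy of the model creation parameters on
    # a later trial.
    early_stopper = EarlyStopping(monitor="val_loss", patience=5, mode="min")
    pruner = PyTorchLightningPruningCallback(trial, monitor="val_loss")

    model = TFTModel(
        input_chunk_length=trial.suggest_int("input_chunk_length", 31, 90),
        output_chunk_length=15,
        add_relative_index=True,
        pl_trainer_kwargs={"callbacks": [early_stopper, pruner]},
        random_state=42,
    )
    model.fit(series=train_transformed, val_series=val_transformed)
    preds = model.predict(series=train_transformed, n=1)
    return float(np.mean(mape(val_transformed, preds)))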

adrien-gauche commented 1 year ago

@dennisbader yes, I used the hyperparameter-optimization example code with the PyTorchLightningPruningCallback() and EarlyStopping() callbacks in the objective() function. But I did define early_stopper inside the objective function. The bug disappears when the pruning callback is not passed in the callbacks list. I think my issue is related to #1556.
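
For reference, the ReferenceError itself can be reproduced in isolation. A toy sketch of the mechanism (the classes here are hypothetical; it only shows why copy.deepcopy raises once a weakly-referenced object is gone):

import copy
import weakref

class Study:
    pass

class Callback:
    def __init__(self, study):
        # keep only a weak proxy to the study, not a strong reference
        self.study = weakref.proxy(study)

study = Study()
cb = Callback(study)
del study           # the referent is garbage collected; the proxy survives
copy.deepcopy(cb)   # ReferenceError: weakly-referenced object no longer exists

Deep-copying the callback walks into its __dict__ and calls getattr(proxy, "__deepcopy__", None), which dereferences the dead proxy — the same final frame as in the tracebacks above.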

dennisbader commented 1 year ago

That is strange; for me, it worked with the pruning workaround proposed in the issue you referenced. You should be able to use pruning with it.

eWizardII commented 1 year ago

@dennisbader thanks, that looks like it fixed my issue

@adrien-gauche yeah, I had that same issue until I added that function in my comment above; that might work until the issue is fixed in #1556.