AndrewJGroves opened 9 months ago
Hi @AndrewJGroves, a minimal reproducible example would indeed be nice to debug. Could you try to provide one?
Of course, here is an example. As soon as I change n_jobs to be more than 1, I get the error. I'm not using a GPU system. Interestingly, I get the same error if I run Prophet this way, but I don't seem to see it with regression models; all my classes are based on the same principle with very few changes. My pytorch-lightning version is 2.1.2 and torch is 2.2.0.
#!pip install darts "optuna<=3.4.0"
import os  # needed to clean up the sqlite storage file at the end

import optuna
import torch
from optuna.integration import PyTorchLightningPruningCallback

from darts import TimeSeries
from darts.datasets import WeatherDataset
from darts.metrics import rmse
from darts.models import BlockRNNModel
from darts.utils.model_selection import train_test_split


class BlockRNNModelOptimise(object):
    def __init__(
        self,
        df,
        all_past_rts,
        forecast_horizon,
    ):
        self.df = df
        self.all_past_rts = all_past_rts
        self.forecast_horizon = forecast_horizon

    def __call__(self, trial):
        if torch.cuda.is_available():
            pl_trainer_kwargs = {
                "accelerator": "gpu",
                "devices": "auto",
                "callbacks": [PyTorchLightningPruningCallback(trial, monitor="train_loss")],
                "enable_progress_bar": True,
            }
            num_workers = 4
        else:
            pl_trainer_kwargs = {
                "accelerator": "cpu",
                "devices": 1,
                "callbacks": [PyTorchLightningPruningCallback(trial, monitor="train_loss")],
                "enable_progress_bar": True,
            }
            num_workers = 0

        # hyper-parameter search space
        input_chunk_length = trial.suggest_int("input_chunk_length", 1, 5)
        output_chunk_length = trial.suggest_int("output_chunk_length", 1, 1)
        model = trial.suggest_categorical("model", ["RNN", "LSTM", "GRU"])
        hidden_dim = trial.suggest_int("hidden_dim", 25, 30)
        n_rnn_layers = trial.suggest_int("n_rnn_layers", 1, 3)
        lr = trial.suggest_float("lr", 0.005, 0.01)

        train, test = train_test_split(
            self.df,
            test_size=0.20,
            input_size=input_chunk_length,
            horizon=self.forecast_horizon,
            vertical_split_type="model-aware",
        )

        model = BlockRNNModel(
            n_epochs=10,
            random_state=1,
            input_chunk_length=input_chunk_length,
            output_chunk_length=output_chunk_length,
            model=model,
            hidden_dim=hidden_dim,
            n_rnn_layers=n_rnn_layers,
            optimizer_kwargs={"lr": lr},
            pl_trainer_kwargs=pl_trainer_kwargs,
        )
        model.fit(
            train,
            val_series=test,
            past_covariates=self.all_past_rts,
            val_past_covariates=self.all_past_rts,
            num_loader_workers=num_workers,
            verbose=False,
        )
        backtesting = model.backtest(
            self.df,
            past_covariates=self.all_past_rts,
            start=0.8,
            forecast_horizon=self.forecast_horizon,
            stride=4,
            metric=rmse,
        )
        return backtesting


series = WeatherDataset().load()
# predicting atmospheric pressure
df = series['p (mbar)'][:100]
# optionally, use past observed rainfall (pretending to be unknown beyond index 100)
all_past_rts = series['rain (mm)'][:100]
forecast_horizon = 6

storage = optuna.storages.RDBStorage(
    url="sqlite:///example.db",
    engine_kwargs={"connect_args": {"check_same_thread": False}},
)
objective = BlockRNNModelOptimise(df, all_past_rts, forecast_horizon)
study = optuna.create_study(
    direction="minimize",
    storage=storage,
    sampler=optuna.samplers.TPESampler(),
    pruner=optuna.pruners.MedianPruner(),
)
study.optimize(objective, n_trials=5, n_jobs=2)  # when n_jobs=1 it works
brnn_params = study.best_params

# clean up the sqlite storage
if os.path.exists('example.db'):
    os.remove('example.db')
storage.remove_session()

# retrain the best configuration on the full series
pl_trainer_kwargs = {
    "accelerator": "cpu",
    "devices": 1,
    "enable_progress_bar": True,
}
num_workers = 0
model = BlockRNNModel(
    optimizer_kwargs={"lr": brnn_params.pop("lr", None)},
    **brnn_params,
    n_epochs=100,
    random_state=1,
    pl_trainer_kwargs=pl_trainer_kwargs,
)
model.fit(
    df,
    past_covariates=all_past_rts,
    num_loader_workers=0,
    verbose=False,
)
backtesting = model.backtest(
    df,
    past_covariates=all_past_rts,
    start=0.8,
    forecast_horizon=forecast_horizon,
    stride=4,
    metric=rmse,
)
print(backtesting)
Hi,
I investigated a bit, and it appears that the line responsible for the bug when n_jobs > 1 is:

However, when that line is deleted, the model in some trials does not correspond to the desired/provided parameters (it reads the class attribute from another trial?!). Fixing this would probably require some refactoring of the ModelMeta class.
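To make the failure mode above concrete, here is a minimal, hypothetical sketch (not the actual darts code) of the pattern being described: a metaclass that stashes constructor kwargs in a class-level attribute. Optuna's n_jobs > 1 runs trials as threads of one process, so all trials share that class attribute and one trial can read parameters written by another. The names ModelMetaSketch and ToyModel are made up for illustration.

```python
# Hypothetical reconstruction of the shared-class-attribute race, NOT darts' code.
import threading


class ModelMetaSketch(type):
    def __call__(cls, **kwargs):
        cls._model_call = kwargs  # class-level attribute -> shared across threads
        return super().__call__(**kwargs)


class ToyModel(metaclass=ModelMetaSketch):
    def __init__(self, hidden_dim):
        # by the time this runs, another thread may already have replaced
        # cls._model_call with the kwargs of a different "trial"
        self.hidden_dim = type(self)._model_call["hidden_dim"]


def trial(hidden_dim, results):
    results.append((hidden_dim, ToyModel(hidden_dim=hidden_dim).hidden_dim))


results = []
threads = [threading.Thread(target=trial, args=(d, results)) for d in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# On some runs a few pairs differ, i.e. a model was built with parameters
# "leaked" from another trial; with a single thread (n_jobs=1) this cannot happen.
print([pair for pair in results if pair[0] != pair[1]])
```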
Thanks for looking, sounds like it's a hard fix. So you're aware, I don't get this error when running FourTheta, linear regression, or LightGBM, but I get the same error for Prophet as well as RNNModel, TransformerModel, NBEATSModel and NHiTSModel (these are the only models I have tried Optuna with).
Hi all, any news about this problem? Having multiple jobs for the RNN, NBEATS and NHiTS models would be greatly appreciated :) Thank you
Hi @championbruno,
No progress has been made on this (it does not have a high priority since it's an optimization problem, and despite being slower, the code runs fine with n_jobs=1). I'll change the labels to indicate that PRs are welcome, but I cannot give you a timeline.
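In the meantime, one possible (untested here) workaround is to parallelize at the process level instead of with n_jobs: several worker processes each call study.optimize(..., n_jobs=1) against the same Optuna RDB storage, so no model classes are shared between concurrently running trials. The sketch below assumes the BlockRNNModelOptimise class from the example above is defined at module level; the study name "brnn_study" and worker counts are arbitrary, and SQLite can struggle with concurrent writers, so a server-based database is more robust for this pattern.

```python
# Untested sketch of a process-level workaround: each worker is a separate
# Python process running study.optimize with n_jobs=1, all sharing one
# Optuna RDB storage. Assumes BlockRNNModelOptimise (from the example above)
# is importable/defined at module level.
import multiprocessing

import optuna
from darts.datasets import WeatherDataset


def run_worker(study_name, storage_url, n_trials):
    # each process rebuilds its own data and objective -> nothing is shared
    series = WeatherDataset().load()
    objective = BlockRNNModelOptimise(
        df=series["p (mbar)"][:100],
        all_past_rts=series["rain (mm)"][:100],
        forecast_horizon=6,
    )
    study = optuna.load_study(study_name=study_name, storage=storage_url)
    study.optimize(objective, n_trials=n_trials, n_jobs=1)


if __name__ == "__main__":
    storage_url = "sqlite:///example.db"
    study = optuna.create_study(
        study_name="brnn_study",
        direction="minimize",
        storage=storage_url,
        load_if_exists=True,
    )
    workers = [
        multiprocessing.Process(
            target=run_worker, args=("brnn_study", storage_url, 3)
        )
        for _ in range(2)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(study.best_params)
```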
Hi
I'm trying to run deep-learning optimisation using Optuna. It works fine if I have n_jobs=1, but if I increase that number to, say, 2, I get an error: AttributeError: _model_call. I have enough CPUs. The full error is shown below. If you want the code that's fine, but it will take me a bit to separate it all out, so I'm hoping for an easy fix.