Closed gsamaras closed 2 years ago
We released darts version 0.17.0 yesterday. Our TorchForecastingModels are now built on top of PyTorch Lightning.
Passing `torch_device_str` should have raised a DeprecationWarning. The device should now be set through `pl_trainer_kwargs` (a dict of PyTorch Lightning Trainer parameters, see here) at model creation. Can you try it with the snippet below and let us know if it works? Also, for further information about setting the device, see:
```python
from torch import optim

from darts.models import NBEATSModel

model_nbeats = NBEATSModel(
    input_chunk_length=2,
    output_chunk_length=1,
    generic_architecture=True,
    num_stacks=2,
    num_blocks=1,
    num_layers=1,
    layer_widths=2,
    n_epochs=20,
    nr_epochs_val_period=1,
    batch_size=2,
    random_state=0,
    optimizer_cls=optim.Adam,
    optimizer_kwargs={"lr": 1e-3},
    lr_scheduler_cls=optim.lr_scheduler.ReduceLROnPlateau,
    lr_scheduler_kwargs={"optimizer": optim.Adam, "threshold": 0.0001, "verbose": True},
    # torch_device_str="cuda:0",  # deprecated
    pl_trainer_kwargs={
        "accelerator": "gpu",
        "gpus": [0],
    },
)
```
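For context (an illustration, not darts source code): the entries in `pl_trainer_kwargs` are passed through to the PyTorch Lightning `Trainer` constructor. A minimal sketch of that forwarding pattern, using hypothetical names (`TrainerStub`, `build_trainer`):

```python
# Hedged sketch of the kwargs-forwarding pattern a wrapper like darts uses.
# `TrainerStub` and `build_trainer` are hypothetical stand-ins, not darts or
# PyTorch Lightning API.
class TrainerStub:
    def __init__(self, accelerator=None, gpus=None, **extra):
        self.accelerator = accelerator  # e.g. "gpu"
        self.gpus = gpus                # e.g. [0]
        self.extra = extra              # any other Trainer parameters

def build_trainer(pl_trainer_kwargs=None):
    # the whole dict is unpacked into the Trainer constructor
    return TrainerStub(**(pl_trainer_kwargs or {}))

trainer = build_trainer({"accelerator": "gpu", "gpus": [0]})
```

This is why any Trainer parameter can be set at model creation without darts having to expose it explicitly.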
@dennisbader thanks for the prompt reply. This did the trick in getting the GPU used, but I guess something else broke because of PyTorch Lightning, and I now get this error:
```
MisconfigurationException: `configure_optimizers` must include a monitor when a `ReduceLROnPlateau` scheduler is used. For example: {"optimizer": optimizer, "lr_scheduler": scheduler, "monitor": "metric_to_track"}
```
I tried passing `"monitor": "val_loss"` in the optimizer's kwargs and in the lr scheduler's kwargs, but that didn't solve the issue. Any idea?
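For reference, here is a sketch of the dict shape that plain PyTorch Lightning asks for in that error message. The helper function is hypothetical; inside darts this dict is built internally, which is presumably why passing `monitor` through the model's kwargs has no effect:

```python
# Hypothetical helper illustrating the dict shape the Lightning error asks for:
# with ReduceLROnPlateau, the returned config must name a logged metric to
# monitor, since that scheduler steps on a metric value rather than on a fixed
# schedule.
def configure_optimizers_config(optimizer, scheduler, monitor="val_loss"):
    return {
        "optimizer": optimizer,
        "lr_scheduler": scheduler,
        "monitor": monitor,  # must match a metric logged via self.log(...)
    }

config = configure_optimizers_config("adam", "plateau_scheduler")
```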
Hey @gsamaras and thanks for that.
This is indeed a bug and happens when using ReduceLROnPlateau -> https://github.com/PyTorchLightning/pytorch-lightning/issues/4454
We will fix this soon. For now, you can either use the model without ReduceLROnPlateau or downgrade darts to version 0.16.1.
No that's fine, I can do it.
Thanks again!
Darts 0.17.1 was released, which fixes both the `torch_device_str` issue and the `ReduceLROnPlateau` bug.
@dennisbader indeed, I was able to get this working. I also checked that the documentation was updated, thanks!
May I ask if I'll be able to simply use a TPU like:
```python
pl_trainer_kwargs={
    "accelerator": "tpu",
    "tpus": [0]
}
```
or is it something that darts won't handle seamlessly (as in the GPU case)? I don't know whether TPUs can work with local data (which, to be honest, does not live in Google Cloud).
PS: As a side note: after upgrading darts to 0.17.1, `historical_forecasts()` takes a significant amount of time (40 minutes for < 3,500 data points), while with darts 0.16.1 it took just a few minutes. I'll investigate further, though, and open a new issue if needed.
> May I ask if I'll be able to simply use a TPU like:
> `pl_trainer_kwargs={"accelerator": "tpu", "tpus": [0]}`
I think that should work, but if you are on Colab, PyTorch Lightning (which darts relies on) requires an extra step to make TPUs work: https://pytorch-lightning.readthedocs.io/en/stable/advanced/tpu.html#colab-tpus
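One caveat worth checking against the Lightning docs: as far as I know, the Lightning 1.x `Trainer` selected TPUs via the `tpu_cores` argument rather than a `tpus` key, so the kwargs might need to look something like this (an untested assumption, not verified against darts):

```python
# Assumption: Lightning 1.x Trainer accepted `tpu_cores` (an int giving the
# number of cores, or a one-element list pinning a specific core); there was
# no `tpus` argument.
pl_trainer_kwargs = {
    "tpu_cores": 8,  # or e.g. [1] to pin a single specific core
}
```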
I'll close this issue for now as the GPU issue is solved. Don't hesitate to open a new one if you spot other issues.
**Describe the bug**
Suddenly, upon relaunching my notebook, I wasn't able to train N-Beats on GPU and got

```
ValueError: 'cuda' is not a valid DistributedType
```

without me changing anything in the code.

**To Reproduce**
Install like this in a Jupyter notebook:
```
!pip install 'u8darts[torch]'
```
and then try to train any model on GPU, e.g. an N-Beats model like this:
which gives the error:
My instance has a GPU:
**Expected behavior**
Training on GPU should be possible.

**System (please complete the following information):**
Could it be that something with the dependency on Torch is happening?
**Additional context**
Related: https://github.com/unit8co/darts/issues/801