sktime / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License

element 0 of tensors does not require grad and does not have a grad_fn #187

Closed: mahdiebm99ipm closed this issue 3 years ago

mahdiebm99ipm commented 3 years ago

Hi, I'm trying to follow the tutorial with my own data. When I run the learning rate finder, I get the error below.
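
The call is taken from the tutorial; a rough sketch of what I'm running (trainer, tft, train_dataloader and val_dataloader are set up as in the tutorial, only the underlying DataFrame is my own data, and the min_lr/max_lr values are just what I believe the tutorial uses):

```python
# find the optimal learning rate, following the TFT tutorial
res = trainer.tuner.lr_find(
    tft,
    train_dataloader=train_dataloader,
    val_dataloaders=val_dataloader,
    min_lr=1e-6,
    max_lr=10.0,
)
print(f"suggested learning rate: {res.suggestion()}")
```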

Here is the full traceback:

RuntimeError                              Traceback (most recent call last)
<ipython-input-26-a92b5627800b> in <module>
      1 # find optimal learning rate
----> 2 res = trainer.tuner.lr_find(
      3     tft,
      4     train_dataloader=train_dataloader,
      5     val_dataloaders=val_dataloader,

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\pytorch_lightning\tuner\tuning.py in lr_find(self, model, train_dataloader, val_dataloaders, min_lr, max_lr, num_training, mode, early_stop_threshold, datamodule)
    118             datamodule: Optional[LightningDataModule] = None
    119     ):
--> 120         return lr_find(
    121             self.trainer,
    122             model,

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\pytorch_lightning\tuner\lr_finder.py in lr_find(trainer, model, train_dataloader, val_dataloaders, min_lr, max_lr, num_training, mode, early_stop_threshold, datamodule)
    167 
    168     # Fit, lr & loss logged in callback
--> 169     trainer.fit(model,
    170                 train_dataloader=train_dataloader,
    171                 val_dataloaders=val_dataloaders,

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\pytorch_lightning\trainer\trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
    444         self.call_hook('on_fit_start')
    445 
--> 446         results = self.accelerator_backend.train()
    447         self.accelerator_backend.teardown()
    448 

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\pytorch_lightning\accelerators\cpu_accelerator.py in train(self)
     57 
     58         # train or test
---> 59         results = self.train_or_test()
     60         return results
     61 

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\pytorch_lightning\accelerators\accelerator.py in train_or_test(self)
     64             results = self.trainer.run_test()
     65         else:
---> 66             results = self.trainer.train()
     67         return results
     68 

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\pytorch_lightning\trainer\trainer.py in train(self)
    493 
    494                 # run train epoch
--> 495                 self.train_loop.run_training_epoch()
    496 
    497                 if self.max_steps and self.max_steps <= self.global_step:

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\pytorch_lightning\trainer\training_loop.py in run_training_epoch(self)
    559             # TRAINING_STEP + TRAINING_STEP_END
    560             # ------------------------------------
--> 561             batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
    562 
    563             # when returning -1 from train_step, we end epoch early

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\pytorch_lightning\trainer\training_loop.py in run_training_batch(self, batch, batch_idx, dataloader_idx)
    726 
    727                         # optimizer step
--> 728                         self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
    729 
    730                     else:

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\pytorch_lightning\trainer\training_loop.py in optimizer_step(self, optimizer, opt_idx, batch_idx, train_step_and_backward_closure, *args, **kwargs)
    467         with self.trainer.profiler.profile("optimizer_step"):
    468             # optimizer step lightningModule hook
--> 469             self.trainer.accelerator_backend.optimizer_step(
    470                 optimizer, batch_idx, opt_idx, train_step_and_backward_closure, *args, **kwargs
    471             )

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\pytorch_lightning\accelerators\accelerator.py in optimizer_step(self, optimizer, batch_idx, opt_idx, lambda_closure, *args, **kwargs)
    112 
    113         # model hook
--> 114         model_ref.optimizer_step(
    115             epoch=self.trainer.current_epoch,
    116             batch_idx=batch_idx,

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\pytorch_lightning\core\lightning.py in optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx, optimizer_closure, on_tpu, using_native_amp, using_lbfgs, *args, **kwargs)
   1378             optimizer.step(*args, **kwargs)
   1379         else:
-> 1380             optimizer.step(closure=optimizer_closure, *args, **kwargs)
   1381 
   1382     def optimizer_zero_grad(

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\torch\optim\lr_scheduler.py in wrapper(*args, **kwargs)
     65                 instance._step_count += 1
     66                 wrapped = func.__get__(instance, cls)
---> 67                 return wrapped(*args, **kwargs)
     68 
     69             # Note that the returned function here is no longer a bound method,

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\pytorch_forecasting\optim.py in step(self, closure)
    129             closure: A closure that reevaluates the model and returns the loss.
    130         """
--> 131         _ = closure()
    132         loss = None
    133         # note - below is commented out b/c I have other work that passes back

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\pytorch_lightning\trainer\training_loop.py in train_step_and_backward_closure()
    716 
    717                         def train_step_and_backward_closure():
--> 718                             result = self.training_step_and_backward(
    719                                 split_batch,
    720                                 batch_idx,

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\pytorch_lightning\trainer\training_loop.py in training_step_and_backward(self, split_batch, batch_idx, opt_idx, optimizer, hiddens)
    821             # backward pass
    822             with self.trainer.profiler.profile("model_backward"):
--> 823                 self.backward(result, optimizer, opt_idx)
    824 
    825             # hook - call this hook only

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\pytorch_lightning\trainer\training_loop.py in backward(self, result, optimizer, opt_idx, *args, **kwargs)
    841             self.trainer.accelerator_backend.backward(result, optimizer, opt_idx, *args, **kwargs)
    842         else:
--> 843             result.closure_loss = self.trainer.accelerator_backend.backward(
    844                 result.closure_loss, optimizer, opt_idx, *args, **kwargs
    845             )

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\pytorch_lightning\accelerators\accelerator.py in backward(self, closure_loss, optimizer, opt_idx, *args, **kwargs)
     93             # do backward pass
     94             model = self.trainer.get_model()
---> 95             model.backward(closure_loss, optimizer, opt_idx, *args, **kwargs)
     96 
     97             # once backward has been applied, release graph

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\pytorch_lightning\core\lightning.py in backward(self, loss, optimizer, optimizer_idx, *args, **kwargs)
   1256         """
   1257         if self.trainer.train_loop.automatic_optimization or self._running_manual_backward:
-> 1258             loss.backward(*args, **kwargs)
   1259 
   1260     def toggle_optimizer(self, optimizer: Optimizer, optimizer_idx: int):

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\torch\tensor.py in backward(self, gradient, retain_graph, create_graph)
    219                 retain_graph=retain_graph,
    220                 create_graph=create_graph)
--> 221         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    222 
    223     def register_hook(self, hook):

c:\users\u2\appdata\local\programs\python\python38\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
    128         retain_graph = create_graph
    129 
--> 130     Variable._execution_engine.run_backward(
    131         tensors, grad_tensors_, retain_graph, create_graph,
    132         allow_unreachable=True)  # allow_unreachable flag

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
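
For what it's worth, the message itself appears to be the generic autograd error raised whenever backward() is called on a tensor that is not attached to a computation graph; a minimal, unrelated snippet triggers the same message:

```python
import torch

# a plain tensor is created with requires_grad=False and therefore has no grad_fn ...
loss = torch.tensor(3.14)

# ... so asking autograd to differentiate it fails with:
# RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
loss.backward()
```

So somewhere during the learning rate finder run the loss apparently ends up detached from the graph; I just don't see where yet.
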
jdb78 commented 3 years ago

Interesting. @diditforlulz273 had a similar problem. As I cannot reproduce the issue, could you provide more details about the tensor that lacks the grad_fn? An example in a Colab notebook would be worth its weight in gold.

diditforlulz273 commented 3 years ago

Indeed. But this error disappeared after I upgraded pytorch-forecasting to 0.7.0, so unfortunately I can't provide an example.

mahdiebm99ipm commented 3 years ago

@jdb78 thanks for your response. I will put together a complete example on an anonymized dataset and share it. The problem occurs when the number of unique time series is large; I have tested the code on a small subset of the data and it worked.

b-kaindl commented 3 years ago

Not sure if this helps narrow down the issue or if I'm just piggybacking on an unrelated problem here, but I ran into the same error after applying the code from your TFT tutorial to OWID's COVID-19 dataset in an attempt to build a COVID-19 forecasting model (see the end of the notebook).

ganzinotti commented 3 years ago

I also encountered this error. No solutions yet.

jdb78 commented 3 years ago

A reproducible example in a Colab notebook would help a lot in solving the issue.

b-kaindl commented 3 years ago

@jdb78 would this help? It's the first time I'm using Colab, so feel free to lmk if you need anything else.

jdb78 commented 3 years ago

The specific error is caused by missing values in the data. `allow_missings` in the `TimeSeriesDataSet` refers to missing observations in the sense that a complete row is absent from the dataset. Missing categorical values can be handled with `NaNLabelEncoder(add_nan=True)`, but continuous variables need to be free of nulls (particularly the target). I will clarify the documentation and add some asserts to the code.
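
Roughly, the fix would look like the sketch below (placeholder column names only, not from anyone's actual dataset; data is assumed to be the pandas DataFrame you pass to the dataset):

```python
from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data import NaNLabelEncoder

# data: a pandas DataFrame with one row per (series_id, time_idx) combination
# placeholder column names -- substitute your own target / feature / group columns
target_col = "value"
real_cols = ["temperature"]
cat_cols = ["holiday"]

# 1) the target and all continuous features must be free of NaNs
assert data[target_col].notna().all(), "target contains NaNs -- fill or drop them first"
data[real_cols] = data[real_cols].fillna(0.0)  # or interpolate / forward-fill per series

# 2) missing categories can be handled by the encoder instead of manual imputation
training = TimeSeriesDataSet(
    data,
    time_idx="time_idx",
    target=target_col,
    group_ids=["series_id"],
    max_encoder_length=24,
    max_prediction_length=6,
    time_varying_unknown_reals=[target_col] + real_cols,
    time_varying_known_categoricals=cat_cols,
    categorical_encoders={c: NaNLabelEncoder(add_nan=True) for c in cat_cols},
)
```

The point is that NaN handling for categoricals happens inside the encoder, while the target and the other reals have to be imputed before the dataset is constructed.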

b-kaindl commented 3 years ago

> The specific error is caused by missing values in the data. `allow_missings` in the `TimeSeriesDataSet` refers to missing observations in the sense that a complete row is absent from the dataset. Missing categorical values can be handled with `NaNLabelEncoder(add_nan=True)`, but continuous variables need to be free of nulls (particularly the target). I will clarify the documentation and add some asserts to the code.

Thanks a lot for the clarification, @jdb78! Curious to hear whether the others had the same root cause.

BeHappyForMe commented 3 years ago

> The specific error is caused by missing values in the data. `allow_missings` in the `TimeSeriesDataSet` refers to missing observations in the sense that a complete row is absent from the dataset. Missing categorical values can be handled with `NaNLabelEncoder(add_nan=True)`, but continuous variables need to be free of nulls (particularly the target). I will clarify the documentation and add some asserts to the code.
>
> Thanks a lot for the clarification, @jdb78! Curious to hear whether the others had the same root cause.

Yes, I got the error from the same cause. Thanks a lot!