robodhruv / visualnav-transformer

Official code and checkpoint release for mobile robot foundation models: GNM, ViNT, and NoMaD.
http://general-navigation-models.github.io
MIT License
610 stars, 78 forks

Error in train.py #36

Open zqs010908 opened 4 months ago

zqs010908 commented 4 months ago

Thank you for providing the code. I am trying to train my model with train.py, but I ran into the following errors:

Traceback (most recent call last):
  File "/home/iiau-vln/ws_zqs/nomad/visualnav-transformer/train/train.py", line 402, in <module>
    main(config)
  File "/home/iiau-vln/ws_zqs/nomad/visualnav-transformer/train/train.py", line 326, in main
    train_eval_loop_nomad(
  File "/home/iiau-vln/ws_zqs/nomad/visualnav-transformer/train/vint_train/training/train_eval_loop.py", line 196, in train_eval_loop_nomad
    ema_model = EMAModel(model=model,power=0.75)
TypeError: __init__() missing 1 required positional argument: 'parameters'

Traceback (most recent call last):
  File "/home/iiau-vln/ws_zqs/nomad/visualnav-transformer/train/train.py", line 402, in <module>
    main(config)
  File "/home/iiau-vln/ws_zqs/nomad/visualnav-transformer/train/train.py", line 326, in main
    train_eval_loop_nomad(
  File "/home/iiau-vln/ws_zqs/nomad/visualnav-transformer/train/vint_train/training/train_eval_loop.py", line 203, in train_eval_loop_nomad
    train_nomad(
  File "/home/iiau-vln/ws_zqs/nomad/visualnav-transformer/train/vint_train/training/train_utils.py", line 661, in train_nomad
    loss.backward()
  File "/home/iiau-vln/miniconda3/envs/nomad/lib/python3.8/site-packages/torch/_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "/home/iiau-vln/miniconda3/envs/nomad/lib/python3.8/site-packages/torch/autograd/__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

zolmaeng commented 4 months ago

Try using EMAModel(model.parameters(), power=0.75) instead of EMAModel(model, power=0.75).
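The TypeError above is what Python raises when a required first argument named `parameters` is never filled because the value was passed under a different keyword. A minimal sketch of that mechanism follows; `StandInEMA` and its signature are hypothetical, modelled only on the error message, and are not the real diffusers class.

```python
class StandInEMA:
    """Hypothetical stand-in whose first argument is named `parameters`."""

    def __init__(self, parameters, power=2 / 3, **kwargs):
        # Keep a private list of the values to be averaged.
        self.shadow = list(parameters)
        self.power = power


weights = [0.1, 0.2]  # placeholder for model.parameters()

try:
    # The keyword `model` is swallowed by **kwargs, so `parameters` stays unfilled.
    StandInEMA(model=weights, power=0.75)
    error = None
except TypeError as exc:
    error = str(exc)

# Passing the values positionally fills `parameters` and succeeds.
ema = StandInEMA(weights, power=0.75)
print(error)
```

Under this assumption, passing the parameters as the first positional argument avoids the error regardless of what the argument was called in older releases.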

robodhruv commented 4 months ago

Thanks @zolmaeng! Does that fix the problem, @zqs010908? We also welcome PRs with reproducible bug reports :)

This seems like an issue for other users too, tracking it at https://github.com/robodhruv/visualnav-transformer/issues/30

zqs010908 commented 4 months ago

I have already switched to EMAModel(model.parameters(), power=0.75), but a new problem has arisen:

/home/iiau-vln/miniconda3/envs/nomad/lib/python3.8/site-packages/diffusers/training_utils.py:361: FutureWarning: Passing a torch.nn.Module to ExponentialMovingAverage.step is deprecated. Please pass the parameters of the module instead.
  deprecate(
Traceback (most recent call last):
  File "/home/iiau-vln/ws_zqs/nomad/ori_nomad/visualnav-transformer/train/train.py", line 402, in <module>
    main(config)
  File "/home/iiau-vln/ws_zqs/nomad/ori_nomad/visualnav-transformer/train/train.py", line 326, in main
    train_eval_loop_nomad(
  File "/home/iiau-vln/ws_zqs/nomad/ori_nomad/visualnav-transformer/train/vint_train/training/train_eval_loop.py", line 203, in train_eval_loop_nomad
    train_nomad(
  File "/home/iiau-vln/ws_zqs/nomad/ori_nomad/visualnav-transformer/train/vint_train/training/train_utils.py", line 676, in train_nomad
    ema_model.averaged_model,
AttributeError: 'EMAModel' object has no attribute 'averaged_model'

I found that the EMAModel class in diffusers.training_utils indeed has no averaged_model attribute, whereas the EMAModel in diffusion_policy does (see line 31 of ema_model.py). I'm not sure whether switching to the diffusion_policy class is the correct approach.

zqs010908 commented 4 months ago

After I switched to the EMAModel from diffusion_policy, the previous problem went away, but a new issue has arisen. I am using the NoMaD config and training on the SACSoN/HuRoN dataset. Have you encountered this issue before?

Traceback (most recent call last):
  File "/home/iiau-vln/ws_zqs/nomad/visualnav-transformer/train/train.py", line 402, in <module>
    main(config)
  File "/home/iiau-vln/ws_zqs/nomad/visualnav-transformer/train/train.py", line 326, in main
    train_eval_loop_nomad(
  File "/home/iiau-vln/ws_zqs/nomad/visualnav-transformer/train/vint_train/training/train_eval_loop.py", line 203, in train_eval_loop_nomad
    train_nomad(
  File "/home/iiau-vln/ws_zqs/nomad/visualnav-transformer/train/vint_train/training/train_utils.py", line 860, in train_nomad
    loss.backward()
  File "/home/iiau-vln/miniconda3/envs/nomad/lib/python3.8/site-packages/torch/_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "/home/iiau-vln/miniconda3/envs/nomad/lib/python3.8/site-packages/torch/autograd/__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

keshav0306 commented 3 months ago

Change line 31 of the file ema_model.py in diffusion_policy/diffusion_policy/model/diffusion/ from self.averaged_model = model to self.averaged_model = copy.deepcopy(model)
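A plausible reason the deepcopy helps, assuming the diffusion_policy EMAModel constructor freezes its averaged_model (for example via requires_grad_(False)): without a copy, averaged_model is the very same object as the training model, so the training model is frozen too and loss.backward() has nothing to differentiate. The sketch below illustrates the aliasing with plain Python; `TinyModel`, `AliasingEMA` and `CopyingEMA` are hypothetical stand-ins, not code from either repository.

```python
import copy


class TinyModel:
    """Hypothetical stand-in for a torch.nn.Module."""

    def __init__(self):
        self.requires_grad = True

    def requires_grad_(self, flag):
        # Mimics the in-place behaviour of torch.nn.Module.requires_grad_.
        self.requires_grad = flag
        return self


class AliasingEMA:
    """Keeps a reference to the caller's model, then freezes it."""

    def __init__(self, model):
        self.averaged_model = model
        self.averaged_model.requires_grad_(False)


class CopyingEMA:
    """Freezes a private deep copy and leaves the caller's model alone."""

    def __init__(self, model):
        self.averaged_model = copy.deepcopy(model)
        self.averaged_model.requires_grad_(False)


aliased = TinyModel()
AliasingEMA(aliased)      # freezes `aliased` itself

copied = TinyModel()
ema = CopyingEMA(copied)  # freezes only the internal copy

print(aliased.requires_grad)             # False: training model was frozen
print(copied.requires_grad)              # True: training model still trainable
print(ema.averaged_model.requires_grad)  # False: EMA copy is frozen
```

If the edit is applied to ema_model.py, the file also needs `import copy` at the top if it is not already there.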