transic-robot / transic

[Question] How to solve 'RuntimeError: normal expects all elements of std >= 0.0'? #3

Open zzzzcl opened 4 months ago

zzzzcl commented 4 months ago

Hello, I was training the 'InsertFull' task using the command from 'Training Teacher Policies':

    python3 main/rl/train.py task=InsertFull num_envs=128 sim_device=cuda:0 rl_device=cuda:0 graphics_device_id=0

My system info: Ubuntu 18.04, 4060Ti (GPU driver: 535.146), CUDA 11.6, Python 3.8, Torch 1.13.1, Isaac Sim version: 4.0.0

But I got this error in the training phase: RuntimeError: normal expects all elements of std >= 0.0


saving next best rewards:  [2.83]
=> saving checkpoint 'runs/InsertFull_06-12-21-03-24/nn/InsertFull.pth'
fps step: 4206 fps step and policy inference: 3990 fps total: 3879 epoch: 2058/9999999999999 frames: 8425472
fps step: 4346 fps step and policy inference: 4109 fps total: 3992 epoch: 2059/9999999999999 frames: 8429568
Error executing job with overrides: ['task=InsertFull', 'num_envs=128', 'sim_device=cuda:0', 'rl_device=cuda:0', 'graphics_device_id=0']
Traceback (most recent call last):
  File "main/rl/train.py", line 218, in launch_rlg_hydra
    runner.run(
  File "/home/orange/anaconda3/envs/transic/lib/python3.8/site-packages/rl_games/torch_runner.py", line 133, in run
    self.run_train(args)
  File "/home/orange/202406/transic-main/transic/rl/runner.py", line 35, in run_train
    agent.train()
  File "/home/orange/202406/transic-main/transic/rl/base.py", line 1263, in train
    ) = self.train_epoch()
  File "/home/orange/202406/transic-main/transic/rl/base.py", line 1081, in train_epoch
    batch_dict = self.play_steps()
  File "/home/orange/202406/transic-main/transic/rl/base.py", line 864, in play_steps
    res_dict = self.get_action_values(self.obs)
  File "/home/orange/202406/transic-main/transic/rl/base.py", line 488, in get_action_values
    res_dict = self.model(input_dict)
  File "/home/orange/anaconda3/envs/transic/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/orange/202406/transic-main/transic/rl/models.py", line 183, in forward
    selected_action = distr.sample()
  File "/home/orange/anaconda3/envs/transic/lib/python3.8/site-packages/torch/distributions/normal.py", line 70, in sample
    return torch.normal(self.loc.expand(shape), self.scale.expand(shape))

yunfanjiang commented 4 months ago

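This might be caused by inaccurate collision simulation, which can produce very large observation values. Those can in turn drive the policy outputs to NaN, so the standard deviation passed to torch.normal fails the std >= 0.0 check. Consider replacing task-irrelevant furniture parts with primitive geometries, fixing their base links, or simply removing them.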
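
To check whether exploding observations are really the trigger, it can help to look for non-finite or unusually large values in the observation batch just before the policy is called. Below is a minimal sketch in plain Python, not code from the transic repository: the helper name find_bad_envs and the threshold of 1e3 are assumptions chosen for illustration, and the observations are assumed to have been converted to nested lists (for example with tensor.tolist()). On tensors the same test could be written with torch.isfinite. The first lines also show why a NaN standard deviation would fail a std >= 0.0 check: NaN compares False against every number.

```python
import math

def find_bad_envs(obs_batch, max_abs=1e3):
    """Return indices of environments whose observation row contains
    NaN/inf or a value whose magnitude exceeds max_abs.

    max_abs is an arbitrary illustrative threshold, not a value
    taken from the transic code base.
    """
    bad = []
    for env_idx, row in enumerate(obs_batch):
        if any((not math.isfinite(v)) or abs(v) > max_abs for v in row):
            bad.append(env_idx)
    return bad

# NaN compares False against everything, so a NaN std cannot
# satisfy a "std >= 0.0" requirement.
nan_std = float("nan")
print(nan_std >= 0.0)  # False

# Toy batch: one healthy env, one with a huge value, one with NaN.
obs = [
    [0.1, -0.3, 0.02],
    [0.2, 5.0e7, 0.0],
    [float("nan"), 0.0, 0.0],
]
print(find_bad_envs(obs))  # [1, 2]
```

If such a check flags the same few environments repeatedly, that would be consistent with a collision problem on specific parts, and those parts are the first candidates for simplifying, fixing in place, or removing.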