isaac-sim / IsaacGymEnvs

Isaac Gym Reinforcement Learning Environments

RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch) #204

Closed jstmn closed 4 months ago

jstmn commented 4 months ago

Hi,

I'm getting the following error when running `python train.py task=Cartpole`:

Traceback (most recent call last):
  File "train.py", line 214, in launch_rlg_hydra
    'sigma': cfg.sigma if cfg.sigma != '' else None
  File "/home/jstm/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/torch_runner.py", line 133, in run
    self.run_train(args)
  File "/home/jstm/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/torch_runner.py", line 116, in run_train
    agent.train()
  File "/home/jstm/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/common/a2c_common.py", line 1318, in train
    step_time, play_time, update_time, sum_time, a_losses, c_losses, b_losses, entropies, kls, last_lr, lr_mul = self.train_epoch()
  File "/home/jstm/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/common/a2c_common.py", line 1182, in train_epoch
    batch_dict = self.play_steps()
  File "/home/jstm/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/common/a2c_common.py", line 752, in play_steps
    self.obs, rewards, self.dones, infos = self.env_step(res_dict['actions'])
  File "/home/jstm/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/common/a2c_common.py", line 519, in env_step
    obs, rewards, dones, infos = self.vec_env.step(actions)
  File "/home/jstm/Libraries/isaacgym/IsaacGymEnvs/isaacgymenvs/utils/rlgames_utils.py", line 247, in step
    return self.env.step(actions)
  File "/home/jstm/Libraries/isaacgym/IsaacGymEnvs/isaacgymenvs/tasks/base/vec_task.py", line 389, in step
    self.post_physics_step()
  File "/home/jstm/Libraries/isaacgym/IsaacGymEnvs/isaacgymenvs/tasks/cartpole.py", line 173, in post_physics_step
    self.compute_reward()
  File "/home/jstm/Libraries/isaacgym/IsaacGymEnvs/isaacgymenvs/tasks/cartpole.py", line 128, in compute_reward
    self.reset_dist, self.reset_buf, self.progress_buf, self.max_episode_length
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

My system information:

gpu: RTX 4090
Driver Version: 525.116.04   
CUDA Version: 12.0
torch                   1.8.1
torchvision          0.9.1
Python 3.7.12
environment: conda, unmodified from `./create_conda_env_rlgpu.sh`

I've been looking online for how to resolve this CUDA / torch / Python version mismatch, but I'm not making much progress.

Any idea on how to resolve this?

Thanks

lhy0807 commented 4 months ago

I'm facing the same issue. Is there any way to solve this?

jstmn commented 4 months ago

I bumped my torch version up from 1.8.1 to 1.13.1, YMMV

lhy0807 commented 4 months ago

Thanks. I also solved this by upgrading to a higher torch version.
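
For anyone landing here with the same error: a likely explanation (an assumption, not confirmed in this thread) is that the RTX 4090 reports compute capability 8.9, which is newer than anything the CUDA toolkit bundled with torch 1.8.1 knows about, so NVRTC rejects the `-arch` value when TorchScript tries to compile a fused kernel. Below is a minimal sketch for checking whether your GPU is newer than every architecture your torch build targets. The helper names `parse_arch` and `device_is_newer_than_build` are hypothetical; `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are the actual torch calls.

```python
import re
from typing import Iterable, Tuple


def parse_arch(arch: str) -> Tuple[int, int]:
    """Turn an arch string such as 'sm_86' or 'compute_86' into (8, 6)."""
    match = re.match(r"(?:sm|compute)_(\d+)", arch)
    if match is None:
        raise ValueError(f"unrecognized arch string: {arch!r}")
    number = match.group(1)
    # The last digit is the minor version, everything before it is the major.
    return int(number[:-1]), int(number[-1])


def device_is_newer_than_build(
    capability: Tuple[int, int], arch_list: Iterable[str]
) -> bool:
    """True if the GPU's capability exceeds every arch the build targets."""
    archs = [parse_arch(a) for a in arch_list]
    return bool(archs) and capability > max(archs)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("torch:", torch.__version__, "built for CUDA", torch.version.cuda)
        print("GPU capability:", cap, "build targets:", archs)
        if device_is_newer_than_build(cap, archs):
            print("GPU is newer than this torch build; try a newer torch.")
    else:
        # Illustrative values only: an sm_89 GPU against a build that stops at sm_86.
        print(device_is_newer_than_build((8, 9), ["sm_70", "sm_75", "sm_86"]))
```

If the check reports that the GPU is newer than the build, installing a more recent torch (as both posters above did) is the first thing to try. Note that the arch list describes torch's precompiled kernels rather than NVRTC itself, so treat the result as a hint, not a guarantee.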