Open BL-CX opened 3 weeks ago
I ran into this problem before; my workaround was to reduce the number of environments. I don't know the root cause, but using 4096 envs instead of 8192 made it run without the CUDA error.
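For reference, a minimal sketch of how the env count can be lowered before the env is created, assuming the stock legged_gym task_registry API (get_cfgs / make_env) and config layout (env_cfg.env.num_envs); adjust the names if your fork differs:

```python
# Minimal sketch, assuming the stock legged_gym layout; not taken from this thread.
import isaacgym  # Isaac Gym must be imported before torch-based modules
from legged_gym.envs import *  # registers the tasks (e.g. anymal_c_flat) with task_registry
from legged_gym.utils import get_args, task_registry

def make_env_with_fewer_envs():
    args = get_args()
    # Load the default env/train configs for the selected task.
    env_cfg, _train_cfg = task_registry.get_cfgs(name=args.task)
    # Halve the number of parallel environments: 8192 -> 4096.
    # Fewer envs means fewer bodies and contact pairs, so the PhysX GPU buffers
    # (the PxgCudaDeviceMemoryAllocator allocation that fails in the log below) shrink.
    env_cfg.env.num_envs = 4096
    env, _ = task_registry.make_env(name=args.task, args=args, env_cfg=env_cfg)
    return env
```

If your checkout exposes the `--num_envs` flag in legged_gym's argument parser, something like `python legged_gym/scripts/train.py --task=anymal_c_flat --num_envs=4096` achieves the same thing without editing code.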
OK, thank you very much! I also solved this problem by replacing the graphics card with a better one.
Importing module 'gym_38' (/home/blcx/Downloads/isaacgym/python/isaacgym/_bindings/linux-x86_64/gym_38.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/blcx/Downloads/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
PyTorch version 1.10.0+cu113
Device count 1
/home/blcx/Downloads/isaacgym/python/isaacgym/_bindings/src/gymtorch
Using /home/blcx/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
Emitting ninja build file /home/blcx/.cache/torch_extensions/py38_cu113/gymtorch/build.ninja...
Building extension module gymtorch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module gymtorch...
Setting seed: 1
Not connected to PVD
+++ Using GPU PhysX
Physics Engine: PhysX
Physics Device: cuda:0
GPU Pipeline: enabled
/home/blcx/anaconda3/envs/robotics_env/lib/python3.8/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
'train_cfg' provided -> Ignoring 'name=anymal_c_flat'
Actor MLP: Sequential(
  (0): Linear(in_features=48, out_features=128, bias=True)
  (1): ELU(alpha=1.0)
  (2): Linear(in_features=128, out_features=64, bias=True)
  (3): ELU(alpha=1.0)
  (4): Linear(in_features=64, out_features=32, bias=True)
  (5): ELU(alpha=1.0)
  (6): Linear(in_features=32, out_features=12, bias=True)
)
Critic MLP: Sequential(
  (0): Linear(in_features=48, out_features=128, bias=True)
  (1): ELU(alpha=1.0)
  (2): Linear(in_features=128, out_features=64, bias=True)
  (3): ELU(alpha=1.0)
  (4): Linear(in_features=64, out_features=32, bias=True)
  (5): ELU(alpha=1.0)
  (6): Linear(in_features=32, out_features=1, bias=True)
)
/home/blcx/anaconda3/envs/robotics_env/lib/python3.8/site-packages/torch/nn/modules/module.py:1102: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters(). (Triggered internally at ../aten/src/ATen/native/cudnn/RNN.cpp:925.)
  return forward_call(*input, **kwargs)
PxgCudaDeviceMemoryAllocator fail to allocate memory 339738624 bytes!! Result = 2
/buildAgent/work/99bede84aa0a52c2/source/gpunarrowphase/src/PxgNarrowphaseCore.cpp (11310) : internal error : GPU compressContactStage1 fail to launch kernel stage 1!!
/buildAgent/work/99bede84aa0a52c2/source/gpunarrowphase/src/PxgNarrowphaseCore.cpp (11347) : internal error : GPU compressContactStage2 fail to launch kernel stage 1!!
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 4202
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 4210
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 3480
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 3535
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 6137
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysXCuda.cu: 991
Traceback (most recent call last):
  File "legged_gym/scripts/play.py", line 121, in <module>
    play(args)
  File "legged_gym/scripts/play.py", line 58, in play
    ppo_runner, train_cfg = task_registry.make_alg_runner(env=env, name=args.task, args=args, train_cfg=train_cfg)
  File "/home/blcx/Downloads/legged_gym/legged_gym/utils/task_registry.py", line 147, in make_alg_runner
    runner = OnPolicyRunner(env, train_cfg_dict, log_dir, device=args.rl_device)
  File "/home/blcx/Downloads/rsl_rl-v1.0.2/rsl_rl/runners/on_policy_runner.py", line 81, in __init__
    _, _ = self.env.reset()
  File "/home/blcx/Downloads/legged_gym/legged_gym/envs/base/base_task.py", line 114, in reset
    obs, privileged_obs, _, _, _ = self.step(torch.zeros(self.num_envs, self.num_actions, device=self.device, requires_grad=False))
  File "/home/blcx/Downloads/legged_gym/legged_gym/envs/base/legged_robot.py", line 90, in step
    self.torques = self._compute_torques(self.actions).view(self.torques.shape)
  File "/home/blcx/Downloads/legged_gym/legged_gym/envs/anymal_c/anymal.py", line 75, in _compute_torques
    self.sea_input[:, 0, 0] = (actions * self.cfg.control.action_scale + self.default_dof_pos - self.dof_pos).flatten()
RuntimeError: CUDA error: an illegal memory access was encountered