Open BL-CX opened 3 weeks ago
I ran into this problem before; my workaround was to reduce the number of environments. I don't know the root cause, but using 4096 envs instead of 8192 made it run without the CUDA error.
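For reference, a minimal sketch of how the env count can be lowered before the env is created, assuming the stock legged_gym task_registry API (get_cfgs / make_env) and config layout (env_cfg.env.num_envs); adjust the names if your fork differs:

```python
# Minimal sketch, assuming the stock legged_gym layout; not taken from this thread.
import isaacgym  # Isaac Gym must be imported before torch-based modules
from legged_gym.envs import *  # registers the tasks (e.g. anymal_c_flat) with task_registry
from legged_gym.utils import get_args, task_registry

def make_env_with_fewer_envs():
    args = get_args()
    # Load the default env/train configs for the selected task.
    env_cfg, _train_cfg = task_registry.get_cfgs(name=args.task)
    # Halve the number of parallel environments: 8192 -> 4096.
    # Fewer envs means fewer bodies and contact pairs, so the PhysX GPU buffers
    # (the PxgCudaDeviceMemoryAllocator allocation that fails in the log below) shrink.
    env_cfg.env.num_envs = 4096
    env, _ = task_registry.make_env(name=args.task, args=args, env_cfg=env_cfg)
    return env
```

If your checkout exposes the `--num_envs` flag in legged_gym's argument parser, something like `python legged_gym/scripts/train.py --task=anymal_c_flat --num_envs=4096` achieves the same thing without editing code.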
OK, thank you very much! I also solved this problem by replacing the graphics card with a better one.
Importing module 'gym_38' (/home/blcx/Downloads/isaacgym/python/isaacgym/_bindings/linux-x86_64/gym_38.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/blcx/Downloads/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
PyTorch version 1.10.0+cu113
Device count 1
/home/blcx/Downloads/isaacgym/python/isaacgym/_bindings/src/gymtorch
Using /home/blcx/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
Emitting ninja build file /home/blcx/.cache/torch_extensions/py38_cu113/gymtorch/build.ninja...
Building extension module gymtorch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module gymtorch...
Setting seed: 1
Not connected to PVD
+++ Using GPU PhysX
Physics Engine: PhysX
Physics Device: cuda:0
GPU Pipeline: enabled
/home/blcx/anaconda3/envs/robotics_env/lib/python3.8/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
'train_cfg' provided -> Ignoring 'name=anymal_c_flat'
Actor MLP: Sequential(
  (0): Linear(in_features=48, out_features=128, bias=True)
  (1): ELU(alpha=1.0)
  (2): Linear(in_features=128, out_features=64, bias=True)
  (3): ELU(alpha=1.0)
  (4): Linear(in_features=64, out_features=32, bias=True)
  (5): ELU(alpha=1.0)
  (6): Linear(in_features=32, out_features=12, bias=True)
)
Critic MLP: Sequential(
  (0): Linear(in_features=48, out_features=128, bias=True)
  (1): ELU(alpha=1.0)
  (2): Linear(in_features=128, out_features=64, bias=True)
  (3): ELU(alpha=1.0)
  (4): Linear(in_features=64, out_features=32, bias=True)
  (5): ELU(alpha=1.0)
  (6): Linear(in_features=32, out_features=1, bias=True)
)
/home/blcx/anaconda3/envs/robotics_env/lib/python3.8/site-packages/torch/nn/modules/module.py:1102: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters(). (Triggered internally at ../aten/src/ATen/native/cudnn/RNN.cpp:925.)
  return forward_call(*input, **kwargs)
PxgCudaDeviceMemoryAllocator fail to allocate memory 339738624 bytes!! Result = 2
/buildAgent/work/99bede84aa0a52c2/source/gpunarrowphase/src/PxgNarrowphaseCore.cpp (11310) : internal error : GPU compressContactStage1 fail to launch kernel stage 1!!
/buildAgent/work/99bede84aa0a52c2/source/gpunarrowphase/src/PxgNarrowphaseCore.cpp (11347) : internal error : GPU compressContactStage2 fail to launch kernel stage 1!!
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 4202
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 4210
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 3480
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 3535
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysX.cpp: 6137
[Error] [carb.gym.plugin] Gym cuda error: an illegal memory access was encountered: ../../../source/plugins/carb/gym/impl/Gym/GymPhysXCuda.cu: 991
Traceback (most recent call last):
  File "legged_gym/scripts/play.py", line 121, in <module>
    play(args)
  File "legged_gym/scripts/play.py", line 58, in play
    ppo_runner, train_cfg = task_registry.make_alg_runner(env=env, name=args.task, args=args, train_cfg=train_cfg)
  File "/home/blcx/Downloads/legged_gym/legged_gym/utils/task_registry.py", line 147, in make_alg_runner
    runner = OnPolicyRunner(env, train_cfg_dict, log_dir, device=args.rl_device)
  File "/home/blcx/Downloads/rsl_rl-v1.0.2/rsl_rl/runners/on_policy_runner.py", line 81, in __init__
    _, _ = self.env.reset()
  File "/home/blcx/Downloads/legged_gym/legged_gym/envs/base/base_task.py", line 114, in reset
    obs, privileged_obs, _, _, _ = self.step(torch.zeros(self.num_envs, self.num_actions, device=self.device, requires_grad=False))
  File "/home/blcx/Downloads/legged_gym/legged_gym/envs/base/legged_robot.py", line 90, in step
    self.torques = self._compute_torques(self.actions).view(self.torques.shape)
  File "/home/blcx/Downloads/legged_gym/legged_gym/envs/anymal_c/anymal.py", line 75, in _compute_torques
    self.sea_input[:, 0, 0] = (actions * self.cfg.control.action_scale + self.default_dof_pos - self.dof_pos).flatten()
RuntimeError: CUDA error: an illegal memory access was encountered