MasterXiong opened this issue 1 week ago.
Does vulkaninfo run without error on your end?
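Here is a small sketch for answering this without reading the whole vulkaninfo report by eye. It runs vulkaninfo (if it is on PATH) and picks out lines suggesting that the Vulkan loader could not use a driver. The marker strings are an assumption based on the warnings quoted in this thread, not an exhaustive list.

```python
from __future__ import annotations

import shutil
import subprocess

# Substrings that suggest the Vulkan loader could not use a driver. This list is
# an assumption based on messages quoted in this thread, not an exhaustive set.
SUSPICIOUS_MARKERS = ("Failed to CreateInstance", "VK_ERROR")


def find_vulkan_problems(output: str) -> list[str]:
    """Return the lines of vulkaninfo output that contain a suspicious marker."""
    return [
        line.strip()
        for line in output.splitlines()
        if any(marker in line for marker in SUSPICIOUS_MARKERS)
    ]


def run_vulkaninfo() -> tuple[int, str] | None:
    """Run vulkaninfo; return (exit code, stdout + stderr), or None if not installed."""
    exe = shutil.which("vulkaninfo")
    if exe is None:
        return None
    proc = subprocess.run([exe], capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


if __name__ == "__main__":
    result = run_vulkaninfo()
    if result is None:
        print("vulkaninfo is not on PATH")
    else:
        code, output = result
        print("vulkaninfo exit code:", code)
        for line in find_vulkan_problems(output):
            print("suspicious:", line)
```

A non-zero exit code or any "suspicious" line is worth posting here. An empty list does not prove that rendering works, though.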
Hi @xuanlinli17, thanks for your help!
Yes, vulkaninfo runs normally on my end, but some attributes have the value false, and I'm not sure whether that is normal.
What's your NVIDIA driver version? I'd recommend 535 or newer. The driver also needs to be recent enough to support your CUDA version.
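A minimal sketch for checking the driver against the 535+ recommendation. It assumes nvidia-smi is installed and uses its --query-gpu=driver_version query. The "recent enough for CUDA" condition is not encoded here, because the required driver for each CUDA release comes from NVIDIA's compatibility table.

```python
from __future__ import annotations

import shutil
import subprocess

MIN_DRIVER_MAJOR = 535  # minimum recommended in this thread


def parse_driver_major(version: str) -> int:
    """Extract the major number from a driver version string such as '545.29.06'."""
    return int(version.strip().split(".")[0])


def driver_is_recent_enough(version: str, minimum: int = MIN_DRIVER_MAJOR) -> bool:
    """True if the driver's major version is at least `minimum`."""
    return parse_driver_major(version) >= minimum


def query_driver_version() -> str | None:
    """Ask nvidia-smi for the driver version.

    Returns None if nvidia-smi is missing or prints nothing; raises
    subprocess.CalledProcessError if nvidia-smi itself fails.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    proc = subprocess.run(
        [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    # One line per GPU; all GPUs on a machine share one driver, so take the first.
    lines = proc.stdout.split()
    return lines[0] if lines else None
```

Typical use is to call query_driver_version() and, if it returns a string, pass that string to driver_is_recent_enough().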
Thanks! I upgraded my NVIDIA driver to version 545 but still get the same error.
Also, when running vulkaninfo, I get the following messages at the beginning: 'DISPLAY' environment variable not set... skipping surface info error: XDG_RUNTIME_DIR not set in the environment.
Could this be causing the error?
Also, is there a minimum required CUDA version?
RT-1 and Octo need CUDA 11.8 to run properly on GPU. See the README for more details.
Thanks! But the issue happens even with random actions, so I don't think it's caused by CUDA. Do you have any other suggestions on what to check besides the NVIDIA driver version?
Not sure. Usually, if something is wrong with the setup, the core dump occurs as soon as you create the environment, before you step any actions.
Yeah, that's quite weird. Is there anything that happens only in step and not in reset? That might hint at which operation causes the core dump. Thanks!
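One way to find the Python call that is running when the segfault happens, without stepping through the code by hand, is the standard-library faulthandler module. This is a sketch to put at the top of the test script. It shows only the Python frames; the native C++ frames inside the crashing library would need a debugger such as gdb.

```python
import faulthandler
import sys

# Install handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL that print the
# Python traceback of every thread before the process dies. sys.__stderr__ is the
# process's original stderr, which is backed by a real file descriptor.
faulthandler.enable(file=sys.__stderr__, all_threads=True)

# ... then create the environment, call env.reset(), and call env.step(action) as in
# the README test script. If step() segfaults, the dump lists the Python frames
# that were active when the native code crashed, most recent call first.
```

Running the unmodified script with `python -X faulthandler script.py`, or setting PYTHONFAULTHANDLER=1, has the same effect.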
Hi, I encountered a similar bug using a conda env on my machine (Ubuntu 22.04, NVIDIA 4090 GPU). The command vulkaninfo | head -n 5 gives:
WARNING: [Loader Message] Code 0 : terminator_CreateInstance: Failed to CreateInstance in ICD 0. Skipping ICD.
Vulkan Instance Version: 1.3.204
Running example.ipynb then crashes the kernel. After converting the notebook to a Python script, the env.step line produces: Segmentation fault (core dumped)
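When the crash takes down a Jupyter kernel, it can help to run the failing code in a child interpreter, so the parent survives and can report how the child died. Here is a minimal sketch of that idea. It assumes a POSIX system, where a child killed by a signal gets a negative return code.

```python
import signal
import subprocess
import sys


def run_isolated(code: str) -> int:
    """Run a Python snippet in a fresh interpreter and return its exit code.

    -X faulthandler makes the child print its Python traceback on a fatal signal.
    On POSIX, a child killed by a signal reports a negative return code, so a
    segfault shows up as -SIGSEGV here instead of killing the caller
    (for example a Jupyter kernel).
    """
    proc = subprocess.run([sys.executable, "-X", "faulthandler", "-c", code])
    return proc.returncode


def crashed_with_segfault(returncode: int) -> bool:
    """True if the child process was terminated by SIGSEGV."""
    return returncode == -signal.SIGSEGV
```

For example, with the converted notebook saved under a hypothetical name such as example.py, `crashed_with_segfault(run_isolated(open("example.py").read()))` tells you from inside the notebook whether env.step crashed the child with SIGSEGV.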
I tried installing system updates and reinstalling the NVIDIA driver, but it still doesn't work.
These are typically setup issues, e.g., related to Vulkan. If neither the troubleshooting section at https://maniskill.readthedocs.io/en/latest/user_guide/getting_started/installation.html#troubleshooting nor installing a newer driver (apt install nvidia-driver-xxx, where xxx is a recent version) solves the problem, then this is tricky and might be due to something specific to your setup.
You could also try something like https://github.com/haosulab/SAPIEN/issues/115#issuecomment-1434899965.
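The "Failed to CreateInstance in ICD 0. Skipping ICD." warning means the Vulkan loader found a driver manifest (ICD JSON file) but could not create an instance with the library it points to. This sketch lists the manifests in the system directories the loader commonly searches on Linux, along with each manifest's library_path. The directory list is an assumption: the loader also honours user paths and the VK_ICD_FILENAMES override, and those are not covered here.

```python
from __future__ import annotations

import json
from pathlib import Path

# System directories the Vulkan loader commonly searches for ICD manifests on Linux.
ICD_DIRS = ("/etc/vulkan/icd.d", "/usr/share/vulkan/icd.d", "/usr/local/share/vulkan/icd.d")


def list_icd_libraries(dirs=ICD_DIRS) -> dict[str, str]:
    """Map each ICD manifest path to the driver library it points at."""
    found: dict[str, str] = {}
    for d in dirs:
        directory = Path(d)
        if not directory.is_dir():
            continue
        for manifest in sorted(directory.glob("*.json")):
            try:
                data = json.loads(manifest.read_text())
                found[str(manifest)] = data["ICD"]["library_path"]
            except (OSError, ValueError, KeyError, TypeError):
                # Unreadable file, invalid JSON, or missing "ICD"/"library_path" keys.
                found[str(manifest)] = "<unreadable manifest>"
    return found


if __name__ == "__main__":
    for path, lib in list_icd_libraries().items():
        print(f"{path} -> {lib}")
```

Look out for stale or duplicate manifests, or manifests pointing at libraries from a different driver version than the one currently installed.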
I tried both, but neither worked; I still get the same error. I will try something else.
I found that the problem lies in the IK computation of mani_skill2_real2sim. The call chain starting from obs, reward, done, truncated, info = env.step(action) is:
- BaseEnv, line 548: self.step_action(action)
- BaseEnv, line 561: self.agent.set_action(action)
- BaseAgent, line 165: self.controller.set_action(action)
- CombinedController, line 269: controller.set_action(action[start:end])
- PDEEPosController, line 114: self._target_qpos = self.compute_ik(self._target_pose)
- PDEEPosController, line 61: result, success, error = self.pmodel.compute_inverse_kinematics(self.ee_link_idx, target_pose, initial_qpos=self.articulation.get_qpos(), active_qmask=self.qmask, max_iterations=max_iterations)
This last call produces the segmentation fault. The argument values are:
- self.ee_link_idx: 13
- target_pose: Pose([1.38064, -0.348417, 1.1829], [0.178008, -0.693489, 0.567687, -0.406346])
- initial_qpos: array([-0.26394573, -0.26394573, -0.26394573, -0.26394573, -0.26394573, -0.26394573, -0.26394573, -0.26394573, -0.26394573, -0.26394573, -0.26394573], dtype=float32)
- active_qmask: array([ True, True, True, True, True, True, True, False, False, True, True])
- max_iterations: 100
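Before digging into the native IK code, a quick sanity check can confirm that the reported arguments are well-formed: unit quaternion, matching lengths of initial_qpos and active_qmask, and no NaN values. This is a hypothetical helper that checks only what can be verified without importing SAPIEN. It cannot validate ee_link_idx, which depends on the robot model.

```python
import math


def check_ik_inputs(quat, initial_qpos, active_qmask, tol=1e-3):
    """Return human-readable problems found in the IK inputs (empty list if none).

    Hypothetical helper: it only checks properties that need no SAPIEN import.
    """
    problems = []
    norm = math.sqrt(sum(q * q for q in quat))
    if abs(norm - 1.0) > tol:
        problems.append(f"quaternion norm is {norm:.6f}, expected 1")
    if len(initial_qpos) != len(active_qmask):
        problems.append(
            f"initial_qpos has {len(initial_qpos)} entries, active_qmask has {len(active_qmask)}"
        )
    if not all(math.isfinite(q) for q in initial_qpos):
        problems.append("initial_qpos contains NaN or inf")
    return problems


# Values copied from the report above.
quat = [0.178008, -0.693489, 0.567687, -0.406346]
initial_qpos = [-0.26394573] * 11
active_qmask = [True] * 7 + [False] * 2 + [True] * 2
print(check_ik_inputs(quat, initial_qpos, active_qmask))  # prints []
```

If these checks pass, as they do for the values above, the inputs look well-formed. The crash then more likely originates inside the native IK call itself. A possible next step is to call compute_inverse_kinematics with exactly these values in a standalone script.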
This is quite strange: if the setup weren't right, env.reset() would already cause a core dump, rather than env.step() and pmodel.compute_inverse_kinematics(). I don't know what's happening.
Hi,
Thanks for sharing this brilliant package for real-2-sim evaluation!
I'm trying to run SimplerEnv inside a Docker container on a Linux server with GPU support. The environment can be created and reset successfully, but a "Segmentation fault (core dumped)" error shows up when calling env.step(). I have followed the troubleshooting instructions in the README but still can't solve this issue. Could you please take a look at what the issue might be? Thanks a lot for your help! Below is the Dockerfile I use (modified from ManiSkill's Dockerfile).
The Docker installation seems to work fine, and I'm using the same test script as in the README.
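Since the Dockerfile itself isn't shown, here is one container-specific thing worth checking. With the NVIDIA Container Toolkit, Vulkan and other graphics libraries are only exposed when the NVIDIA_DRIVER_CAPABILITIES variable includes graphics (or all). This sketch inspects the variable as seen inside the container. Treating an unset variable as the toolkit default of compute,utility is an assumption.

```python
import os


def missing_driver_capabilities(env=None, required=("graphics", "compute", "utility")):
    """Return the required capabilities missing from NVIDIA_DRIVER_CAPABILITIES.

    `env` defaults to os.environ. An unset variable is treated as the toolkit's
    default "compute,utility" (assumption), which lacks "graphics".
    """
    env = os.environ if env is None else env
    raw = env.get("NVIDIA_DRIVER_CAPABILITIES", "compute,utility")
    caps = {c.strip() for c in raw.split(",") if c.strip()}
    if "all" in caps:
        return []
    return [c for c in required if c not in caps]


if __name__ == "__main__":
    missing = missing_driver_capabilities()
    print("missing capabilities:", missing or "none")
```

If "graphics" is reported missing, a common fix is to set NVIDIA_DRIVER_CAPABILITIES=all, either in the Dockerfile or with -e on docker run.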