Bug: Seems that GPU simulation does not work when control_mode="pd_ee_delta_pose"

Zx55 commented 3 months ago

Reproduction:

The code is from here. GPU simulation runs fine with control_mode='pd_joint_delta_pos', but it fails as soon as I change it to 'pd_ee_delta_pose'.

import gymnasium as gym
import mani_skill.envs

def main():
    env = gym.make(
        "PickCube-v1",
        obs_mode="state",
        control_mode="pd_ee_delta_pose",
        num_envs=16,
    )
    print(env.observation_space) # will now have shape (16, ...)
    print(env.action_space) # will now have shape (16, ...)

    obs, _ = env.reset(seed=0) # reset with a seed for determinism
    for i in range(200):
        action = env.action_space.sample() # this is batched now
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated | truncated
        print(f"Obs shape: {obs.shape}, Reward shape {reward.shape}, Done shape {done.shape}")
        # note at the moment we do not support showing all parallel sub-scenes 
        # at once on a GUI, only during observation generation/video recording
    env.close()

if __name__ == '__main__':
    main()

Error Log:

Traceback (most recent call last):
  File "/home/PJLAB/chenzeren/桌面/ICMLW-ManiSkill-RH20TP-DigitalTwins/main_gpu.py", line 24, in <module>
    main()
  File "/home/PJLAB/chenzeren/桌面/ICMLW-ManiSkill-RH20TP-DigitalTwins/main_gpu.py", line 17, in main
    obs, reward, terminated, truncated, info = env.step(action)
  File "/home/PJLAB/chenzeren/.conda/envs/manskill/lib/python3.10/site-packages/gymnasium/wrappers/time_limit.py", line 57, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/home/PJLAB/chenzeren/.conda/envs/manskill/lib/python3.10/site-packages/gymnasium/wrappers/order_enforcing.py", line 56, in step
    return self.env.step(action)
  File "/home/PJLAB/chenzeren/.conda/envs/manskill/lib/python3.10/site-packages/mani_skill/envs/sapien_env.py", line 777, in step
    action = self._step_action(action)
  File "/home/PJLAB/chenzeren/.conda/envs/manskill/lib/python3.10/site-packages/mani_skill/envs/sapien_env.py", line 837, in _step_action
    self.agent.set_action(action)
  File "/home/PJLAB/chenzeren/.conda/envs/manskill/lib/python3.10/site-packages/mani_skill/agents/base_agent.py", line 255, in set_action
    self.controller.set_action(action)
  File "/home/PJLAB/chenzeren/.conda/envs/manskill/lib/python3.10/site-packages/mani_skill/agents/controllers/base_controller.py", line 299, in set_action
    controller.set_action(action[:, start:end])
  File "/home/PJLAB/chenzeren/.conda/envs/manskill/lib/python3.10/site-packages/mani_skill/agents/controllers/pd_ee_pose.py", line 190, in set_action
    self._target_qpos = self.compute_ik(self._target_pose, action)
  File "/home/PJLAB/chenzeren/.conda/envs/manskill/lib/python3.10/site-packages/mani_skill/agents/controllers/pd_ee_pose.py", line 299, in compute_ik
    return super().compute_ik(
  File "/home/PJLAB/chenzeren/.conda/envs/manskill/lib/python3.10/site-packages/mani_skill/agents/controllers/pd_ee_pose.py", line 138, in compute_ik
    self.fast_kinematics_model.jacobian_mixed_frame_pytorch(
RuntimeError: The deleter and context arguments are mutually exclusive.
Exception raised from make_tensor at /opt/conda/conda-bld/pytorch_1702400366987/work/build/aten/src/ATen/Functions.cpp:15 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f7ddd21a617 in /home/PJLAB/chenzeren/.conda/envs/manskill/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xd4d732 (0x7f7e2b23b732 in /home/PJLAB/chenzeren/.conda/envs/manskill/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #2: <unknown function> + 0x2f313 (0x7f7d8adcc313 in /home/PJLAB/chenzeren/.conda/envs/manskill/lib/python3.10/site-packages/fast_kinematics.cpython-310-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x249df (0x7f7d8adc19df in /home/PJLAB/chenzeren/.conda/envs/manskill/lib/python3.10/site-packages/fast_kinematics.cpython-310-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x37b7f (0x7f7d8add4b7f in /home/PJLAB/chenzeren/.conda/envs/manskill/lib/python3.10/site-packages/fast_kinematics.cpython-310-x86_64-linux-gnu.so)
frame #5: python() [0x4fdc87]
<omitting python frames>
frame #7: python() [0x509cbf]
frame #9: python() [0x5099ce]
frame #23: python() [0x5099ce]
frame #25: python() [0x5099ce]
frame #29: python() [0x5950f2]
frame #31: python() [0x5c5e67]
frame #32: python() [0x5c0fb0]
frame #33: python() [0x45970e]
frame #38: __libc_start_main + 0xf3 (0x7f7e9abaf083 in /lib/x86_64-linux-gnu/libc.so.6)
frame #39: python() [0x58852e]

ManiSkill version: 3.0.0b5
Sapien version: 3.0.0b1

StoneT2000 commented 3 months ago

Can you try upgrading PyTorch and let me know which version you are using?
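When answering a version question like this, it helps to report every package on the failing path at once. Below is a minimal sketch that uses only the standard library's importlib.metadata. The package list is an assumption: it covers the distributions that appear in the traceback above plus pytorch_kinematics. On some installs the distribution names may differ from the import names.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names for the packages involved in this traceback;
# adjust them if your environment uses different names (e.g. "mani-skill").
DEFAULT_PACKAGES = ("torch", "mani_skill", "sapien", "fast_kinematics", "pytorch_kinematics")


def collect_versions(packages=DEFAULT_PACKAGES):
    """Return {package: installed version}, or "not installed" if missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"{name}: {ver}")
    # The CUDA version torch was built against is also useful for GPU-sim bugs.
    try:
        import torch
        print(f"torch CUDA build: {torch.version.cuda}")
    except ImportError:
        print("torch CUDA build: torch not importable")
```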

StoneT2000 commented 3 months ago

The issue is fixed. However, since some torch versions currently do not work with fast_kinematics on the GPU sim, I will likely add pytorch_kinematics back as a dependency to support older torch versions, and simply output a warning about it.

HeegerGao commented 2 months ago

Hi, could you share how you fixed this issue? I am still facing it, and installing pytorch-kinematics does not solve it on my end. I am using a customized robot and task with PDEEPoseController, with torch==2.0.1 and CUDA 11.7. Is it possible for me to use GPU simulation (which requires fast_kinematics)?

Zx55 commented 2 months ago

Hi, I solved it by upgrading torch to 2.3.0.
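Upgrading torch fixed the error for the reporter, so one defensive option is to check the installed torch version before you request an IK-based controller. Below is a minimal sketch. It uses 2.3.0 as the threshold only because that is the version reported to work in this thread; the actual minimum is not established here. The helper names and the fallback to 'pd_joint_delta_pos' (the mode the reporter could run on GPU sim) are hypothetical choices, not part of ManiSkill's API.

```python
import re
import warnings

# Assumption: 2.3.0 is the torch version reported to fix the error in this
# thread; it is not a documented minimum.
KNOWN_GOOD_TORCH = (2, 3, 0)


def parse_torch_version(v: str) -> tuple:
    """Turn strings like '2.0.1+cu117' or '2.3' into (major, minor, patch)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", v)
    if m is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def pick_control_mode(torch_version: str,
                      preferred: str = "pd_ee_delta_pose",
                      fallback: str = "pd_joint_delta_pos") -> str:
    """Hypothetical helper: keep the EE-pose controller (which goes through
    compute_ik) only on a torch version reported to work; otherwise warn
    and fall back to a joint-space controller."""
    # Tuple comparison, so e.g. (2, 10, 0) correctly ranks above (2, 3, 0).
    if parse_torch_version(torch_version) >= KNOWN_GOOD_TORCH:
        return preferred
    warnings.warn(
        f"torch {torch_version} may fail in GPU sim with {preferred}; "
        f"falling back to {fallback}. Upgrading torch resolved this in the thread."
    )
    return fallback
```

In the reproduction script above, this would be used as `control_mode=pick_control_mode(torch.__version__)` inside the `gym.make` call, after `import torch`.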

HeegerGao commented 2 months ago

Thanks!

haosulab / ManiSkill

Bug: Seems that GPU simulation does not work when control_mode="pd_ee_delta_pose" #396