haosulab / ManiSkill

SAPIEN Manipulation Skill Framework, a GPU parallelized robotics simulator and benchmark
https://maniskill.ai/
Apache License 2.0
869 stars · 156 forks

[Enhancement] Make control_mode pd_ee_pose for target pose control #488

Closed sean1295 closed 1 month ago

sean1295 commented 2 months ago

I saw in the roadmap document that one of the goals is to benchmark different BC models, including 3D diffusion policy.

In the original diffusion policy paper, the authors recommend using position control, rather than velocity control (i.e., delta position control), of the end-effector pose, as it helps with their task completion performance.

In the controllers guide in the library (https://github.com/haosulab/ManiSkill/blob/e4cb238e5b5eee78bb7175988222fe071381c91d/docs/source/user_guide/concepts/controllers.md), I found this sentence: 'At the moment we only support fast GPU accelerated delta end-effector control. Support for GPU accelerated absolute end-effector control requires some much more complicated code but can be possible if one uses cuRobo to generate motion plans given target poses.'

Could you elaborate a bit on what the difficulty is in implementing absolute end-effector control? I initially thought you should be able to generate the target pose as long as you have the target joint positions. It would be a great addition to have target EEF pose control, as it has clear benefits for some BC algorithms.

Also, on a tangent, I am curious about the reason for selecting the angle-axis representation for orientation control. Are there particular benefits to this over using quaternions?

Thanks for all the great work!

StoneT2000 commented 2 months ago

Thanks for the kind comment!

For GPU simulation, trying to go to a specific target may require multiple steps. Currently, cuRobo is, I think, the only GPU-parallelized library for doing this kind of motion planning. For delta targets/control, computing a Jacobian is simple, and a few low-overhead libraries exist to do that. I did not add cuRobo because it is a bit heavy to use, and I did not find it easy to integrate / I am not an expert on tuning it.
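To illustrate why the delta case only needs a Jacobian: for a small commanded displacement, a single linearized step via the Jacobian pseudo-inverse suffices. Below is a minimal sketch on a hypothetical 2-link planar arm in plain numpy (not ManiSkill code); the link lengths and functions are made up for illustration.

```python
# Sketch (not ManiSkill code): one delta-IK step on a toy 2-link planar arm,
# showing why delta end-effector control only needs a Jacobian.
import numpy as np

L1, L2 = 1.0, 1.0  # hypothetical link lengths

def fk(q):
    """End-effector (x, y) position for joint angles q."""
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q):
    """2x2 analytic Jacobian d(x, y)/dq."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

q = np.array([0.3, 0.5])
delta_x = np.array([0.01, -0.02])            # small desired EE displacement
dq = np.linalg.pinv(jacobian(q)) @ delta_x   # the pseudo-inverse "trick"
q_new = q + dq

# For a small delta the linearization is accurate:
err = np.linalg.norm(fk(q_new) - (fk(q) + delta_x))
```

The same one-step linearization degrades as the commanded delta grows, which is the limitation discussed later in this thread for absolute pose targets.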

In CPU simulation, computing a motion plan to go to a specific pose is possible; our code does that already (create a PD EE pose controller that has delta=False in the config). This is sufficient for those testing diffusion policy, since you do not need GPU simulation for evaluation; you can just use the CPU sim. If you want faster evaluation (e.g., leveraging parallel rendering for faster image observation generation), then you need the GPU sim, but you can probably hardcode a for loop using a CPU-based IK solver to determine the low-level control action given a target EE pose.

Feel free to add that control option for the Franka arm; I can merge it.

sean1295 commented 2 months ago

Thanks for the prompt response.

I am a bit confused about your answer on CPU vs. GPU simulation. Didn't fast_kinematics do IK for GPU environments too? Or I guess with your latest update you started using pytorch_kinematics. So I guess my question is: what is preventing us from setting config.use_delta to true and using the GPU?

StoneT2000 commented 2 months ago

Oh, you can use GPU sim + delta end-effector control (use_delta is True). That is certainly possible.

With use_delta set to False, that would require a different algorithm that I haven't investigated deeply. My understanding is that the diffusion policy paper's definition of "position control" is absolute pose control?

sean1295 commented 2 months ago

Oh, my bad, I meant setting config.use_delta to False. My confusion comes from my assumption that pose_absolute = pose_robot * pose_delta, making it trivial to calculate pose_absolute. Is this correct?
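For concreteness, the algebra behind this assumption can be sketched with 4x4 homogeneous transforms in plain numpy (not ManiSkill code): if the controller could consume an exact relative pose, the delta would simply be inv(pose_robot) @ pose_absolute. The `make_pose` helper and the example poses below are hypothetical.

```python
# Sketch of the pose-composition algebra: pose_absolute = pose_robot * pose_delta,
# so the exact delta is pose_delta = inv(pose_robot) @ pose_absolute.
import numpy as np

def make_pose(theta_z, t):
    """Homogeneous transform: rotation about z by theta_z, translation t."""
    c, s = np.cos(theta_z), np.sin(theta_z)
    T = np.eye(4)
    T[:3, :3] = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    T[:3, 3] = t
    return T

pose_robot = make_pose(0.4, [0.1, 0.2, 0.0])       # hypothetical current pose
pose_absolute = make_pose(1.1, [0.5, -0.3, 0.2])   # hypothetical target pose

# Relative pose that composes with the current pose to reach the target:
pose_delta = np.linalg.inv(pose_robot) @ pose_absolute

# pose_robot @ pose_delta recovers pose_absolute exactly.
recovered = pose_robot @ pose_delta
```

The algebra itself is exact; the catch raised in the reply below is about how the controller tracks such a delta, not about computing it.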

StoneT2000 commented 1 month ago

@sean1295 Late update, but I recently figured out the right way to do it. It is not as simple as pose_absolute = pose_robot * pose_delta to compute the delta action. On GPU, we currently use a fast approximation of a delta IK controller, but that works less well the larger the delta action is. Moreover, in the real world this kind of absolute pose controller typically uses, to my knowledge, something like the CLIK (closed-loop inverse kinematics) algorithm instead of the pseudo-inverse trick we use at the moment for delta IK.

In the scenario where use_delta is False, we must assume the target pose in GPU sim might not be close by at all, and must use an iterative optimization algorithm to compute the target joint positions in that setting.
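A toy sketch of this iterative idea (not ManiSkill's implementation): a CLIK-style loop that repeatedly applies small damped-least-squares steps, so even a far-away target converges, unlike a single linearized delta step. The 2-link planar arm, gains, and iteration counts below are all made up for illustration.

```python
# Toy sketch of iterative IK for far targets: closed-loop damped least squares
# on a hypothetical 2-link planar arm (plain numpy, not ManiSkill code).
import numpy as np

L1, L2 = 1.0, 1.0  # hypothetical link lengths

def fk(q):
    """End-effector (x, y) position for joint angles q."""
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q):
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

def ik_clik(q, target, iters=200, gain=0.5, damping=1e-3):
    """Iterate small damped-least-squares steps until the EE reaches target."""
    for _ in range(iters):
        err = target - fk(q)
        J = jacobian(q)
        # Damped least squares: dq = (J^T J + lambda I)^-1 J^T (gain * err)
        dq = np.linalg.solve(J.T @ J + damping * np.eye(2), J.T @ (gain * err))
        q = q + dq
    return q

q0 = np.array([0.1, 0.1])
target = np.array([0.3, 1.2])  # far from the initial EE position
q_sol = ik_clik(q0, target)
```

Because each step re-linearizes at the current configuration, the loop handles targets well outside the region where a single pseudo-inverse step is accurate; the damping term also keeps steps stable near singular configurations.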

I have an implementation I am testing now in PR #536, which adds GPU-parallelized real2sim environments from a project called SIMPLER, which happens to require the pd_ee_target_pose controller (this is what one of the real robots uses). I will comment here and close the issue once it is merged, and share an example.

StoneT2000 commented 1 month ago

You can test this new GPU-based IK controller on the Panda; there's a new pd_ee_pose controller which accepts absolute poses of the end effector (in the robot base frame at the moment).

sean1295 commented 7 hours ago

Awesome! Thanks for the update! I have just played around with it and have lingering questions regarding the pose itself. The following is the code I ran, which basically tracks the pose of cubeA in the stacking task.

import gymnasium as gym
import imageio
import roma
import torch  # needed below for torch.ones / torch.cat

import mani_skill.envs  # registers the ManiSkill environments

num_envs = 1  # you can go up to 4096 on better GPUs
env = gym.make("StackCube-v1", num_envs=num_envs, obs_mode="state_dict", control_mode="pd_ee_pose", render_mode="rgb_array")
imgs = []
obs, _ = env.reset(seed=0)
done = False
for _ in range(30):
    pos = env.cubeA.pose.p - env.agent.robot.pose.p  # cubeA position in the robot base frame
    ori = roma.unitquat_to_rotvec(env.cubeA.pose.q[:, [1, 2, 3, 0]])  # reorder wxyz -> xyzw for roma
    gripper = torch.ones((1, 1)).to(env.device)  # keep the gripper open
    action = torch.cat((pos, ori, gripper), dim=-1)
    obs, rew, terminated, truncated, info = env.step(action)
    done = (terminated | truncated).any() # stop if any environment terminates/truncates
    imgs.append(env.render()[0].cpu().numpy())
imageio.mimwrite('video.mp4', imgs)

https://github.com/user-attachments/assets/be85696a-6193-4062-83b3-c78ea70c0982

This was based on my assumption that it is the absolute pose w.r.t. the robot base (since the robot base has an identity quaternion). I think I get accurate position tracking, which is good, but I am a bit confused about which orientation action should be used. Your help would be much appreciated.

StoneT2000 commented 1 hour ago

For quaternions/orientation, you should just start with the current orientation as the action and modify it. The orientation of the end-effector TCP (what IK is controlling) is not 0, 0, 0.

A simple (not optimized) solution is this:

from transforms3d import euler
euler.quat2euler(env.agent.tcp.pose.q.cpu().numpy())
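Since the action in the script above uses a rotation-vector (axis-angle) orientation while poses come back as wxyz quaternions, a dependency-free conversion may also be handy. This is a hedged sketch in plain numpy, not ManiSkill/roma code; the wxyz ordering is assumed from the `[1, 2, 3, 0]` reindexing used earlier in the thread, and batching/edge-case handling is minimal.

```python
# Self-contained sketch: convert a single wxyz unit quaternion to the
# axis-angle rotation vector used for the orientation part of the action.
import numpy as np

def wxyz_quat_to_rotvec(q):
    """q: (4,) unit quaternion [w, x, y, z] -> (3,) rotation vector (axis * angle)."""
    w, xyz = q[0], q[1:]
    norm = np.linalg.norm(xyz)
    if norm < 1e-8:                       # near-identity rotation
        return np.zeros(3)
    angle = 2.0 * np.arctan2(norm, w)     # rotation angle in radians
    return (angle / norm) * xyz           # unit axis scaled by angle

# Identity quaternion maps to a zero rotation vector:
print(wxyz_quat_to_rotvec(np.array([1.0, 0.0, 0.0, 0.0])))  # [0. 0. 0.]
```

Starting from the current TCP quaternion (as suggested above) and converting it this way gives an orientation action that holds the current pose, which you can then perturb.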