haosulab / ManiSkill

SAPIEN Manipulation Skill Framework, a GPU parallelized robotics simulator and benchmark
https://maniskill.ai/
Apache License 2.0

Bounds for action space #252

Closed RussRobin closed 2 months ago

RussRobin commented 6 months ago

Hi, thank you for this great work.

In the TensorFlow version of your dataset (maniskill_dataset_converted_externally_to_rlds, https://www.tensorflow.org/datasets/catalog/maniskill_dataset_converted_externally_to_rlds),

what are the upper and lower bounds for the 7 parameters in steps/action? Also, do these 7 parameters indicate where the gripper will be in the next step?

Thanks in advance.

xuanlinli17 commented 6 months ago

Actually, the dataset was converted as part of the "Open-X-Embodiment" dataset. The action is unbounded. action[:3] is the xyz movement of the tool-center point (tcp, i.e., the center between the two Panda gripper fingers), action[3:6] is the axis-angle movement of the tcp, and action[6:7] is gripper open/close. 0.1 times the xyz value maps to meters, and 0.1 times the axis-angle value maps to a rotation about the same axis with the norm interpreted as radians (due to the controller configuration below).
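For illustration, a minimal sketch of unpacking one such action into physical units, assuming the 0.1 scale described above (the example values and the gripper sign convention are placeholders, not taken from the dataset):

    import numpy as np

    # A hypothetical 7-dim action from steps/action: [dx, dy, dz, ax, ay, az, gripper]
    action = np.array([0.12, -0.05, 0.30, 0.00, 0.40, 0.00, 1.0])

    scale = 0.1
    delta_pos_m = scale * action[:3]         # tcp translation in meters, in the robot base frame
    delta_rot_axangle = scale * action[3:6]  # axis-angle vector; its norm is the rotation in radians
    gripper = action[6]                      # open/close command (sign convention per the gripper controller)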

As for the meaning of action[:6], it corresponds to a controller config not yet included in the original ManiSkill2 implementation (agents/configs/panda/defaults.py), added for "base-frame conformity" with the other Open-X-Embodiment datasets:

arm_pd_base_ee_delta_pose = PDEEPoseControllerConfig(
            self.arm_joint_names,
            -0.1,                       # lower bound of the xyz translation delta (meters), per the 0.1 scale above
            0.1,                        # upper bound of the xyz translation delta (meters)
            0.1,                        # bound on the axis-angle rotation delta (radians)
            self.arm_stiffness,
            self.arm_damping,
            self.arm_force_limit,
            ee_link=self.ee_link_name,
            frame="base",               # deltas are interpreted in the robot base frame
        )

This means that action[:6] is the "target delta pose" of the tcp with respect to the base frame of the robot. A caveat is that for ManiSkill2, the tcp target pose after applying the action is sapien.core.Pose(p=scale * action[:3], q=quaternion_multiplication(axangle2quat(scale * action[3:6]), robot_tcp_pose_wrt_base.q)) (where scale=0.1, robot_tcp_pose_wrt_base = robot.pose.inv() * robot.tcp.pose, and quaternion_multiplication(a, b) = sapien.core.Pose(q=a) * sapien.core.Pose(q=b)), while for most other subsets of the Open-X-Embodiment dataset it is sapien.core.Pose(p=robot_tcp_pose_wrt_base.p + (potentially scaled) action[:3], q=quaternion_multiplication(euler2quat(*(potentially scaled) action[3:6]), robot_tcp_pose_wrt_base.q)). In other words, the current tfds ManiSkill dataset does not decouple the translation and rotation actions when computing the new tcp pose, while the other parts of the Open-X-Embodiment dataset do.
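To make the difference concrete, here is a rough sketch of the two target-pose computations described above, using sapien and transforms3d; the helper name rotvec_to_quat and the choice of transforms3d are my own, action is assumed to be a numpy array, and quaternions are wxyz as in SAPIEN:

    import numpy as np
    import sapien.core as sapien
    from transforms3d.euler import euler2quat
    from transforms3d.quaternions import axangle2quat, qmult

    SCALE = 0.1

    def rotvec_to_quat(v):
        # axis-angle vector (axis * angle) -> wxyz quaternion
        angle = np.linalg.norm(v)
        if angle < 1e-8:
            return np.array([1.0, 0.0, 0.0, 0.0])
        return axangle2quat(v / angle, angle)

    def target_pose_maniskill_tfds(action, robot_tcp_pose_wrt_base):
        # ManiSkill tfds convention (coupled): the translation comes directly from the action,
        # it is NOT added to the current tcp position.
        q = qmult(rotvec_to_quat(SCALE * action[3:6]), robot_tcp_pose_wrt_base.q)
        return sapien.Pose(p=SCALE * action[:3], q=q)

    def target_pose_decoupled(action, robot_tcp_pose_wrt_base):
        # Typical Open-X-Embodiment convention (decoupled): delta translation is added to the
        # current tcp position, delta rotation (rpy here) is composed with the current orientation.
        q = qmult(euler2quat(*(SCALE * action[3:6])), robot_tcp_pose_wrt_base.q)
        return sapien.Pose(p=robot_tcp_pose_wrt_base.p + SCALE * action[:3], q=q)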

RussRobin commented 6 months ago

Thank you for your timely and detailed reply! In RT-X, it is noted that action[0:6] is tokenized by dividing each dimension into 256 bins that are uniformly distributed along that dimension. To do so, one needs the upper and lower bound of each element of action[0:6]. Could you please provide more info about this? Thanks a lot!

xuanlinli17 commented 6 months ago

RT-X didn't use the ManiSkill data for training (it used a subset of real-world data only). There are no fixed upper and lower bounds for ManiSkill action[:6], but the values are typically within -1 to 1.
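If you do need bounds for tokenization, one practical option (not something the ManiSkill dataset prescribes) is to estimate per-dimension bounds empirically from the converted dataset, e.g. from percentiles, and bin within those; a rough sketch:

    import numpy as np

    def make_action_tokenizer(actions, num_bins=256, low_pct=0.5, high_pct=99.5):
        """actions: (N, 6) array of action[:6] gathered from the dataset.
        Percentiles rather than min/max keep outliers from stretching the bins."""
        lo = np.percentile(actions, low_pct, axis=0)
        hi = np.percentile(actions, high_pct, axis=0)

        def tokenize(a):
            # Clip to the empirical bounds, then map each dimension to an integer bin in [0, num_bins - 1].
            a = np.clip(a, lo, hi)
            return np.round((a - lo) / (hi - lo) * (num_bins - 1)).astype(np.int64)

        def detokenize(tokens):
            # Map bin indices back to the centers of the original value range.
            return lo + tokens / (num_bins - 1) * (hi - lo)

        return tokenize, detokenize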

RussRobin commented 6 months ago

Thanks a lot for your help. I'll close the issue.

RussRobin commented 4 months ago

Hi @xuanlinli17, I wonder if it is possible to:

  1. train a model on the tfds (RT-X) version of ManiSkill and test it in ManiSkill2. Currently, I'm using:

    import gymnasium as gym
    import numpy as np
    import mani_skill2.envs  # registers the ManiSkill2 environments with gymnasium

    camera_cfgs = dict(width=256, height=256)
    env = gym.make(
        "PickCube-v0",
        # num_envs=1,
        obs_mode="rgbd",
        control_mode="pd_ee_delta_pose",
        render_mode="human",
        camera_cfgs=camera_cfgs,
    )
    obs, _ = env.reset()
    # control_signal holds the xyz / rpy / gripper outputs of my model
    action = np.array([
        control_signal['dx'], control_signal['dy'], control_signal['dz'],
        control_signal['droll'], control_signal['dpitch'], control_signal['dyaw'],
        control_signal['gripper'],
    ])
    obs, _, terminated, truncated, _ = env.step(action)

    My model outputs xyz, rpy in the same format as the tfds version. It performs very well on my split of the tfds train/val set, but does really badly in the ManiSkill2 simulated environment, so I think the configurations of my ManiSkill2 environment and of the tfds dataset differ. Is there any way for me to reproduce the environment used for the tfds dataset?

  2. How can I generate datasets with the same control mode as the ManiSkill tfds version, i.e., can I reproduce the tfds dataset generation process?

Thanks a lot!

StoneT2000 commented 4 months ago

What is this tfds dataset? If it's one of the ManiSkill2 datasets, I can maybe recreate it for ManiSkill 3.

RussRobin commented 4 months ago

Thanks for your reply. I'm referring to the RT-X version: https://www.tensorflow.org/datasets/catalog/maniskill_dataset_converted_externally_to_rlds

I tried to automatically generate trajectories in ManiSkill3 and then convert them from joint positions to pd-ee-delta-pose, but the ManiSkill environment doesn't support this conversion yet. Also, it's noted earlier in this issue that the current tfds ManiSkill dataset does not decouple translation and rotation actions when computing the new tcp pose, while the other parts of the Open-X-Embodiment dataset do.

I wonder if there's any way to reproduce the ManiSkill RT-X version. Or can I generate pd-ee-delta-pose datasets in ManiSkill myself, train on them, and test the models in ManiSkill?

Thanks a lot!

StoneT2000 commented 4 months ago

As other groups decouple them explicitly, ManiSkill has also shifted to decoupling them by default. I will get back to you later on how to regenerate the datasets; do you specifically want the tfds format?

RussRobin commented 4 months ago

Thanks! Yes, I'm interested in having the same control mode in the RT-X-style training data and the ManiSkill simulated environment: xyz, rpy and gripper signals are good for me. Thanks again for your help!

StoneT2000 commented 4 months ago

Do you know which frame the xyz rpy signals should be expressed in? We support a few options; I can prioritize getting the data for the one you need.

RussRobin commented 4 months ago

Thanks! Sorry, but what do you mean by frame? I'm interested in the xyz rpy of the end-effector, with all settings the same as the pd-ee-delta-pose control mode, to do ManiSkill tasks like picking things up, moving a charger, …

StoneT2000 commented 4 months ago

So end-effector control has multiple ways of interpreting the XYZ and RPY commands.

E.g., commanding +z only (with no x or y movement) can lead to different results depending on whether you pick the robot base frame or the end-effector frame.

See here for more details and videos of how they work: https://maniskill.readthedocs.io/en/latest/user_guide/concepts/controllers.html#pd-ee-end-effector-pose
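As a rough illustration (my own made-up values, quaternions wxyz as in SAPIEN), the same +z delta lands in a different place depending on the frame:

    import numpy as np
    import sapien.core as sapien
    from transforms3d.euler import euler2quat

    # Current tcp pose w.r.t. the robot base; the gripper is pitched 90 degrees, so its local
    # z-axis points along the base x-axis (values made up for illustration).
    tcp_pose = sapien.Pose(p=np.array([0.4, 0.0, 0.3]), q=euler2quat(0, np.pi / 2, 0))
    delta = sapien.Pose(p=np.array([0.0, 0.0, 0.05]))  # "+z only" delta, no rotation

    # Base ("root") frame: the delta is applied in base coordinates -> the tcp moves straight up.
    target_base = sapien.Pose(p=tcp_pose.p + delta.p, q=tcp_pose.q)

    # End-effector frame: the delta is applied in the gripper's own coordinates -> the tcp moves
    # along the gripper's local z-axis, which here points along base x.
    target_ee = tcp_pose * delta

    print(target_base.p)  # [0.4  0.   0.35]
    print(target_ee.p)    # approximately [0.45 0.   0.3 ]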

RussRobin commented 4 months ago

I think the root frame should be good for me. Thanks a lot!

StoneT2000 commented 3 months ago

The conversion tool has been added back @RussRobin via #388; however, the success rate is not guaranteed to be very high. For example, for a task like PegInsertionSide the conversion success rate is about 40%, so you would have to generate a ton of demonstrations and then convert them.

I have also uploaded demos for various datasets in the pd joint position format, so you can run the tool to convert them as was done before. Tasks requiring highly precise manipulation will be difficult to convert.
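For reference, the documented ManiSkill2 invocation of the trajectory replay/conversion tool looks roughly like the command below; the module path and flags of the ManiSkill 3 version restored in #388 may differ, so treat this as an illustrative sketch rather than the exact command (the trajectory path is a placeholder):

    python -m mani_skill2.trajectory.replay_trajectory \
        --traj-path demos/rigid_body/PickCube-v0/trajectory.h5 \
        --save-traj --target-control-mode pd_ee_delta_pose \
        --obs-mode none --num-procs 10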

I do have one other solution, which is to use a hybrid action-free imitation learning + reinforcement learning approach: learn from the pd joint position demonstrations and use RL to learn the pd delta ee pose controls. I will probably write some code for this approach (it uses our env-state-reset-based online imitation learning method called RFCL, which can do action-free learning).