real-stanford/scalingup

[CoRL 2023] This repository contains data generation and training code for Scaling Up & Distilling Down
https://www.cs.columbia.edu/~huy/scalingup/

Real robot inference guidance #1

Closed · YESAndy closed this issue 1 year ago

YESAndy commented 1 year ago

Hi,

Thank you very much for sharing such great work! I wonder if you could share a guide for real robot experiments. I have looked through the current docs but can't find one. Thanks in advance!

huy-ha commented 1 year ago

Hey Andy,

Thanks for your interest in our work!

Our real-world robot code was minimally modified from diffusion policy's eval_real_robot.py. Since we just started from that script and swapped our policy in place of theirs, I don't think it justifies releasing their codebase again under my repo. I'll make sure to add this information to the docs!
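To give a rough picture of the swap: eval_real_robot.py drives a policy through `predict_action(obs_dict)`, so the change is basically an adapter around our policy. This is a from-memory sketch with placeholder names (`infer_action` and the observation mapping are not real API calls in either repo), not our exact code:

```python
import torch


class ScalingUpPolicyAdapter:
    """Makes a scalingup policy look like diffusion policy's BaseImagePolicy,
    so eval_real_robot.py's control loop can drive it unchanged.

    NOTE: `infer_action` and `obs_to_state_sequence` are placeholder names,
    not real API calls from either codebase.
    """

    def __init__(self, policy, device: str = "cuda"):
        self.policy = policy.to(device)
        self.device = device

    @torch.no_grad()
    def predict_action(self, obs_dict: dict) -> dict:
        # obs_dict carries the real-robot observations (camera images,
        # end-effector pose, gripper state) collected by the env wrapper.
        state_sequence = self.obs_to_state_sequence(obs_dict)
        action = self.policy.infer_action(state_sequence)  # placeholder call
        return {"action": action}

    def obs_to_state_sequence(self, obs_dict: dict):
        # Map real observations into the StateSequenceTensor format the
        # policy was trained on (see scalingup.data.window_dataset).
        raise NotImplementedError
```

Everything else in the script (the control loop, timing, camera plumbing) can stay as-is, which is why we didn't re-release it.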

wangyan-hlab commented 1 year ago

Hi Huy, @huy-ha

First, thank you for your kind help so far. I have been pushing the reproduction work forward and now I think I am ready to evaluate the policy on a real robot.

Following your guidance, I found the diffusion policy repo, and I have two questions about it:

  1. In the policy folder, there are many diffusion policies. Which one exactly should I use? _(I guess it's diffusion_unet_hybrid_image_policy?)_ And do I need to edit that policy to fit your scalingup policy, or modify some other code?
  2. In eval_real_robot.py, the checkpoint is loaded like this:
    # load checkpoint
    ckpt_path = input  # `input` is the script's CLI argument (path to the checkpoint)
    payload = torch.load(open(ckpt_path, 'rb'), pickle_module=dill)
    cfg = payload['cfg']
    cls = hydra.utils.get_class(cfg._target_)
    workspace = cls(cfg)
    workspace: BaseWorkspace
    workspace.load_payload(payload, exclude_keys=None, include_keys=None)

    but the checkpoint from my training seems to have a different structure: it has no 'cfg' key, nor the other keys this script expects. Would you please give more information about how to modify the script to fix this?
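For reference, here is roughly how I would expect to load my checkpoint if scalingup training saves a standard PyTorch Lightning .ckpt; the 'state_dict' key and reusing the cfg.policy config group are my assumptions, not something I have confirmed against this repo:

```python
import dill
import hydra
import torch

# Assumption: the scalingup checkpoint is a plain PyTorch Lightning .ckpt,
# i.e. a dict with a 'state_dict' key instead of diffusion policy's
# 'cfg' + workspace payload.
ckpt_path = "path/to/checkpoint.ckpt"
payload = torch.load(open(ckpt_path, "rb"), pickle_module=dill, map_location="cpu")
print(payload.keys())  # inspect what the checkpoint actually contains

# Assumption: the same `policy` config group used for training can
# re-instantiate the model for inference.
with hydra.initialize(config_path="config", version_base="1.2"):
    cfg = hydra.compose(config_name="inference")
policy = hydra.utils.instantiate(cfg.policy)
# strict=False because Lightning checkpoints sometimes prefix parameter names
policy.load_state_dict(payload["state_dict"], strict=False)
policy.eval()
```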


Also, is it possible to load a model from the checkpoint and directly predict the action (i.e. eef position + eef orientation matrix + gripper command) from a REAL observation? I am trying to extract the code that instantiates a diffusion policy model, feed it a fake input, and hopefully get some output:

import hydra
import torch
import numpy as np
from scalingup.data.dataset import StateTensor
from scalingup.data.window_dataset import StateSequenceTensor

def create_statetensor():
    """Build a StateTensor of random data, mimicking a single observation."""
    device = 'cuda'

    # End-effector position: (batch, 3)
    shape = (1, 3)
    end_effector_position = np.random.rand(*shape)
    end_effector_position_tensor = torch.from_numpy(end_effector_position).float().to(device)

    # End-effector orientation: (batch, 9), i.e. a flattened 3x3 rotation matrix
    shape = (1, 9)
    end_effector_orientation = np.random.rand(*shape)
    end_effector_orientation_tensor = torch.from_numpy(end_effector_orientation).float().to(device)

    # Gripper open/close command: (batch, 1)
    gripper_command = np.array([[1]])
    gripper_command_tensor = torch.from_numpy(gripper_command).float().to(device)

    # Point cloud inputs: (batch, num_points, 3) for both xyz and rgb
    shape = (1, 1, 3)
    input_xyz_pts = np.random.rand(*shape)
    input_xyz_pts_tensor = torch.from_numpy(input_xyz_pts).float().to(device)
    input_rgb_pts = np.random.rand(*shape)
    input_rgb_pts_tensor = torch.from_numpy(input_rgb_pts).float().to(device)

    # Occupancy volume placeholder: (batch, 1)
    occupancy_vol = np.array([[0]])
    occupancy_vol_tensor = torch.from_numpy(occupancy_vol).float().to(device)

    # Time within the episode: (batch, 1)
    time = np.array([[0.25]])
    time_tensor = torch.from_numpy(time).float().to(device)

    # Camera views: (batch, channels, height, width) uint8 images
    # (randint's high bound is exclusive, so 256 covers the full uint8 range)
    shape = (1, 3, 160, 240)
    front_view = np.random.randint(0, 256, size=shape)
    front_view_tensor = torch.tensor(front_view, dtype=torch.uint8).to(device)
    topdown_view = np.random.randint(0, 256, size=shape)
    topdown_view_tensor = torch.tensor(topdown_view, dtype=torch.uint8).to(device)
    wrist_view = np.random.randint(0, 256, size=shape)
    wrist_view_tensor = torch.tensor(wrist_view, dtype=torch.uint8).to(device)
    views = {'front': front_view_tensor,
             'top_down': topdown_view_tensor,
             'fr5/robotiq_2f85/d435i/rgb': wrist_view_tensor}

    state_tensor = StateTensor(
        end_effector_position=end_effector_position_tensor,
        end_effector_orientation=end_effector_orientation_tensor,
        gripper_command=gripper_command_tensor,
        input_xyz_pts=input_xyz_pts_tensor,
        input_rgb_pts=input_rgb_pts_tensor,
        occupancy_vol=occupancy_vol_tensor,
        time=time_tensor,
        views=views
    )

    return state_tensor

def create_input_data():
    """Stack two random StateTensors into a StateSequenceTensor,
    i.e. a two-step observation history."""
    sequence = [create_statetensor(), create_statetensor()]
    return StateSequenceTensor(sequence=sequence)

@hydra.main(config_path="config/", config_name="inference", version_base="1.2")
def main(cfg):
    # Instantiate the policy from the hydra config; note this gives
    # freshly initialized weights unless a checkpoint is loaded afterwards
    model = hydra.utils.instantiate(cfg.policy)
    print(type(model), model)

    with torch.no_grad():
        # TODO: trying to get an action prediction out of the model,
        # but I haven't managed to yet
        input_data = create_input_data()
        # output = model(obs=input_data, task=..., seed=...)
        # print(output)
        output = model.get_stats(input_data)
        print(output)

if __name__ == "__main__":
    main()

But I haven't found a proper way to get any output. Would you please give some suggestions?
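One more thing I notice in my own script: hydra.utils.instantiate(cfg.policy) gives freshly initialized weights, so even if the forward pass worked, the output would come from an untrained model. I assume I would also need to load the checkpoint right after instantiation, something like the snippet below (with a hypothetical ckpt_path entry added to my config, and again assuming a 'state_dict' key):

```python
import dill  # needed by torch.load below

# inside main(), right after `model = hydra.utils.instantiate(cfg.policy)`:
payload = torch.load(cfg.ckpt_path, map_location="cuda", pickle_module=dill)
model.load_state_dict(payload["state_dict"], strict=False)
model.eval()
```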

Best Regards