vikashplus / robohive

A unified framework for robot learning
https://sites.google.com/view/robohive
Apache License 2.0

Replaying data from Roboset FK1-v4(human) dataset with FK1_RelaxFixed-v4 environment. #124

Open. omeryagmurlu opened this issue 9 months ago

omeryagmurlu commented 9 months ago

Hello,

I'm trying to replay the Roboset FK1-v4(human) dataset and I'm running into problems with the new v4 kitchen environment. I can replay the training data using the kitchen_relax-v1 environment from Relay Policy Learning, but not using the FK1_RelaxFixed-v4 environment: the arm moves seemingly at random instead of following the trajectory from the data. Below are the code snippets, first the non-functioning FK1_RelaxFixed-v4 replay code and then the functioning kitchen_relax-v1 one. Thank you very much!

On a separate note, do you happen to have an estimated date for when the multi-task suites other than the kitchen will be released? Thanks!

FK1_RelaxFixed-v4, not working:

import torch
import h5py
import numpy as np
import gym
import time
from tqdm import tqdm
import robohive

torch.cuda.empty_cache()

trace = '/SNIP/datasets/human_demos_playdata/FK1_RelaxFixed_v2d-v4_60_20230506-111653_trace.h5'
with h5py.File(trace, 'r') as file:
    h = dict()
    # kettle to top left, bottom stove, right slider, left cupboard
    for key in file['Trial60'].keys():
        if key == 'env_infos':
            h['qpos'] = file['Trial60/env_infos/state/qpos'][()]
            h['qvel'] = file['Trial60/env_infos/state/qvel'][()]
            continue
        h[key] = file['Trial60'][key][()]

    print('loaded 60')

actions = h['actions']
qpos = h['qpos'][0]
qvel = h['qvel'][0]

speedup = 1

env_name = 'FK1_RelaxFixed-v4'
# env_name = 'kitchen-v2'
env = gym.make(env_name)

env.reset()
init_qpos = qpos.copy()
init_qvel = qvel.copy()
env.sim.data.qpos[:] = init_qpos
env.sim.data.qvel[:] = init_qvel
env.sim.forward()

# pick scaling for actions
act_mid = np.zeros(env.sim.model.nu)
act_amp = 2 * np.ones(env.sim.model.nu)

env.mj_render()

obs = env.get_obs()
for i in tqdm(range(actions.shape[0] - 1)):
    ctrl = actions[i]

    # act = ctrl
    act = act_mid + ctrl * act_amp
    next_obs, reward, done, env_info = env.step(act)

    # if i % render_skip == 0:
    env.mj_render()
    time.sleep(env.dt / speedup)

    obs = next_obs
    if done:
        break

env.close()

kitchen_relax-v1, working:

import torch
import h5py
import numpy as np
import gym
import time
from tqdm import tqdm
import adept_envs.franka

torch.cuda.empty_cache()

trace = '/SNIP/datasets/human_demos_playdata/FK1_RelaxFixed_v2d-v4_60_20230506-111653_trace.h5'
with h5py.File(trace, 'r') as file:
    h = dict()
    # put kettle on top left, bottom stove, right slider, left cupboard
    for key in file['Trial60'].keys():
        if key == 'env_infos':
            h['qpos'] = file['Trial60/env_infos/state/qpos'][()]
            h['qvel'] = file['Trial60/env_infos/state/qvel'][()]
            continue
        h[key] = file['Trial60'][key][()]

    print('loaded 60')

actions = h['actions']
qpos = h['qpos'][0]
qvel = h['qvel'][0]

speedup = 1

env = gym.make('kitchen_relax-v1')

env.reset()
init_qpos = qpos.copy()
init_qvel = qvel.copy()
env.sim.data.qpos[:init_qpos.shape[0]] = init_qpos
env.sim.data.qvel[:init_qvel.shape[0]] = init_qvel
env.sim.forward()

env.mj_render()

print(f'act_mid: {env.act_mid}, {env.act_mid.shape}\nact_amp: {env.act_amp}, {env.act_amp.shape}\nskip: {env.skip}\nframe_skip: {env.frame_skip}\nmodel.opt.timestep: {env.model.opt.timestep}\n')

for i in tqdm(range(actions.shape[0] - 1)):
    act = actions[i]

    observation, reward, done, info = env.step(act)
    env.mj_render()
    time.sleep((env.model.opt.timestep * env.frame_skip) / speedup)
    if done:
        break

env.close()
gaoyuezhou commented 9 months ago

Thank you for your question. We are taking a look at this issue and will post updates here.

Does this issue occur for other expert or human datasets (e.g. human_demos_by_task), or only for the human play datasets? Any additional context you can provide would be very helpful. Thank you for your patience!

omeryagmurlu commented 8 months ago

Hello,

Thank you for your response. I've tried replaying the FK1_Knob1OnRandom-v4 dataset from human_demos_by_task and ran into the same issue with it. I've also tried replaying the DAPG(human)/door_v2d-v4 dataset with its corresponding environment, and that worked without any problems. Here's a recording of Trace0 from the play dataset in the FK1_RelaxFixed-v4 env, using the code snippet from my first post:

Screencast from 2023-12-15 13-40-00.webm

Thank you for your help.

gaoyuezhou commented 8 months ago

Hi,

Thank you for the info. For replaying RoboHive datasets, you should be able to use the recorded 'actions' directly as the actions passed to env.step(), instead of scaling them as in the script from your first post. Additionally, we have provided a script for replaying the datasets. The following command should successfully replay an FK1_Knob1OnRandom-v4 dataset:

python logger/examine_logs.py -e FK1_Knob1OnRandom_v2d-v4 -p <path to your dataset file>/FK1_Knob1OnRandom_v2d-v4_0_20230529-204609_trace.h5
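
For reference, here is a minimal sketch of a direct replay in Python along the same lines, adapted from the snippet in the first post (the trace path, trial name, and environment name are taken from that snippet; the only substantive change is that the recorded actions are passed to env.step() unscaled):

import time

import gym
import h5py
import robohive  # noqa: F401 -- importing robohive registers the FK1_* environments

# Same trace file and trial as in the first post.
trace = '/SNIP/datasets/human_demos_playdata/FK1_RelaxFixed_v2d-v4_60_20230506-111653_trace.h5'
with h5py.File(trace, 'r') as file:
    actions = file['Trial60/actions'][()]
    qpos = file['Trial60/env_infos/state/qpos'][()][0]
    qvel = file['Trial60/env_infos/state/qvel'][()][0]

env = gym.make('FK1_RelaxFixed-v4')
env.reset()

# Restore the recorded initial state before stepping.
env.sim.data.qpos[:] = qpos
env.sim.data.qvel[:] = qvel
env.sim.forward()

for i in range(actions.shape[0] - 1):
    # Key change: replay the recorded action directly,
    # without the act_mid + ctrl * act_amp rescaling.
    obs, reward, done, info = env.step(actions[i])
    env.mj_render()
    time.sleep(env.dt)
    if done:
        break

env.close()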

Let us know if this solves the issue. Thanks!

rgong-bdai commented 2 months ago

Hi, does the script work for playdata as well?

For example: FK1_RelaxFixed_v2d-v4_0_20230506-110624_trace