simpler-env / SimplerEnv

Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge)
https://simpler-env.github.io/
MIT License

How to change the camera view #2

Closed · zwbx closed this 2 weeks ago

zwbx commented 1 month ago

Thank you for the great work. I am trying to change the camera view to see whether Octo is robust to viewpoint changes. I ran the task "widowx_spoon_on_towel" and modified the CameraConfig, but then I encountered some issues.

This is my script.

import simpler_env
from simpler_env.utils.env.observation_utils import get_image_from_maniskill2_obs_dict
import mediapy
import sapien.core as sapien
import numpy as np  
from simpler_env.policies.octo.octo_server_model import OctoServerInference
try:
    from simpler_env.policies.octo.octo_model import OctoInference
except ImportError as e:
    print("Octo is not correctly imported.")
    print(e)

model = OctoInference(model_type="octo-small", policy_setup="widowx_bridge", action_scale=1.0,)
task_name = "widowx_spoon_on_towel"  # @param ["google_robot_pick_coke_can", "google_robot_move_near", "google_robot_open_drawer", "google_robot_close_drawer", "widowx_spoon_on_towel", "widowx_carrot_on_plate", "widowx_stack_cube", "widowx_put_eggplant_in_basket"]

if 'env' in locals():
  print("Closing existing env")
  env.close()
  del env
env = simpler_env.make(task_name)
# Colab GPU does not support the ray-tracing denoiser
sapien.render_config.rt_use_denoiser = False
obs, reset_info = env.reset()
instruction = env.get_language_instruction()
print("Reset info", reset_info)
print("Instruction", instruction)

frames = []
done, truncated = False, False
while not (done or truncated):
  # action[:3]: delta xyz; action[3:6]: delta rotation in axis-angle representation;
  # action[6:7]: gripper (the meaning of open / close depends on robot URDF)
  image = get_image_from_maniskill2_obs_dict(env, obs)
  raw_action, action = model.step(image, instruction)
  print(action["world_vector"])
  obs, reward, done, truncated, info = env.step(np.concatenate([action["world_vector"], action["rot_axangle"], action["gripper"]]))
  # print(truncated)
  frames.append(image)

episode_stats = info.get('episode_stats', {})
print("Episode stats", episode_stats)
mediapy.show_video(frames, fps=10)

To change the camera view, I modified 'p' and 'q' in the file SimplerEnv/ManiSkill2_real2sim/mani_skill2_real2sim/agents/configs/widowx/defaults.py. I have checked that '3rd_view_camera' is the camera used in this env.

@property
def cameras(self):
    # Table width: about 36cm

    return [
        CameraConfig(
            uid="3rd_view_camera",  # the camera used for real evaluation
            p=[0.0, -0.16, 0.36],
            # this rotation allows simulation proxy table to align almost perfectly with real table for bridge_real_eval_1.png
            # when calling env.reset(options={'robot_init_options': {'init_xy': [0.147, 0.028], 'init_rot_quat': [0, 0, 0, 1]}})
            q=look_at([0, 0, 0], [1, 0.553, -1.085]).q,
            width=640,
            height=480,
            actor_uid="base_link",
            intrinsic=np.array(
                [[623.588, 0, 319.501], [0, 623.588, 239.545], [0, 0, 1]]
            ),  # logitech C920
        ),
    ]

But the output video shows that the camera view does not change; only the objects' positions change. So what is the right way to change the camera view?

xuanlinli17 commented 1 month ago

Oh, the camera view changes only apply to our "variant aggregation" evaluation setting. In our visual matching evaluation setting, we overlay a fixed real-world image onto the background of the simulated image observation, so the background of the sim observation stays the same. However, the object poses will still change, because the foreground objects are rendered in the simulator (along with their physics).
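For intuition, the visual-matching overlay is essentially a green-screen composite: rendered pixels are kept wherever the segmentation marks foreground (robot and task objects), and the fixed real-world photo fills everything else. A minimal numpy sketch of the idea (illustrative only, not the library's actual code; the function and variable names are made up):

import numpy as np

def overlay_real_background(rendered_rgb, foreground_mask, real_photo):
    """Keep rendered pixels where foreground_mask is True; real photo elsewhere.

    rendered_rgb:    (H, W, 3) uint8 simulator render
    foreground_mask: (H, W) bool mask of robot / task-relevant objects
    real_photo:      (H, W, 3) uint8 real evaluation image, same resolution
    """
    out = real_photo.copy()
    out[foreground_mask] = rendered_rgb[foreground_mask]
    return out

This is why moving the camera under visual matching leaves the background unchanged: the real photo is pasted in regardless of the simulated camera pose, while the rendered foreground shifts with the new viewpoint.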

In this case you need to create the environment manually (see scripts/ for more details) instead of using simpler_env.make, and remove the overlay image config from the command-line configs.
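A minimal sketch of what that manual construction might look like, assuming the environment IDs are registered by importing mani_skill2_real2sim.envs and that the overlay is controlled by an rgb_overlay_path kwarg (both taken from my reading of the repo's scripts, so verify against the source):

import gymnasium as gym
import mani_skill2_real2sim.envs  # assumption: importing this registers the *InScene-v0 envs

# Build the env directly; no rgb_overlay_path is passed, so no real photo is
# pasted over the background and camera-config changes become visible.
env = gym.make(
    'PutSpoonOnTableClothInScene-v0',
    obs_mode='rgbd',
    robot='widowx',
    control_mode='arm_pd_ee_target_delta_pose_align2_gripper_pd_joint_pos',
    scene_name='bridge_table_1_v1',
)
obs, reset_info = env.reset()

The camera pose itself can then still be moved by editing 'p' and 'q' of '3rd_view_camera' in agents/configs/widowx/defaults.py as above.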

zwbx commented 1 month ago

Thanks! I have successfully implemented this with the following code.

import gymnasium as gym

env = gym.make(
    'PutSpoonOnTableClothInScene-v0',
    obs_mode='rgbd',
    robot='widowx',
    sim_freq=500,
    control_freq=5,
    control_mode='arm_pd_ee_target_delta_pose_align2_gripper_pd_joint_pos',
    max_episode_steps=60,
    scene_name='bridge_table_1_v1',
    camera_cfgs={"add_segmentation": True},
)

(Screenshot attached: rendered observation, 2024-05-27 17:49:58)

I notice the background seems to be blank; is that correct?

xuanlinli17 commented 1 month ago

The background is a wall in a ReplicaCAD scene. This is correct.