huggingface / lerobot

🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning

Error with cv2.imshow #520

Closed: yilin404 closed this issue 2 days ago

yilin404 commented 4 days ago

### System Info

- `lerobot` version: 0.1.0
- Platform: Linux-5.15.0-126-generic-x86_64-with-glibc2.35
- Python version: 3.10.15
- Huggingface_hub version: 0.26.2
- Dataset version: 3.1.0
- Numpy version: 1.26.3
- PyTorch version (GPU?): 2.2.1+cu118 (True)
- Cuda version: 11080
- Using GPU in script?: <fill in>
- opencv-contrib-python: 4.10.0.84
- opencv-python: 4.10.0.84

### Information

### Reproduction

I appreciate your hard work on this project, and I'd like to contribute by reporting an issue I've encountered: after importing modules from the lerobot library, my program hangs when calling cv2.imshow and never reaches the subsequent code. Specifically, the program stops responding inside cv2.imshow and no image window is displayed.
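
For reference, here is a minimal, self-contained sketch of the hang as I understand it (the blank image is just a placeholder, and I am assuming that importing any lerobot module before the OpenCV GUI call is what triggers it, not DiffusionPolicy specifically):

```python
import numpy as np
import cv2

# Importing a lerobot policy module before using the OpenCV GUI seems to be enough
# to trigger the hang on my machine.
from lerobot.common.policies.diffusion.modeling_diffusion import DiffusionPolicy  # noqa: F401

img = np.zeros((480, 640, 3), dtype=np.uint8)
cv2.imshow("test", img)  # hangs here: no window appears and the next line is never reached
cv2.waitKey(0)
cv2.destroyAllWindows()
```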

1. `infer_lerobot.py`

```python
#!/home/yilin/miniconda3/envs/robotics/bin/python3

import time

import cv2
import einops
import numpy as np
import rospy
import torch

from lerobot.common.policies.diffusion.modeling_diffusion import DiffusionPolicy

from lib.arm_driver_wrapper import ArmDriverWrapper, ArmDriverWrapperCfg
from lib.realsense_wrapper import RealSenseWrapper, RealSenseWrapperCfg


def main():
    rospy.init_node("Eval_Lerobot_node", anonymous=True)
    rospy.loginfo("==> Start Eval Lerobot node...")

    realsense_wrapper_cfg = RealSenseWrapperCfg(names=["camera_top", "camera_wrist"],
                                                sns=["238222073566", "238222071769"],
                                                color_shape=(640, 480), depth_shape=(640, 480),
                                                fps=60, timeout_ms=30)
    realsense_wrapper = RealSenseWrapper(cfg=realsense_wrapper_cfg)

    controlling = False

    policy = DiffusionPolicy.from_pretrained(rospy.get_param("~pretrained_policy_path"))
    policy_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    policy.reset()

    while not rospy.is_shutdown():
        # Grab the latest RealSense frames
        realsense_wrapper.get_frames()

        # # Read the current arm state
        # qpos_curr, qvel_curr, torque_curr = driver_wrapper.get_arm_states()

        if controlling:
            with torch.no_grad():
                # Build the observation and compute the robot action
                observation = {}

                imgs = {"observation.images.camera_top": torch.from_numpy(realsense_wrapper.color_images[0]).to(policy_device),
                        "observation.images.camera_wrist": torch.from_numpy(realsense_wrapper.color_images[1]).to(policy_device)}
                for imgkey, img in imgs.items():
                    # convert to channel first of type float32 in range [0, 1]
                    img = einops.rearrange(img.unsqueeze(0), "b h w c -> b c h w").contiguous()
                    img = img.type(torch.float32)
                    img /= 255

                    observation[imgkey] = img

                observation["observation.state"] = torch.from_numpy(qpos_curr).to(policy_device).float().unsqueeze(0)

                with torch.inference_mode():
                    action = policy.select_action(observation)
                action = action.cpu().numpy()

            # Command the robot motion
            for i in range(len(action)):
                driver_wrapper.command_arm_joint_position(action[i, :-1])
                driver_wrapper.command_gripper_joint_position(action[i, -1:])

                time.sleep(1. / 30.)

        # Draw and display the RealSense images, capture keyboard input
        cv2.imshow("realsense_image", cv2.hconcat(realsense_wrapper.color_images))
        key = cv2.waitKey(1) & 0xFF
        if key == ord('q'):
            driver_wrapper.command_arm_ee_pose(arm_ee_position=np.array([0., 0.2, 0.3], dtype=np.float32),
                                               arm_ee_quaternion=np.array([0.707, 0.707, 0., 0.], dtype=np.float32))
            time.sleep(5.)

            controlling = False
            print("==> End controlling...")
        elif key == ord('c'):
            driver_wrapper.command_arm_ee_pose(arm_ee_position=np.array([0., 0.2, 0.3], dtype=np.float32),
                                               arm_ee_quaternion=np.array([0.707, 0.707, 0., 0.], dtype=np.float32))
            time.sleep(5.)

            controlling = True
            print("==> Start controlling...")


if __name__ == "__main__":
    main()
```

2. `realsense_wrapper.py`

```python
import time
from dataclasses import dataclass
from typing import List

import numpy as np
import open3d as o3d
import pyrealsense2 as rs


@dataclass
class RealSenseWrapperCfg:
    names: List[str]
    sns: List[str]

    color_shape: List[int]
    depth_shape: List[int]

    fps: int

    timeout_ms: float


class RealSenseWrapper:
    def __init__(self, cfg: RealSenseWrapperCfg):
        super().__init__()

        print("==> RealSenseWrapper initializing...")

        self.cfg = cfg

        ctx = rs.context()
        devices = ctx.query_devices()
        for dev in devices:
            dev.hardware_reset()

        self.pipes = []
        self.pipe_cfgs = []
        self.profiles = []

        # Create the align object; by default, align the depth stream to the color stream
        self.align_to = rs.stream.color
        self.align = rs.align(self.align_to)

        for sn in self.cfg.sns:
            pipe = rs.pipeline()
            self.pipes.append(pipe)

            pipe_cfg = rs.config()
            pipe_cfg.enable_device(sn)
            pipe_cfg.enable_stream(rs.stream.color, self.cfg.color_shape[0], self.cfg.color_shape[1], rs.format.bgr8, self.cfg.fps)
            pipe_cfg.enable_stream(rs.stream.depth, self.cfg.depth_shape[0], self.cfg.depth_shape[1], rs.format.z16, self.cfg.fps)
            self.pipe_cfgs.append(pipe_cfg)

            profile = pipe.start(pipe_cfg)
            self.profiles.append(profile)

        print("==> Waiting for Frames...")
        for _ in range(3):
            for name, pipe in zip(self.cfg.names, self.pipes):
                t = time.time()
                try:
                    pipe.wait_for_frames()
                    print(f"{name} waited {time.time() - t}s")
                except Exception:
                    print(f"{name} waited too long: {time.time() - t}s\n\n")
                    raise

        self.color_images = []
        self.depth_images = []
        self.point_clouds = []

        print("==> RealSenseWrapper initialized successfully...")

    def get_frames(self):
        self.color_images.clear()
        self.depth_images.clear()
        self.point_clouds.clear()

        for name, pipe in zip(self.cfg.names, self.pipes):
            try:
                frameset = pipe.wait_for_frames(timeout_ms=self.cfg.timeout_ms)
                aligned_frameset = self.align.process(frameset)  # align depth frames to color frames
            except Exception:
                print(f"==> {name} failed...")
                for p in self.pipes:
                    p.stop()
                break

            color_frame = aligned_frameset.get_color_frame()
            depth_frame = aligned_frameset.get_depth_frame()

            color_image = np.array(color_frame.get_data())
            depth_image = np.array(depth_frame.get_data())
            point_cloud = self._depth_to_point_cloud(depth_image, depth_frame.profile.as_video_stream_profile().intrinsics)

            self.color_images.append(color_image)
            self.depth_images.append(depth_image)
            self.point_clouds.append(point_cloud)

    def _depth_to_point_cloud(self, depth_image, intrinsic):
        # Create Open3D Image from depth map
        o3d_depth = o3d.geometry.Image(depth_image)

        # Get intrinsic parameters
        fx, fy, cx, cy = intrinsic.fx, intrinsic.fy, intrinsic.ppx, intrinsic.ppy

        # Create Open3D PinholeCameraIntrinsic object
        o3d_intrinsic = o3d.camera.PinholeCameraIntrinsic(width=depth_image.shape[1], height=depth_image.shape[0], fx=fx, fy=fy, cx=cx, cy=cy)

        # Create Open3D PointCloud object from depth image and intrinsic parameters
        pcd = o3d.geometry.PointCloud.create_from_depth_image(o3d_depth, o3d_intrinsic)

        return np.asarray(pcd.points)
```

3. Run `infer_lerobot.py` and you will find that the code hangs at `cv2.imshow`, unable to proceed, and no image is displayed.

### Expected behavior

I expect the above program to run properly: display the RealSense images and capture keyboard input.

Cadene commented 3 days ago

Unfortunately, we have a known issue with opencv. We included this in our tutorial:

For Linux only (not Mac), install extra dependencies for recording datasets:

```bash
conda install -y -c conda-forge ffmpeg
pip uninstall -y opencv-python
conda install -y -c conda-forge "opencv>=4.10.0"
```

yilin404 commented 3 days ago

@Cadene Thank you so much for the solution! It worked perfectly and successfully resolved my issue. I can now use cv2.imshow for visualization without any problems. 🎉

Your method is clear, effective, and incredibly helpful. I believe adding this solution to the project's README would be beneficial for other users who might face similar issues. It would make it easier for everyone to find and apply the fix. Thanks again for your support and assistance! 🙏

Cadene commented 2 days ago

@yilin404 could you please open a PR adding these lines of code where you think they would fit best in the README? Thanks!