huggingface / lerobot

🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
Apache License 2.0

[Feature request] Is it possible to add images to videos in a stream, and avoid converting them to png first? #360

Open StarCycle opened 3 months ago

StarCycle commented 3 months ago

Hello @Cadene ,

When transferring other dataset formats to the LeRobot dataset format, the current approach first converts all images into PNGs, and then converts the PNGs into an MP4 video with the following function:

https://github.com/huggingface/lerobot/blob/fab037f78d63ba578fffb25548eb37093cf1d7a7/lerobot/common/datasets/video_utils.py#L165-L190

It calls the ffmpeg command directly. That is fine for converting an existing dataset to the LeRobot format, but if you are recording a dataset in real time, it is a little complicated.

Another approach is to append images to a video as a stream, as in this link:

An example is:

import imageio.v2 as iio
import numpy as np

# All frames appended to one video must have the same shape
image1 = np.stack([iio.imread('imageio:camera.png')] * 3, 2)  # gray -> 3-channel
image2 = iio.imread('imageio:astronaut.png')
image3 = iio.imread('imageio:immunohistochemistry.png')

# Open a streaming writer backed by ffmpeg with VAAPI hardware encoding
w = iio.get_writer('my_video.mp4', format='FFMPEG', mode='I', fps=1,
                   codec='h264_vaapi',
                   output_params=['-vaapi_device', '/dev/dri/renderD128',
                                  '-vf', 'format=gray|nv12,hwupload'],
                   pixelformat='vaapi_vld')
w.append_data(image1)  # frames are encoded as they are appended
w.append_data(image2)
w.append_data(image3)
w.close()  # finalize the container

Do you prefer this approach? Or are there special reasons to convert all images into PNGs first?

cc @Nirviaje

Cadene commented 3 months ago

Interesting :)

Could you try your approach during data recording on a real robot?

See this PR: https://github.com/huggingface/lerobot/pull/326

Record an episode:

python lerobot/scripts/control_robot.py record \
    --fps 30 \
    --root data \
    --repo-id $USER/koch_pick_place_lego \
    --num-episodes 50 \
    --warmup-time-s 2 \
    --episode-time-s 30 \
    --reset-time-s 10

We currently use threads to write images asynchronously: https://github.com/huggingface/lerobot/blob/f179084709967fd237795df118c756901ab4535a/lerobot/scripts/control_robot.py#L413 https://github.com/huggingface/lerobot/blob/f179084709967fd237795df118c756901ab4535a/lerobot/scripts/control_robot.py#L435-L436

Then we encode the video frames at the end of the recording: https://github.com/huggingface/lerobot/blob/f179084709967fd237795df118c756901ab4535a/lerobot/scripts/control_robot.py#L593

This ensures high fps during teleoperation of your robot.

Your method could be better, but we need to try!

Importantly, you would need to use the exact encoding settings that we are using (see the default parameters, with vcodec="libsvtav1"): https://github.com/huggingface/lerobot/blob/f179084709967fd237795df118c756901ab4535a/lerobot/common/datasets/video_utils.py#L165-L211
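(For context, the end-of-recording encode step boils down to building an ffmpeg command over the numbered PNGs. The sketch below only builds the command; the real defaults live in `lerobot/common/datasets/video_utils.py`, and apart from `vcodec="libsvtav1"` the flags and the frame-name pattern here are illustrative:)

```python
def build_encode_cmd(imgs_dir, out_path, fps=30, vcodec="libsvtav1"):
    """Build an ffmpeg command that encodes numbered PNGs into one video.

    Only vcodec="libsvtav1" is confirmed above; the other flags and the
    frame_%06d.png pattern are hypothetical placeholders.
    """
    return [
        "ffmpeg",
        "-f", "image2",
        "-r", str(fps),                      # input frame rate
        "-i", f"{imgs_dir}/frame_%06d.png",  # numbered PNG input pattern
        "-vcodec", vcodec,
        "-y", str(out_path),                 # overwrite output if it exists
    ]

cmd = build_encode_cmd("data/images", "data/episode_000000.mp4")
# Run with subprocess.run(cmd, check=True) once all frames are on disk.
```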