Multi-image support for Diffusion Policy

ruijie-he commented 4 months ago

I'd like to be able to run Diffusion Policy training and inference on multiple camera streams (e.g. "observation.camera_0" and "observation.camera_1" rather than just "observation.image").

Assuming that I have a real-world Push T dataset with multiple camera streams (e.g. manro99/pusht_5cam on HF), can you provide pointers on how I can update lerobot/common/policies/diffusion/modeling_diffusion.py and lerobot/common/policies/diffusion/configuration_diffusion.py to ingest multiple camera streams for training? Thanks!

alexander-soare commented 4 months ago

Hi @ruijie-he, this is on our TODO list, but you are welcome to contribute it. The best example in the codebase is modeling_act.py. The description of this PR also explains the current design: https://github.com/huggingface/lerobot/pull/149.

Can you please take a look at those two resources and let me know if you need any more information?
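
As a rough illustration of what such a change might involve (a minimal sketch, not taken from modeling_act.py or from the linked PR): one plausible approach is to discover the camera keys from the configured input shapes, encode each camera separately, and concatenate the per-camera features with the robot state to form the conditioning vector. The helper names below (select_image_keys, global_cond_dim, gather_images), the key prefixes, and the "observation.state" key are all assumptions made for illustration, and the snippet only covers the bookkeeping, not the image encoder itself.

```python
from typing import Any, Dict, List, Sequence, Tuple


def select_image_keys(
    input_shapes: Dict[str, Sequence[int]],
    prefixes: Tuple[str, ...] = ("observation.image", "observation.camera"),
) -> List[str]:
    """Return the input keys that look like camera streams, in a stable order.

    The prefixes are an assumption; adjust them to match the dataset's keys.
    """
    return sorted(k for k in input_shapes if k.startswith(prefixes))


def global_cond_dim(
    input_shapes: Dict[str, Sequence[int]],
    feature_dim_per_camera: int,
    state_key: str = "observation.state",  # hypothetical key name
) -> int:
    """Size of the conditioning vector for one observation step.

    Assumes each camera is encoded to `feature_dim_per_camera` features and
    that the features are concatenated with the (optional) state vector.
    """
    n_cameras = len(select_image_keys(input_shapes))
    state_dim = input_shapes[state_key][0] if state_key in input_shapes else 0
    return n_cameras * feature_dim_per_camera + state_dim


def gather_images(batch: Dict[str, Any], image_keys: Sequence[str]) -> List[Any]:
    """Collect per-camera images from a batch, failing loudly on missing keys."""
    missing = [k for k in image_keys if k not in batch]
    if missing:
        raise KeyError(f"Batch is missing camera keys: {missing}")
    return [batch[k] for k in image_keys]


if __name__ == "__main__":
    shapes = {
        "observation.camera_0": [3, 96, 96],
        "observation.camera_1": [3, 96, 96],
        "observation.state": [2],
    }
    print(select_image_keys(shapes))
    print(global_cond_dim(shapes, feature_dim_per_camera=64))
```

In a real model the list returned by gather_images would typically be stacked along a new camera dimension before being passed through the image encoder; whether the cameras share one encoder or get one each is a design choice this sketch leaves open.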

alexander-soare commented 4 months ago

FYI, this is being tackled here: https://github.com/huggingface/lerobot/pull/218

huggingface/lerobot, issue #212