octo-models / octo

Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.
https://octo-models.github.io/
MIT License

offline/sim evaluation recommendations #23

Open daniellawson9999 opened 9 months ago

daniellawson9999 commented 9 months ago

Excited by the work, great paper and open release.

I am interested in testing some ideas that will involve pretraining (e.g. architecture changes, etc.), likely without access to a real-world setup, at least at first. Just starting to look at the codebase.

Curious about recommendations for sim/offline evaluation:

  1. For evaluation, are there any recommendations or best practices for separating datasets into train/validation/test splits, or for holding out RT-X datasets? What seem to be the most useful proxies for real-world performance?
  2. I saw there are provided examples for sim finetuning; are there any results that could be shared for simulated envs? Are there any sim envs that "work" for testing zero-shot eval, in addition to finetuning?

Thanks!

mees commented 9 months ago

Thanks for your interest and for the great questions!

  1. In general, there is no single offline metric that directly correlates with real-world performance, since multiple factors determine the success rate of a rollout: besides how closely the end-effector position and orientation track the ground truth, the timing of closing the gripper can be quite critical. That said, we plot many metrics during training that should provide a good overview.
  2. We are actively looking into sim envs for exactly your use-case, stay tuned for some updates with the next release of Octo!
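As an illustration of the kind of offline proxies discussed in point 1, here is a minimal sketch of computing separate tracking and gripper-timing metrics over a held-out trajectory. This is not Octo's actual validation code; the helper name `action_metrics`, the last-dimension gripper convention, and the 0.5 open/close threshold are all assumptions.

```python
import numpy as np

def action_metrics(pred_actions, gt_actions, gripper_dim=-1):
    """Offline proxy metrics for a policy on held-out data (hypothetical
    helper, not part of the Octo codebase).

    pred_actions, gt_actions: arrays of shape (num_steps, action_dim),
    where the last dimension is assumed to be a binary gripper command.
    """
    pred = np.asarray(pred_actions, dtype=np.float64)
    gt = np.asarray(gt_actions, dtype=np.float64)

    # MSE over the continuous end-effector dimensions (position/orientation).
    arm_mse = float(np.mean((np.delete(pred, gripper_dim, axis=-1)
                             - np.delete(gt, gripper_dim, axis=-1)) ** 2))

    # Gripper timing: fraction of steps where the thresholded open/close
    # command matches the ground truth — captures the "when to close"
    # failure mode that arm MSE alone misses.
    gripper_acc = float(np.mean(
        (pred[:, gripper_dim] > 0.5) == (gt[:, gripper_dim] > 0.5)))

    return {"arm_action_mse": arm_mse, "gripper_accuracy": gripper_acc}
```

Tracking both numbers separately matters because a policy can have low arm MSE yet still fail rollouts by mistiming the grasp.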