Excited by the work, great paper and open release.
I am interested in testing some ideas that will involve pretraining (e.g. architecture changes, etc.), likely without access to a real-world setup, at least at first. Just starting to look at the codebase.
Curious about recommendations for sim/offline evaluation. 1) For evaluation, are there any recommendations or best practices for splitting the datasets into train/validation/test, or for holding out entire RT-X datasets? Which offline metrics seem to be the most useful proxies for real-world performance? 2) I saw there are examples provided for sim finetuning. Are there any results you could share for simulated environments? And are there any sim envs that "work" for zero-shot evaluation in addition to finetuning?
Thanks for your interest and for the great questions!
In general, there is no single offline metric that correlates directly with real-world performance, since multiple factors can determine the success rate of a rollout: besides how closely the policy tracks the ground-truth end-effector position and orientation, the timing of closing the gripper can be quite critical. That said, we plot many metrics during training that should give a good overview.
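To make the kinds of metrics described above concrete, here is a minimal sketch of offline action-prediction metrics: pose tracking error plus gripper-timing accuracy. The function name, array shapes, and the 7-DoF action layout (3 position deltas, 3 orientation deltas, 1 binary gripper command) are assumptions for illustration, not the actual Octo logging code.

```python
import numpy as np

def offline_action_metrics(pred_actions, gt_actions):
    """Compare predicted vs. ground-truth actions over a trajectory.

    Both arrays are assumed to have shape (T, 7): 3 end-effector
    position deltas, 3 orientation deltas, and 1 gripper command.
    """
    # Tracking error on end-effector position and orientation deltas.
    pos_mse = float(np.mean((pred_actions[:, :3] - gt_actions[:, :3]) ** 2))
    rot_mse = float(np.mean((pred_actions[:, 3:6] - gt_actions[:, 3:6]) ** 2))
    # Gripper timing: fraction of timesteps where the thresholded
    # open/close command matches the ground truth.
    gripper_acc = float(
        np.mean((pred_actions[:, 6] > 0.5) == (gt_actions[:, 6] > 0.5))
    )
    return {"pos_mse": pos_mse, "rot_mse": rot_mse, "gripper_acc": gripper_acc}
```

Aggregating these per held-out dataset (rather than over the full mixture) makes it easier to see which embodiments or scenes a change in the architecture helps or hurts.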
We are actively looking into sim envs for exactly your use-case, stay tuned for some updates with the next release of Octo!
Thanks!