simpler-env / SimplerEnv

Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge)
https://simpler-env.github.io/

Sim-and-real trial results #8

Open abadithela opened 2 days ago

abadithela commented 2 days ago

Hi,

I'm wondering if you can make the sim-and-real results for each experiment trial available. If this data is already available, could you please point me to it?

So far, I have found the real and simulated success rates for each task in the file metrics.py. However, this file does not seem to include the success rates for the grasping tasks (shown in Fig. 7). Where can these be found?

For the variant experiments, the real success rates for variations of a task (e.g., pick_coke_can with the can placed horizontally/vertically/standing) are given in the file calc_metrics_evaluation_videos.py.

xuanlinli17 commented 2 days ago

You can find the simulation evaluation's per-trial results at https://huggingface.co/datasets/xuanlinli17/simpler-env-eval-example-videos/tree/main . The real eval's per-trial results are split across different sources, from the different people who performed the experiments. If you really want them, you can send me an email at xul012@ucsd.edu (unfortunately I don't have full per-trial videos from the Google Robot operators, only example videos and numbers).

However, note that SIMPLER averages success over different policy seeds (e.g., for Octo) and different tuned robot arm colors (e.g., for the Google Robot) to reduce the variance of the evaluation, so it is not very meaningful to draw conclusions by comparing real and sim evaluations trial-by-trial (instead of comparing their average success rates). This is because policies often exhibit different error patterns (in both real and sim) under slight visual changes or seed changes, even when the robots and objects are placed in the same positions and orientations. The average success rates, on the other hand, are much more meaningful, as they have much lower variance.
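As a rough illustration of that point (a minimal sketch with synthetic data, not code from this repo): even if real and sim evaluations share the same underlying success rate, individual trials can disagree often, while the averaged success rates stay close.

```python
# Minimal sketch, assuming hypothetical synthetic per-trial outcomes;
# this is NOT SimplerEnv code, just an illustration of variance reduction.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-trial binary successes: shape (n_seeds, n_trials).
# In SIMPLER these would correspond to different policy seeds / tuned arm colors.
sim_trials = rng.binomial(1, 0.6, size=(3, 25))   # simulated eval
real_trials = rng.binomial(1, 0.6, size=(3, 25))  # real eval (same true success rate)

# Trial-by-trial agreement is noisy even with an identical underlying rate,
# because individual rollouts fail in different ways.
per_trial_agreement = (sim_trials == real_trials).mean()

# Averaged success rates, by contrast, end up close to each other.
sim_avg = sim_trials.mean()
real_avg = real_trials.mean()

print(f"per-trial agreement: {per_trial_agreement:.2f}")
print(f"sim avg success: {sim_avg:.2f}, real avg success: {real_avg:.2f}")
```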

The "grasping" tasks in the Bridge evaluation are the defined as the "partial success" of the full Bridge tasks (e.g., for "put eggplant in yellow basket", the "grasping" task is successful as long as the robot grasps the eggplant, regardless of whether the robot has put the eggplant in the yellow basket). With that said, the trajectories for the "grasping" tasks are the same as the trajectories for the full Bridge tasks. Their raw success rates are in https://github.com/simpler-env/SimplerEnv/blob/main/tools/calc_metrics_evaluation_videos.py. You can also find the raw success rates in Appendix Table 4 and 5.