simpler-env / SimplerEnv

Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge)
https://simpler-env.github.io/

Sim-and-real trial results #8

Open abadithela opened 2 days ago

abadithela commented 2 days ago

Hi,

I'm wondering if you can make the sim-and-real results for each experiment trial available. If this data is already available, could you please point me to it?

So far, I have found the real and simulated success rates for each task in the file metrics.py. However, this file does not seem to include the success rates for the grasping tasks (shown in Fig. 7). Where can these be found?

For the variant experiments, the real success rates for variations of a task (e.g., pick_coke_can with the can placed horizontally/vertically/standing) are given in the file calc_metrics_evaluation_videos.py.

xuanlinli17 commented 2 days ago

You can find the simulation evaluation's per-trial results at https://huggingface.co/datasets/xuanlinli17/simpler-env-eval-example-videos/tree/main . The real eval's per-trial results are split across different sources, from the different people who performed the experiments. If you really want them, you can send me an email at xul012@ucsd.edu (unfortunately I don't have full per-trial videos from the Google Robot operators, only example videos and numbers).

However, note that SIMPLER averages success over different policy seeds (e.g., for Octo) and different tuned robot arm colors (e.g., for the Google Robot) to reduce the variance of the evaluation, so it is not very meaningful to draw conclusions by comparing real and sim evaluations trial-by-trial (instead of comparing their average success rates). This is because policies often exhibit different error patterns (in both real and sim) under slight visual changes or seed changes, even when the robots and objects are placed in the same positions and orientations. The average success rates, on the other hand, are much more meaningful, as they have much lower variance.
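As a rough illustration of that point (a minimal sketch with synthetic data, not code from this repo): even if real and sim evaluations share the same underlying success rate, individual trials can disagree often, while the averaged success rates stay close.

```python
# Minimal sketch, assuming hypothetical synthetic per-trial outcomes;
# this is NOT SimplerEnv code, just an illustration of variance reduction.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-trial binary successes: shape (n_seeds, n_trials).
# In SIMPLER these would correspond to different policy seeds / tuned arm colors.
sim_trials = rng.binomial(1, 0.6, size=(3, 25))   # simulated eval
real_trials = rng.binomial(1, 0.6, size=(3, 25))  # real eval (same true success rate)

# Trial-by-trial agreement is noisy even with an identical underlying rate,
# because individual rollouts fail in different ways.
per_trial_agreement = (sim_trials == real_trials).mean()

# Averaged success rates, by contrast, end up close to each other.
sim_avg = sim_trials.mean()
real_avg = real_trials.mean()

print(f"per-trial agreement: {per_trial_agreement:.2f}")
print(f"sim avg success: {sim_avg:.2f}, real avg success: {real_avg:.2f}")
```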

The "grasping" tasks in the Bridge evaluation are the defined as the "partial success" of the full Bridge tasks (e.g., for "put eggplant in yellow basket", the "grasping" task is successful as long as the robot grasps the eggplant, regardless of whether the robot has put the eggplant in the yellow basket). With that said, the trajectories for the "grasping" tasks are the same as the trajectories for the full Bridge tasks. Their raw success rates are in https://github.com/simpler-env/SimplerEnv/blob/main/tools/calc_metrics_evaluation_videos.py. You can also find the raw success rates in Appendix Table 4 and 5.