isaac-sim / OmniIsaacGymEnvs

Reinforcement Learning Environments for Omniverse Isaac Gym

Testing in Headless True vs False has a 40% reward difference when using *the same trained policy*. Reproducible script provided! #167

Open Demetrio92 opened 1 month ago

Overview

While building a custom robotic simulation tool on top of OIGE, we discovered that testing a policy with headless=False gives different results from testing it with headless=True. The issue is easy to reproduce on standard OIGE tasks: testing the same trained policy with headless=True and headless=False shows a reward difference of roughly 40% on the Humanoid and Ant tasks.

I am attaching a script that can be run on the latest commit in main. It trains the Humanoid task with headless=True, tests the resulting policy with headless=True and with headless=False, and should produce the following results:

 == Humanoid Test; headless=True
av reward: 6852.170803435147 av steps: 989.174072265625

 == Humanoid Test; headless=False
av reward: 4273.75024558347 av steps: 984.9992679355784

Gist to reproduce this: https://gist.github.com/Demetrio92/c986493cff3b4d791a42412179ec6264
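For readers skimming without opening the gist, the three test configurations compared in this post can be sketched as command lines. This is not the gist's code. The script path `scripts/rlgames_train.py`, the checkpoint path, the `python` interpreter name, and the helper `build_test_command` are assumptions for illustration, and the override names (`task`, `test`, `checkpoint`, `headless`, `enable_cameras`) should be checked against your OIGE checkout.

```python
import shlex

# Assumed locations; adjust to your checkout and training run.
TRAIN_SCRIPT = "scripts/rlgames_train.py"      # assumption, not taken from the gist
CHECKPOINT = "runs/Humanoid/nn/Humanoid.pth"   # assumption, hypothetical checkpoint path

# The three test setups reported in this post.
TEST_CONFIGS = [
    {"headless": True, "enable_cameras": False},
    {"headless": False, "enable_cameras": False},
    {"headless": True, "enable_cameras": True},
]


def build_test_command(task, checkpoint, headless, enable_cameras, python="python"):
    """Return a shell command string that tests one checkpoint in one configuration."""
    overrides = [
        f"task={task}",
        "test=True",
        f"checkpoint={checkpoint}",
        f"headless={headless}",
    ]
    if enable_cameras:
        overrides.append("enable_cameras=True")
    return " ".join(shlex.quote(part) for part in [python, TRAIN_SCRIPT, *overrides])


if __name__ == "__main__":
    # Print the commands only; nothing is launched here.
    for cfg in TEST_CONFIGS:
        print(build_test_command("Humanoid", CHECKPOINT, **cfg))
```

Only the `headless` and `enable_cameras` overrides change between the commands, so any reward difference comes from those settings and not from the checkpoint.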

The same happens with Ant. If training is done with headless=False (very slow, but possible), the test scores change again. See the extra outputs at the bottom of this post.
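The "40%" figure can be checked against the reported averages. The sketch below computes the relative reward drop from headless=True to headless=False for each trained policy, using the numbers copied from the outputs in this post. The helper name `relative_drop` and the dictionary labels are illustrative.

```python
# Average rewards copied from the test outputs in this post:
# (reward with headless=True, reward with headless=False)
RESULTS = {
    "Humanoid, trained headless=True": (6852.170803435147, 4273.75024558347),
    "Humanoid, trained headless=False": (4156.822625699561, 3556.779703811363),
    "Ant, trained headless=True": (7147.375523806955, 3829.089754253626),
}


def relative_drop(headless_reward, rendered_reward):
    """Percentage by which the rendered reward falls below the headless reward."""
    return 100.0 * (headless_reward - rendered_reward) / headless_reward


if __name__ == "__main__":
    for name, (headless_reward, rendered_reward) in RESULTS.items():
        drop = relative_drop(headless_reward, rendered_reward)
        print(f"{name}: {drop:.1f}% lower with headless=False")
```

For the policies trained with headless=True, this gives a drop of about 37.6% on Humanoid and 46.4% on Ant. The Humanoid policy trained with headless=False shows a smaller gap of about 14.4%.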

Root-Cause Analysis

Resolution

It would be great if you could confirm the issue or, if this behavior is expected, explain the proper way to deal with it. As it stands, visually inspecting a trained policy appears unreliable, because the policy behaves differently when rendered. That is a serious problem, since visual inspection is vital for debugging RL policies.

Extra Results

* Humanoid trained with `headless=True`

  ```
   == Humanoid Test; headless=True
  av reward: 6852.170803435147 av steps: 989.174072265625

   == Humanoid Test; headless=False
  av reward: 4273.75024558347 av steps: 984.9992679355784

   == Humanoid Test; headless=True enable_cameras=True ==
  av reward: 4273.75024558347 av steps: 984.9992679355784
  ```

* Humanoid trained with `headless=False` (training takes 1.5h on RTX 3070)

  ```
   == Humanoid Test; headless=True
  av reward: 4156.822625699561 av steps: 830.9899344569288

   == Humanoid Test; headless=False
  av reward: 3556.779703811363 av steps: 966.6001461988304

   == Humanoid Test; headless=True enable_cameras=True ==
  av reward: 3556.779703811363 av steps: 966.6001461988304
  ```

* Ant trained with `headless=True`

  ```
   == Ant Test; headless=True
  av reward: 7147.375523806955 av steps: 965.1955620580346

   == Ant Test; headless=False
  av reward: 3829.089754253626 av steps: 996.640625

   == Ant Test; headless=True enable_cameras=True ==
  av reward: 3829.089754253626 av steps: 996.640625
  ```

On request we can also provide complete training and testing logs.