Open svsawant opened 1 week ago
Thanks @svsawant, do you have a simple example we could take a look at?
To replicate, consider the following test case. Train an RL controller on a quadrotor and go through the logs. Then execute the trained policy via rl_experiment.sh, which again prints the run stats. The mse values from the training run (after taking a square root) are higher than the rmse values printed during policy execution. Here's a test run I did for PPO with the quadrotor (with the attitude control interface), followed by the run stats from the policy evaluation.
In the RL training pipeline (for SAC and PPO), there seems to be an issue with the mse values computed/tracked during evaluation runs. They match neither the mse in the "info" dict returned by env.step nor the rmse results from direct policy evaluation through rl_experiment.sh. (A deeper dive suggests the issue lies in how mse is handled in "RecordEpisodeStatistics".)
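One way a discrepancy in exactly this direction can arise (a minimal numeric sketch, not the actual RecordEpisodeStatistics code; the per-step values below are made up): if the training pipeline averages per-step mse over an episode and the square root is taken afterwards, the result is always at least as large as the mean of per-step rmse values, because sqrt is concave (Jensen's inequality).

```python
import math

# Hypothetical per-step squared tracking errors from one evaluation episode.
step_mse = [0.04, 0.25, 0.01, 0.16]

# Order 1: average the per-step mse first, then take the square root
# (as a training-side statistics wrapper might effectively do).
rmse_of_mean = math.sqrt(sum(step_mse) / len(step_mse))

# Order 2: take the square root per step, then average
# (as a separate evaluation script might do).
mean_of_rmse = sum(math.sqrt(m) for m in step_mse) / len(step_mse)

# By Jensen's inequality, rmse_of_mean >= mean_of_rmse,
# matching the report that sqrt(training mse) comes out higher.
print(rmse_of_mean, mean_of_rmse)
```

If the two code paths in the repo aggregate in different orders (or normalize by a different episode length), the logged numbers will disagree even when both are internally consistent.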