utiasDSL / safe-control-gym

PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and RL
https://www.dynsyslab.org/safe-robot-learning/
MIT License

MSE computation issue #165

Open svsawant opened 1 week ago

svsawant commented 1 week ago

In the RL training pipeline (for SAC and PPO), during evaluation runs, there seems to be an issue with the computed/tracked MSE values. They match neither the mse reported in the "info" dict from env.step nor the RMSE results from evaluating the policy directly through rl_experiment.sh. (A deeper dive suggests the issue lies in how mse is handled in "RecordEpisodeStatistics".) See the sketch below for the kind of aggregation mismatch this points to.
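
To illustrate one plausible source of such a gap, here is a minimal sketch with made-up numbers (this is not the library's actual code, just a demonstration of the arithmetic): averaging per-episode MSEs and then taking a square root generally does not equal the RMSE computed over all steps pooled together, especially when episode lengths vary.

```python
import numpy as np

# Hypothetical per-step squared tracking errors from two evaluation
# episodes of different lengths (illustrative numbers only).
ep1_sq_err = np.array([0.04, 0.09, 0.16])        # 3 steps
ep2_sq_err = np.array([0.01, 0.01, 0.01, 0.01])  # 4 steps

# Scheme A: average the per-episode MSEs, then take the square root
# (the kind of aggregation a RecordEpisodeStatistics-style wrapper
# might effectively perform).
per_episode_mse = [ep1_sq_err.mean(), ep2_sq_err.mean()]
rmse_a = np.sqrt(np.mean(per_episode_mse))

# Scheme B: pool all steps across episodes and compute one RMSE
# (the kind of aggregation a direct evaluation script might perform).
all_sq_err = np.concatenate([ep1_sq_err, ep2_sq_err])
rmse_b = np.sqrt(all_sq_err.mean())

print(rmse_a, rmse_b)  # ~0.231 vs ~0.217: the two schemes disagree
```

If the training pipeline and rl_experiment.sh use different schemes along these lines, that alone would produce systematically different numbers.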

adamhall commented 1 week ago

Thanks @svsawant, do you have a simple example we could take a look at?

svsawant commented 1 week ago

To replicate, consider the following test case. Train an RL controller on the quadrotor and go through the logs. Then execute the trained policy using rl_experiment.sh, which again prints out the run stats. The MSE values from the training run (after taking a square root) are higher than the RMSE values printed during policy execution. Here's a test run I did for PPO with the quadrotor (with the attitude control interface):

[Screenshot from 2024-09-28 11-34-56: training-run stats]

Next, the run stats from the policy evaluation:

[Screenshot from 2024-09-28 11-40-29: evaluation-run stats]
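
For what it's worth, here is a minimal sketch of a side-by-side check that could localize the bug (the step/reset signatures, the `info["mse"]` key, and the `env`/`policy` handles are assumptions based on the description above, not verified against the library):

```python
import numpy as np

def manual_rmse(env, policy, max_steps=1000):
    """Accumulate the per-step 'mse' from env.step's info dict and return
    the RMSE over one episode, for comparison with the wrapper's stats.

    Assumes the classic gym step signature (obs, reward, done, info) and
    that the env reports per-step tracking error under info['mse'];
    adjust to the env's actual API if it differs.
    """
    obs = env.reset()
    step_mses = []
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        step_mses.append(info["mse"])  # per-step tracking error from the env
        if done:
            break
    return float(np.sqrt(np.mean(step_mses)))
```

If the value returned here matches the RMSE printed by rl_experiment.sh but not the (square-rooted) mse from the training logs, that would point squarely at the aggregation inside RecordEpisodeStatistics.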