I actually found the issue - num_learning_epochs is not used as a parameter in the new version, so each batch is only used once per update. If you modify the code to run 5 epochs (like the master-branch PPO), it works fine.
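For illustration, here is a minimal, self-contained sketch of the difference, not the actual rsl_rl code; the names `rollout`, `make_mini_batches`, `ppo_update`, and `policy_gradient_step` are placeholders for whatever the algorithms-branch PPO uses internally:

```python
import random

def make_mini_batches(rollout, num_mini_batches):
    """Shuffle the rollout and split it into mini-batches."""
    indices = list(range(len(rollout)))
    random.shuffle(indices)
    batch_size = len(rollout) // num_mini_batches
    for i in range(num_mini_batches):
        yield [rollout[j] for j in indices[i * batch_size:(i + 1) * batch_size]]

def ppo_update(rollout, num_mini_batches=4, num_learning_epochs=1):
    """Run gradient updates over the collected rollout.

    With num_learning_epochs=1 (the behaviour described above), each transition
    contributes to exactly one gradient step per rollout. Setting it to 5, as in
    the master-branch PPO, reuses the same rollout for five passes.
    """
    for epoch in range(num_learning_epochs):          # the outer epoch loop that is missing
        for batch in make_mini_batches(rollout, num_mini_batches):
            policy_gradient_step(batch)               # one optimizer step per mini-batch

def policy_gradient_step(batch):
    # Placeholder for the actual PPO surrogate-loss backward pass and optimizer step.
    print(f"updating on a mini-batch of {len(batch)} transitions")

# Example: a rollout of 64 dummy transitions, updated for 5 epochs.
ppo_update(list(range(64)), num_mini_batches=4, num_learning_epochs=5)
```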
On that note, is there a particular reason for changing the behaviour to not allow training over multiple epochs?
Hi, first of all, thank you for the algorithm implementations! I've tried migrating from the master branch to the algorithms branch (since it supports more algorithms than just PPO) and testing it on the flat-terrain ANYmal locomotion environment in Isaac Lab.

For some reason there is a huge gap in performance between the master-branch PPO and the algorithms-branch PPO: the latter exhibits a much larger reward variance and does not learn to follow the velocity commands at all within 300 update steps. I use the same environment for both, and the LegacyRunner for the algorithms branch.

I've uploaded two screenshots of the training curves; interestingly, they start off the same but then diverge after several update steps.
Has anyone encountered this issue before?