leggedrobotics / rsl_rl

Fast and simple implementation of RL algorithms, designed to run fully on GPU.

Huge performance gaps between algorithms and master on base IsaacLab tasks #38

Closed · vassil-atn closed 2 months ago

vassil-atn commented 2 months ago

Hi, first of all, thank you for the algorithm implementations! I've tried migrating from the master branch to the algorithms branch (since it supports more algorithms than just PPO) and testing it on the flat-terrain ANYmal locomotion env in Isaac Lab. For some reason there is a huge performance gap between the master branch PPO and the algorithms branch PPO: the latter exhibits a much larger reward variance and does not learn to follow the velocity commands at all within 300 update steps. I use the same environment for both and the LegacyRunner for the algorithms branch. I've uploaded two screenshots of the training curves; interestingly, they start off the same but then diverge after several update steps.

Has anyone encountered this issue before?

[Screenshots: training reward curves for the master branch and the algorithms branch]

vassil-atn commented 2 months ago

I actually found the issue: num_learning_epochs is not used as a parameter in the new version, so each rollout is used only once per update. If you modify the code and run it with 5 epochs (like the master branch PPO), it works fine:

[Screenshot: training curve with 5 learning epochs, matching the master branch performance]
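For reference, here is a minimal sketch of the behavioural difference, assuming a generic PPO-style update loop. The names `storage.mini_batch_generator`, `update_step`, and the parameter defaults are hypothetical placeholders modeled loosely on the master-branch PPO, not the exact algorithms-branch API:

```python
def ppo_update(storage, update_step, num_mini_batches=4, num_learning_epochs=5):
    """Run a PPO update over one collected rollout.

    With num_learning_epochs == 1 (the behaviour observed in the algorithms
    branch above), every transition is consumed exactly once. The master
    branch instead re-shuffles and reuses the same rollout
    num_learning_epochs times per update.

    NOTE: storage.mini_batch_generator and update_step are assumed
    placeholders for illustration, not the actual rsl_rl API.
    """
    for _ in range(num_learning_epochs):
        # Each epoch draws a fresh shuffled set of mini-batches
        # from the same stored rollout.
        for batch in storage.mini_batch_generator(num_mini_batches):
            update_step(batch)
```

Reusing the rollout for several epochs is the standard PPO recipe; the clipped surrogate objective is what keeps the repeated updates from drifting too far from the policy that collected the data.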

On that note, is there a particular reason for changing the behaviour to not allow training over multiple epochs?