leggedrobotics / rsl_rl

Fast and simple implementation of RL algorithms, designed to run fully on GPU.

Performance drop when resuming training with empirical normalization #17

Open BIGheadLL opened 9 months ago

BIGheadLL commented 9 months ago

Hi there,

We noticed a performance drop when we resumed training with OnPolicyRunner, which applies empirical normalization in our env. (Screenshot from 2023-11-20 11-36-32: training curves.) There is a gap between the black line and the blue one. Additionally, we found that the model's performance does not improve without empirical normalization (the green and orange curves).

Many thanks.
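For context, "empirical normalization" here means normalizing observations by running statistics accumulated during training. Below is a minimal sketch of that idea (not rsl_rl's exact implementation, which may differ in detail); the relevant point is that the running mean/variance/count are part of the training state, so a resume that fails to restore them would feed the policy differently scaled observations:

```python
import torch


class RunningNormalizer:
    """Minimal sketch of empirical observation normalization.
    Illustrative only; rsl_rl's own normalizer may differ in detail."""

    def __init__(self, shape, eps=1e-2):
        self.eps = eps
        self.count = 0
        self.mean = torch.zeros(shape)
        self.var = torch.ones(shape)

    def update(self, batch):
        # Parallel (Chan et al.) update of running mean and variance.
        batch_count = batch.shape[0]
        batch_mean = batch.mean(dim=0)
        batch_var = batch.var(dim=0, unbiased=False)
        delta = batch_mean - self.mean
        total = self.count + batch_count
        self.mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        self.var = (m_a + m_b + delta.pow(2) * self.count * batch_count / total) / total
        self.count = total

    def __call__(self, x):
        return (x - self.mean) / torch.sqrt(self.var + self.eps)

    # The running statistics are training state: if they are not saved in
    # the checkpoint and restored on resume, the policy sees observations
    # scaled differently from those it was trained on.
    def state_dict(self):
        return {"mean": self.mean, "var": self.var, "count": self.count}

    def load_state_dict(self, state):
        self.mean, self.var, self.count = state["mean"], state["var"], state["count"]
```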

Mayankm96 commented 8 months ago

When you resume training, the episodes are typically "terminated" randomly to encourage collecting a diverse set of samples. Otherwise PPO can get stuck in a local minimum.

https://github.com/leggedrobotics/rsl_rl/blob/master/rsl_rl/runners/on_policy_runner.py#L67
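For reference, a hedged sketch of that randomized-reset idea (the names below are illustrative; the linked line is the authoritative source): each environment's episode progress counter is initialized to a random value, so resets are staggered across the episode horizon instead of all envs terminating in lockstep.

```python
import torch


def randomize_episode_progress(episode_length_buf: torch.Tensor,
                               max_episode_length: int) -> torch.Tensor:
    # Pretend each env is already partway through an episode, so resets
    # (and hence the on-policy samples) are spread across the horizon.
    return torch.randint_like(episode_length_buf, high=max_episode_length)


# Example: 4 parallel envs with 1000-step episodes.
buf = torch.zeros(4, dtype=torch.long)
buf = randomize_episode_progress(buf, max_episode_length=1000)
print(buf)  # e.g. tensor([412,  87, 955, 230])
```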