BIGheadLL opened this issue 1 year ago
When you resume training, the episodes are typically "terminated" at random to encourage collection of a diverse set of samples; otherwise PPO can get stuck in a local minimum.
https://github.com/leggedrobotics/rsl_rl/blob/master/rsl_rl/runners/on_policy_runner.py#L67
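Concretely, the code around that line does something along the lines of the sketch below (simplified; the attribute names `episode_length_buf` and `max_episode_length` are assumed from the IsaacGym-style vectorized envs rsl_rl is usually paired with, and in the runner this behaviour is gated by an `init_at_random_ep_len` flag passed to `learn()`):

```python
import torch


def randomize_initial_episode_lengths(env):
    """Simplified sketch of the randomized episode-length initialization.

    Assumes a vectorized env that exposes a per-env step counter
    `episode_length_buf` (shape: [num_envs]) and a scalar
    `max_episode_length`. Setting the counters to random values makes the
    parallel envs hit their time-outs (and reset) at staggered times, so
    the rollouts collected right after a resume cover a more diverse set
    of states instead of all envs resetting in lockstep.
    """
    env.episode_length_buf = torch.randint_like(
        env.episode_length_buf, high=int(env.max_episode_length)
    )
```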
Hi there,
We noticed a performance drop when we resumed training with OnPolicyRunner, which applies empirical normalization in our env. There is a gap between the black curve and the blue one. Additionally, we found that the model's performance does not improve without empirical normalization (the green and orange curves).
Many thanks.
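For context on the normalization part of the question: empirical normalization tracks a running mean and variance of the observations and rescales them before they reach the policy. Below is a minimal, illustrative sketch of such a normalizer (not the actual rsl_rl `EmpiricalNormalization` class, just the mechanism). Since the running statistics are learned state, they have to be saved and restored together with the model weights when training is resumed; otherwise the restored policy sees differently scaled inputs.

```python
import torch
import torch.nn as nn


class RunningObsNormalizer(nn.Module):
    """Illustrative running mean/variance observation normalizer.

    Not the rsl_rl implementation; it only shows the idea. The statistics
    are registered as buffers so they appear in `state_dict()` and can be
    saved/loaded with the rest of the checkpoint.
    """

    def __init__(self, obs_dim: int, eps: float = 1e-8):
        super().__init__()
        self.eps = eps
        self.register_buffer("mean", torch.zeros(obs_dim))
        self.register_buffer("var", torch.ones(obs_dim))
        self.register_buffer("count", torch.tensor(0.0))

    @torch.no_grad()
    def update(self, obs: torch.Tensor) -> None:
        # Parallel (Chan et al.) update of the running mean/variance with a batch.
        batch_mean = obs.mean(dim=0)
        batch_var = obs.var(dim=0, unbiased=False)
        batch_count = obs.shape[0]
        total = self.count + batch_count
        delta = batch_mean - self.mean
        self.mean = self.mean + delta * batch_count / total
        m2 = self.var * self.count + batch_var * batch_count
        m2 = m2 + delta.pow(2) * self.count * batch_count / total
        self.var = m2 / total
        self.count = total

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Normalize observations with the current running statistics.
        return (obs - self.mean) / torch.sqrt(self.var + self.eps)
```

With a setup like this, a resumable checkpoint would also carry something like `{"model_state_dict": ..., "obs_norm_state_dict": normalizer.state_dict()}` (key names here are illustrative); it may be worth checking that the runner version in use stores and reloads the normalizer statistics when resuming.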