BIGheadLL opened this issue 1 year ago
When you resume training, the episodes are typically "terminated" at random to encourage collection of a diverse set of samples; otherwise PPO can get stuck in a local minimum.
https://github.com/leggedrobotics/rsl_rl/blob/master/rsl_rl/runners/on_policy_runner.py#L67
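Concretely, the code around that line does something along the lines of the sketch below (simplified; the attribute names `episode_length_buf` and `max_episode_length` are assumed from the IsaacGym-style vectorized envs rsl_rl is usually paired with, and in the runner this behaviour is gated by an `init_at_random_ep_len` flag passed to `learn()`):

```python
import torch


def randomize_initial_episode_lengths(env):
    """Simplified sketch of the randomized episode-length initialization.

    Assumes a vectorized env that exposes a per-env step counter
    `episode_length_buf` (shape: [num_envs]) and a scalar
    `max_episode_length`. Setting the counters to random values makes the
    parallel envs hit their time-outs (and reset) at staggered times, so
    the rollouts collected right after a resume cover a more diverse set
    of states instead of all envs resetting in lockstep.
    """
    env.episode_length_buf = torch.randint_like(
        env.episode_length_buf, high=int(env.max_episode_length)
    )
```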
Hi there,
We noticed a performance drop when we resumed training with OnPolicyRunner, which applies empirical normalization in our env. There is a gap between the black curve and the blue one. Additionally, we found that the model's performance does not improve without empirical normalization (the green and orange curves).
Many thanks.
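For context on the normalization part of the question: empirical normalization tracks a running mean and variance of the observations and rescales them before they reach the policy. Below is a minimal, illustrative sketch of such a normalizer (not the actual rsl_rl `EmpiricalNormalization` class, just the mechanism). Since the running statistics are learned state, they have to be saved and restored together with the model weights when training is resumed; otherwise the restored policy sees differently scaled inputs.

```python
import torch
import torch.nn as nn


class RunningObsNormalizer(nn.Module):
    """Illustrative running mean/variance observation normalizer.

    Not the rsl_rl implementation; it only shows the idea. The statistics
    are registered as buffers so they appear in `state_dict()` and can be
    saved/loaded with the rest of the checkpoint.
    """

    def __init__(self, obs_dim: int, eps: float = 1e-8):
        super().__init__()
        self.eps = eps
        self.register_buffer("mean", torch.zeros(obs_dim))
        self.register_buffer("var", torch.ones(obs_dim))
        self.register_buffer("count", torch.tensor(0.0))

    @torch.no_grad()
    def update(self, obs: torch.Tensor) -> None:
        # Parallel (Chan et al.) update of the running mean/variance with a batch.
        batch_mean = obs.mean(dim=0)
        batch_var = obs.var(dim=0, unbiased=False)
        batch_count = obs.shape[0]
        total = self.count + batch_count
        delta = batch_mean - self.mean
        self.mean = self.mean + delta * batch_count / total
        m2 = self.var * self.count + batch_var * batch_count
        m2 = m2 + delta.pow(2) * self.count * batch_count / total
        self.var = m2 / total
        self.count = total

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Normalize observations with the current running statistics.
        return (obs - self.mean) / torch.sqrt(self.var + self.eps)
```

With a setup like this, a resumable checkpoint would also carry something like `{"model_state_dict": ..., "obs_norm_state_dict": normalizer.state_dict()}` (key names here are illustrative); it may be worth checking that the runner version in use stores and reloads the normalizer statistics when resuming.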