openai / evolution-strategies-starter

Code for the paper "Evolution Strategies as a Scalable Alternative to Reinforcement Learning"
https://arxiv.org/abs/1703.03864

Doesn't the gradient need to be rescaled with σ? #24

Open pzdkn opened 3 years ago

pzdkn commented 3 years ago

According to Algorithm 2 on page 3 of the paper, the gradient in line 11 is rescaled by the standard deviation $\sigma$. However, I can't find this scaling anywhere in the code: https://github.com/openai/evolution-strategies-starter/blob/master/es_distributed/es.py#L247

What is the rationale behind this? Is the $1/\sigma$ factor just folded into the learning rate?
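
To make the difference concrete, here is a minimal sketch of the two estimators (a toy version, not the repo's batched, rank-normalized implementation; `F`, `n`, and `sigma` are placeholders):

```python
import numpy as np

def es_gradient(theta, F, sigma=0.02, n=100, rescale=True):
    """ES gradient estimate in the style of Algorithm 2 (toy sketch).

    rescale=True:  (1 / (n * sigma)) * sum_i F(theta + sigma * eps_i) * eps_i,
                   an unbiased estimate of grad_theta E[F(theta + sigma * eps)].
    rescale=False: the same sum without the 1/sigma factor, which is what
                   es.py appears to compute (i.e. sigma times the gradient).
    """
    eps = np.random.randn(n, theta.size)                     # eps_i ~ N(0, I)
    returns = np.array([F(theta + sigma * e) for e in eps])  # F_i
    g = returns @ eps / n                                    # (1/n) sum_i F_i eps_i
    return g / sigma if rescale else g
```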

qwfy commented 3 years ago

Seems like it; Brax does the same.

pzdkn commented 3 years ago

I see. I guess it doesn't matter much in practice, since it's just a constant scaling. Still, without the $1/\sigma$ factor the estimator is no longer an unbiased estimate of the gradient.

zxymark221 commented 3 years ago

🤝 I also noticed this today. Without dividing by $\sigma$, the gradient estimate is around 100x smaller in magnitude (since $\sigma$ is usually on the order of 0.01). That explains why the "stepsize: 0.01" in the example configuration is relatively large. I am curious about the rationale as well.
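
If the missing $1/\sigma$ is indeed absorbed into the stepsize, the two parameterizations should produce identical updates once the learning rate is scaled by $\sigma$. A quick sanity check (toy objective and hyperparameters, not taken from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, alpha, n, d = 0.02, 0.01, 10_000, 5
theta = np.zeros(d)
F = lambda x: -np.sum(x ** 2)   # toy objective, stands in for an episode return

eps = rng.standard_normal((n, d))
returns = np.array([F(theta + sigma * e) for e in eps])

g_unscaled = returns @ eps / n      # what es.py computes (no 1/sigma)
g_paper = g_unscaled / sigma        # Algorithm 2, line 11

# Identical parameter updates once the learning rate absorbs the 1/sigma:
assert np.allclose(alpha * g_unscaled, (alpha * sigma) * g_paper)
```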

pzdkn commented 3 years ago

@zxymark221, I guess in the end it just gets absorbed into the learning rate, since $\sigma$ is constant. However, the result is then no longer an unbiased estimate of the gradient itself.
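
To spell out the bias point: write the smoothed objective as $J(\theta) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\big[F(\theta + \sigma\epsilon)\big]$. The score-function identity gives

$$\nabla_\theta J(\theta) = \frac{1}{\sigma}\,\mathbb{E}_{\epsilon}\big[F(\theta + \sigma\epsilon)\,\epsilon\big],$$

so the unscaled average $\frac{1}{n}\sum_i F_i \epsilon_i$ has expectation $\sigma\,\nabla_\theta J(\theta)$: it is unbiased for $\sigma$ times the gradient, and biased (by the constant factor $\sigma$) for the gradient itself. Since $\sigma$ is fixed, scaling the learning rate by $1/\sigma$ recovers exactly the same updates.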