First of all, thanks for all your contributions! :)
I looked at the original algorithm from the paper "Evolution Strategies as a Scalable Alternative to Reinforcement Learning" for the OpenES implementation. It updates the policy parameters theta after every iteration, i.e. after each batch of rollouts.
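For reference, here is a minimal NumPy sketch of that update as I read it from the paper (the names `es_update` and `fitness_fn` are my own, not from es.py; antithetic sampling and fitness shaping are omitted for brevity):

```python
import numpy as np

def es_update(theta, fitness_fn, learning_rate=0.01, sigma=0.1, npop=50, rng=None):
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal((npop, theta.size))          # Gaussian perturbations
    rewards = np.array([fitness_fn(theta + sigma * e) for e in eps])
    grad = (rewards @ eps) / (npop * sigma)                # gradient estimate
    return theta + learning_rate * grad                    # theta updated every iteration
```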
But the implementation in the es.py file, under the OpenES class, has this line commented out in the ask function:
#self.mu += self.learning_rate * change_mu
https://github.com/hardmaru/estool/blob/master/es.py#L328C1-L328C47
Even the Adam optimizer that is initialized doesn't change the self.mu array.
Just wanted to know if this is a mistake, or am I missing something here?