Closed geotyper closed 5 years ago
```python
rT = (reward[:self.batch_size] - reward[self.batch_size:])
change_mu = np.dot(rT, epsilon)
self.optimizer.stepsize = self.learning_rate
update_ratio = self.optimizer.update(-change_mu)  # adam, rmsprop, momentum, etc.
# self.mu += (change_mu * self.learning_rate)  # plain SGD alternative
```
so won't `change_mu` be half the length needed for `pop_size`?
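For what it's worth, the shapes do work out under mirrored (antithetic) sampling: `epsilon` holds only one row per mirrored pair, so the dot product already yields a full-length parameter update. A minimal sketch (the variable names and shapes are assumptions inferred from the snippet above, not the actual library code):

```python
import numpy as np

# Assumed setup: each epsilon row is evaluated at mu + eps and mu - eps,
# so the reward vector has 2 * batch_size entries (the full population),
# while epsilon has only batch_size rows.
batch_size, num_params = 4, 3
pop_size = 2 * batch_size

epsilon = np.random.randn(batch_size, num_params)  # one row per mirrored pair
reward = np.random.randn(pop_size)                 # first half: +eps, second half: -eps

rT = reward[:batch_size] - reward[batch_size:]     # shape (batch_size,)
change_mu = np.dot(rT, epsilon)                    # shape (num_params,), not pop_size

assert change_mu.shape == (num_params,)
```

So `change_mu` is sized by `num_params`, and the apparent "half" is exactly the pairing of `+eps`/`-eps` rewards, not a truncated update.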
I found my error