Instead of training the whole model, let's optimize only a small, randomly selected part of it (e.g. 5% of the connections) at each optimization step:
class OpenES:
    ...
    def ask(self):
        ...
        self.epsilon *= np.random.choice([0, 1], size=self.epsilon.shape, p=[0.95, 0.05])  # add this line: zero out ~95% of the perturbation entries, keeping ~5%
        self.solutions = self.mu.reshape(1, self.num_params) + self.epsilon * self.sigma
        return self.solutions
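For a self-contained picture of the same idea outside the estool class, here is a minimal sketch of a sparse-perturbation ES loop on a toy objective (the quadratic fitness function, the sparsity parameter, and all hyperparameter values are illustrative assumptions, not part of the original code):

import numpy as np

# Minimal sketch: OpenES-style update where each candidate perturbs only ~5% of the parameters.
# The toy objective and hyperparameters below are assumptions for illustration.

def fitness(params):
    # toy objective: maximize -||params - target||^2
    target = np.arange(params.shape[-1], dtype=np.float64)
    return -np.sum((params - target) ** 2, axis=-1)

num_params, popsize = 20, 32
sigma, learning_rate, sparsity = 0.1, 0.03, 0.05

mu = np.zeros(num_params)
rng = np.random.default_rng(0)

for step in range(2000):
    # standard Gaussian perturbations ...
    epsilon = rng.standard_normal((popsize, num_params))
    # ... masked so each candidate touches only a random ~5% of the parameters
    epsilon *= rng.random((popsize, num_params)) < sparsity
    solutions = mu + sigma * epsilon

    rewards = fitness(solutions)
    # usual ES gradient estimate from normalized rewards
    normalized = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    mu += learning_rate / (popsize * sigma) * (epsilon.T @ normalized)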
With this simple modification, I got 99%/98.5% accuracy on the training/test sets (see the training log in my fork).
I don't have a good explanation for this phenomenon, but it looks like this method strengthens the exploitation component of the algorithm.
Of course, this can't be used as a general approach; it's just an interesting observation that is probably specific to the task, model, hyperparameters, etc.