whikwon closed 5 years ago
I believe this code is directly adapted from the reference implementation. cc @eugenevinitsky
This is a mistake; thank you for catching it. The reference code used SGD, as you pointed out, but Evolution Strategies used Adam, and that wound up getting copied over. I will push a fix shortly.
@whikwon PR for this will be up shortly
System information
Describe the problem
In the ars.py code, the model uses the Adam optimizer for training. When I looked into the paper, SGD was used for training.
Is there any reason or experimental result for using Adam rather than SGD?
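To make the difference concrete, here is a minimal sketch of the two update rules, assuming a flat NumPy parameter vector and an already computed update direction `grad`. The class names `SGD` and `Adam` and their signatures are illustrative assumptions, not copied from ars.py or the reference code.

```python
import numpy as np


class SGD:
    """Plain gradient step (hypothetical helper, not the ars.py class)."""

    def __init__(self, stepsize):
        self.stepsize = stepsize

    def step(self, theta, grad):
        # Ascent step: move the parameters along the estimated direction.
        return theta + self.stepsize * grad


class Adam:
    """Adam step with bias correction (hypothetical helper)."""

    def __init__(self, dim, stepsize, beta1=0.9, beta2=0.999, eps=1e-8):
        self.stepsize = stepsize
        self.beta1 = beta1
        self.beta2 = beta2
        self.eps = eps
        self.m = np.zeros(dim)  # running mean of the gradient
        self.v = np.zeros(dim)  # running mean of the squared gradient
        self.t = 0              # number of steps taken so far

    def step(self, theta, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        # Each coordinate is rescaled by its own gradient magnitude.
        return theta + self.stepsize * m_hat / (np.sqrt(v_hat) + self.eps)


theta = np.zeros(3)
grad = np.array([1.0, -2.0, 0.5])
sgd_theta = SGD(stepsize=0.1).step(theta, grad)
adam_theta = Adam(dim=3, stepsize=0.1).step(theta, grad)
```

On the first step, the SGD update is proportional to `grad`, while the Adam update has roughly the same magnitude in every coordinate, so the two optimizers can behave quite differently with the same step size.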
Source code / logs