Summary
A single small error prevents ADAM from batching its gradient_num_diff evaluations according to max_evals_grouped. When ADAM calls gradient_num_diff, it passes only "fun, self._eps" and omits max_evals_grouped. Inside gradient_num_diff the parameter therefore defaults to None, which is then treated as 1. As a result, ADAM always runs with max_evals_grouped=1, regardless of the value the user has set.
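The effect can be illustrated with a minimal, self-contained sketch. The `gradient_num_diff` below is a simplified stand-in written for this example, not the library's implementation; its signature, the batched `objective` helper, and the call counter are assumptions made only to show how a missing `max_evals_grouped` argument falls back to one evaluation per call.

```python
import numpy as np


def gradient_num_diff(x_center, f, epsilon, max_evals_grouped=None):
    """Simplified stand-in (hypothetical): forward-difference gradient.

    `f` takes a list of points and returns a list of values, so that
    several points can be evaluated in one grouped call.
    """
    # The fallback described above: a missing argument becomes 1.
    if max_evals_grouped is None:
        max_evals_grouped = 1

    x_center = np.asarray(x_center, dtype=float)
    f_center = f([x_center])[0]

    # One shifted point per coordinate.
    points = [x_center + epsilon * e for e in np.eye(len(x_center))]

    values = []
    for start in range(0, len(points), max_evals_grouped):
        batch = points[start:start + max_evals_grouped]
        values.extend(f(batch))

    return (np.array(values) - f_center) / epsilon


calls = []  # records the batch size of every call to the objective


def objective(batch):
    calls.append(len(batch))
    return [float(np.sum(p ** 2)) for p in batch]


x = np.array([1.0, 2.0, 3.0, 4.0])
eps = 1e-6

# Called without max_evals_grouped, as the description says ADAM does.
calls.clear()
gradient_num_diff(x, objective, eps)
calls_without = list(calls)

# Called with the user's setting forwarded.
calls.clear()
grad = gradient_num_diff(x, objective, eps, max_evals_grouped=4)
calls_with = list(calls)
```

In this sketch the first call evaluates every shifted point separately, while the second groups all four shifted points into a single evaluation. Forwarding the configured value at the call site is what allows the batching to take effect.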
Details and comments
This issue was discussed in #178.