dkobak closed this 1 year ago.
Hmm, it does not look like it's due to the gains: if I use 2 iterations instead of 100 in the snippet above, E1 and E2 are still not identical, yet `E1.optimizer.gains` and `E2.optimizer.gains` are the same.
So I figured out what is going on here. The `update` vectors are not saved between `optimize()` calls, so the momentum term has no effect if one uses `optimize(n_iter=1)`. I guess it's debatable which behaviour is better, but I would say that if we keep the gains between `optimize()` calls, then we should also keep the `update` vectors.
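A minimal pure-Python sketch of the mechanism, assuming a momentum step with per-coordinate adaptive gains of the kind commonly used for t-SNE. `MomentumOptimizer`, `keep_update`, `quadratic_grad` and `run` are hypothetical names for illustration only, not openTSNE's API, and the toy quadratic objective stands in for the real t-SNE gradient. With `keep_update=False` the update vector is recreated as zeros on every call (the behaviour described above), so 50 calls with `n_iter=1` lose the momentum term; keeping it makes the split and unsplit runs identical.

```python
class MomentumOptimizer:
    """Hypothetical sketch: gradient descent with momentum and adaptive gains.
    Not openTSNE's actual implementation."""

    def __init__(self, grad_fn, learning_rate=0.1, momentum=0.8,
                 min_gain=0.01, keep_update=True):
        self.grad_fn = grad_fn
        self.learning_rate = learning_rate
        self.momentum = momentum
        self.min_gain = min_gain
        # keep_update=False mimics the reported behaviour: the update
        # (momentum) vector is reset to zeros at the start of every call.
        self.keep_update = keep_update
        self.gains = None   # persisted across optimize() calls in both modes
        self.update = None  # persisted only when keep_update=True

    def optimize(self, x, n_iter):
        x = list(x)
        if self.gains is None:
            self.gains = [1.0] * len(x)
        if self.update is None or not self.keep_update:
            self.update = [0.0] * len(x)
        for _ in range(n_iter):
            grad = self.grad_fn(x)
            for i in range(len(x)):
                # If the gradient sign differs from the previous update's sign,
                # we keep descending the same way, so grow the gain; else shrink it.
                if (grad[i] > 0) != (self.update[i] > 0):
                    self.gains[i] += 0.2
                else:
                    self.gains[i] *= 0.8
                self.gains[i] = max(self.gains[i], self.min_gain)
                # Momentum step: carry over a fraction of the previous update.
                self.update[i] = (self.momentum * self.update[i]
                                  - self.learning_rate * self.gains[i] * grad[i])
                x[i] += self.update[i]
        return x


def quadratic_grad(x):
    # Gradient of the toy objective f(x) = 0.5 * (x0**2 + 4 * x1**2).
    return [x[0], 4.0 * x[1]]


def run(keep_update, n_total=50):
    """Compare one call with n_iter=n_total against n_total calls with n_iter=1."""
    x0 = [3.0, -2.0]
    once = MomentumOptimizer(quadratic_grad, keep_update=keep_update)
    x_once = once.optimize(x0, n_iter=n_total)

    many = MomentumOptimizer(quadratic_grad, keep_update=keep_update)
    x_many = x0
    for _ in range(n_total):
        x_many = many.optimize(x_many, n_iter=1)
    return x_once, x_many


x_once, x_many = run(keep_update=False)
print(x_once == x_many)  # False: momentum is silently dropped between calls
x_once, x_many = run(keep_update=True)
print(x_once == x_many)  # True: same sequence of floating-point operations
```

Because the persisted-state version performs exactly the same arithmetic in the same order either way, the results match bit for bit rather than only approximately, which is what a regression test for this issue can check.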
I prepared a quick PR that implements this, but unfortunately it makes some tests fail, and I have not been able to fix them yet.
Thanks for tracking this down. This is definitely a bug: calling `optimize` once with `n_iter=100` should give the same result as calling it 100 times with `n_iter=1`.
Running `optimize(n_iter=1)` 100 times should give the same result as `optimize(n_iter=100)`, but it doesn't. Here is a reproducible example:

Maybe it has something to do with how the gains are saved in between the `optimize()` calls?