pavlin-policar / openTSNE

Extensible, parallel implementations of t-SNE
https://opentsne.rtfd.io
BSD 3-Clause "New" or "Revised" License

Bug: running optimize() multiple times produces a different result than running it once #228

Closed: dkobak closed this issue 1 year ago

dkobak commented 1 year ago

Running optimize(n_iter=1) 100 times should give the same result as a single optimize(n_iter=100) call, but it doesn't. Here is a reproducible example:

import numpy as np
import matplotlib.pyplot as plt

from openTSNE import TSNEEmbedding
from openTSNE.affinity import PerplexityBasedNN
from openTSNE.initialization import random as random_init

n = 100
np.random.seed(42)
X = np.random.randn(n, 2)

A = PerplexityBasedNN(X)
I = random_init(n, random_state=42)

# Two embeddings with identical initialization, affinities, and seed
E1 = TSNEEmbedding(I, A, random_state=42)
E2 = TSNEEmbedding(I, A, random_state=42)

# One call with 100 iterations...
E1.optimize(n_iter=100, inplace=True)

# ...versus 100 calls with 1 iteration each
for i in range(100):
    E2.optimize(n_iter=1, inplace=True)

plt.figure(figsize=(4, 4), layout='constrained')
plt.scatter(E1[:, 0], E1[:, 1])
plt.scatter(E2[:, 0], E2[:, 1])
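
For completeness, the mismatch can also be checked numerically rather than visually (an added check, not part of the original report; the indexing in the plotting code above already shows the embeddings behave as arrays):

# Added check: compare the two sets of coordinates directly
np.allclose(np.asarray(E1), np.asarray(E2))  # False: the embeddings differ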

[Figure: scatter plot overlaying E1 (one optimize call) and E2 (100 single-iteration calls); the two embeddings visibly differ]

Maybe it has something to do with how the gains are saved between the optimize() calls?

dkobak commented 1 year ago

Hmm, it does not look like it's due to the gains: if I use 2 iterations instead of 100 in the snippet above, E1 and E2 are still not identical, yet E1.optimizer.gains and E2.optimizer.gains are the same.
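
For reference, a minimal check of that observation (assuming the snippet above was re-run with 2 iterations instead of 100):

# Gains agree between the two runs even though the embeddings do not
print(np.allclose(E1.optimizer.gains, E2.optimizer.gains))  # True
print(np.allclose(np.asarray(E1), np.asarray(E2)))          # False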

dkobak commented 1 year ago

So I figured out what is going on here. The update vectors are not saved between optimize() calls, so the momentum term has no effect if one uses optimize(n_iter=1). I guess it's debatable which behavior is better, but I would say that if we keep the gains between optimize() calls, then we should also keep the update vectors. The sketch below illustrates the mechanism.
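
To see why resetting the update vector matters, here is a minimal sketch of gradient descent with momentum; the function name, signature, and default values are hypothetical illustrations of the usual scheme, not openTSNE's actual code:

import numpy as np

def momentum_step(embedding, gradient, update, learning_rate=200.0, momentum=0.8):
    # Standard momentum update: the step blends the previous update
    # vector with the current (scaled) gradient direction.
    update = momentum * update - learning_rate * gradient
    return embedding + update, update

# If `update` is re-initialized to zeros at the start of every optimize()
# call, then a call with n_iter=1 only ever sees update == 0, so each step
# degenerates to plain gradient descent. Persisting `update` across calls
# (as is already done for the gains) makes 100 single-iteration calls
# equivalent to one 100-iteration call.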

I prepared a quick PR that implements this, but unfortunately it makes some tests fail, and I have not been able to fix that yet.

pavlin-policar commented 1 year ago

Thanks for tracking this down. This is definitely a bug: calling optimize once with n_iter=100 should give the same result as calling optimize 100 times with n_iter=1.