pavlin-policar / openTSNE

Extensible, parallel implementations of t-SNE
https://opentsne.rtfd.io
BSD 3-Clause "New" or "Revised" License

Default learning rate in .optimize() #190

Closed: ritagonmar closed this issue 3 years ago

ritagonmar commented 3 years ago

Hi Pavlin,

So I have noticed something odd about the default learning rate when using .optimize(). The documentation of .optimize() says the following:

learning_rate (Union[str, float]) – The learning rate for t-SNE optimization. When learning_rate="auto" the appropriate learning rate is selected according to max(200, N / 12), as determined in Belkina et al. “Automated optimized parameters for t-distributed stochastic neighbor embedding improve visualization and analysis of large datasets”, 2019.
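
Just to spell that rule out (a quick sanity check of my own; the sample size below is made up for illustration):

n = 5000                    # illustrative size, not my actual dataset
auto_lr = max(200, n / 12)  # the rule quoted above
print(auto_lr)              # 416.67; n/12 takes over once n > 2400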

In my case, the default learning rate should be n/12, since that is larger than 200. I ran the optimization without specifying the learning rate (assuming it would automatically use n/12) and got a strange embedding. I then reran it with learning_rate=n/12 set explicitly and got a different embedding, which seemed to make more sense.

So I suspect there is a problem with the default learning rate and that it is not actually using max(200, n/12).

Steps to reproduce the behavior

I noticed this with the following piece of code:

# data, A (the affinity object), and mycallback are defined earlier in my script
import numpy as np
from openTSNE import TSNEEmbedding, initialization

I = initialization.pca(data, random_state=42)

E = TSNEEmbedding(I, A, n_jobs=-1, random_state=42, verbose=True)

# early exaggeration
E = E.optimize(n_iter=125, exaggeration=12, momentum=0.5, n_jobs=-1, verbose=True, callbacks=mycallback, callbacks_every_iters=50)

# exaggeration annealing from 12 down to 1 over 125 single-iteration steps
exs = np.linspace(12, 1, 125)
for i in range(125):
    if (i + 1) % 50 == 0:
        E = E.optimize(n_iter=1, exaggeration=exs[i], momentum=0.8, n_jobs=-1, verbose=True, callbacks=mycallback, callbacks_every_iters=1)
    else:
        E = E.optimize(n_iter=1, exaggeration=exs[i], momentum=0.8, n_jobs=-1, verbose=True)

# final optimization without exaggeration
E = E.optimize(n_iter=2000, exaggeration=1, momentum=0.8, n_jobs=-1, verbose=True, callbacks=mycallback, callbacks_every_iters=50)

In the version that produced the correct embedding, I simply specified the learning rate explicitly like this:

# same pipeline as above, but with learning_rate=n/12 passed to every optimize() call
I = initialization.pca(data, random_state=42)

E = TSNEEmbedding(I, A, n_jobs=-1, random_state=42, verbose=True)

n = data.shape[0]

# early exaggeration
E = E.optimize(n_iter=125, exaggeration=12, momentum=0.5, n_jobs=-1, learning_rate=n/12, verbose=True, callbacks=mycallback, callbacks_every_iters=50)

# exaggeration annealing
exs = np.linspace(12, 1, 125)
for i in range(125):
    if (i + 1) % 50 == 0:
        E = E.optimize(n_iter=1, exaggeration=exs[i], momentum=0.8, n_jobs=-1, learning_rate=n/12, verbose=True, callbacks=mycallback, callbacks_every_iters=1)
    else:
        E = E.optimize(n_iter=1, exaggeration=exs[i], momentum=0.8, n_jobs=-1, learning_rate=n/12, verbose=True)

# final optimization without exaggeration
E = E.optimize(n_iter=2000, exaggeration=1, momentum=0.8, n_jobs=-1, learning_rate=n/12, verbose=True, callbacks=mycallback, callbacks_every_iters=50)
pavlin-policar commented 3 years ago

Hmm, I can't reproduce this. I generated some random data and used the standard perplexity-based affinity kernel:

import numpy as np
from openTSNE import affinity

np.random.seed(0)
data = np.random.normal(0, 10, size=(5000, 10))
A = affinity.PerplexityBasedNN(data)

and I get identical results for both runs. The logs also show the optimizer is called with lr=416.67 in both cases. Could you please send me the data you're using so I can debug this further, or, if the data isn't public, find another example where this doesn't work?
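
For reference, here is a condensed sketch of that kind of comparison (the iteration counts are arbitrary, the callbacks and annealing loop from your script are left out, and the "identical results" check is just np.allclose on the two embeddings):

import numpy as np
from openTSNE import TSNEEmbedding, initialization, affinity

np.random.seed(0)
data = np.random.normal(0, 10, size=(5000, 10))
A = affinity.PerplexityBasedNN(data)
I = initialization.pca(data, random_state=42)
n = data.shape[0]

def run(lr):
    # lr is either "auto" (the default) or an explicit float such as n / 12;
    # with verbose=True the learning rate actually used is printed in the logs
    E = TSNEEmbedding(I, A, n_jobs=-1, random_state=42, verbose=True)
    E = E.optimize(n_iter=125, exaggeration=12, momentum=0.5, learning_rate=lr)
    return E.optimize(n_iter=500, exaggeration=1, momentum=0.8, learning_rate=lr)

emb_auto = run("auto")      # "auto" should resolve to max(200, n / 12) = 416.67 here
emb_explicit = run(n / 12)  # the same value passed explicitly
print(np.allclose(emb_auto, emb_explicit))  # expected: True if "auto" resolves to n/12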

ritagonmar commented 3 years ago

I tried again and I can't reproduce it either. I must have done something weird back then that I cannot recall. Sorry about that!