pavlin-policar / openTSNE

Extensible, parallel implementations of t-SNE
https://opentsne.rtfd.io
BSD 3-Clause "New" or "Revised" License

random state in .optimize() #191

Closed · ritagonmar closed this issue 3 years ago

ritagonmar commented 3 years ago

Hi Pavlin,

I ran into another odd thing when using .optimize(). In the documentation, random_state is listed as a parameter of the function, but when I pass a random state to it, I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<timed exec> in <module>

/usr/local/lib/python3.8/dist-packages/openTSNE/tsne.py in optimize(self, n_iter, inplace, propagate_exception, **gradient_descent_params)
    685             # Run gradient descent with the embedding optimizer so gains are
    686             # properly updated and kept
--> 687             error, embedding = embedding.optimizer(
    688                 embedding=embedding, P=self.affinities.P, **optim_params
    689             )

TypeError: __call__() got an unexpected keyword argument 'random_state'
Steps to reproduce the behavior

The code I used when encountering this problem (the same as in issue #190, but with random_state added to the .optimize() calls):

import numpy as np
from openTSNE import TSNEEmbedding, initialization

# data, A (the affinities object), and mycallback are defined as in issue #190
I = initialization.pca(data, random_state=42)
E = TSNEEmbedding(I, A, n_jobs=-1, random_state=42, verbose=True)

n = data.shape[0]

# early exaggeration
E = E.optimize(n_iter=125, exaggeration=12, momentum=0.5, n_jobs=-1,
               learning_rate=n / 12, random_state=42, verbose=True,
               callbacks=mycallback, callbacks_every_iters=50)

# exaggeration annealing
exs = np.linspace(12, 1, 125)
for i in range(125):
    if (i + 1) % 50 == 0:
        E = E.optimize(n_iter=1, exaggeration=exs[i], momentum=0.8, n_jobs=-1,
                       learning_rate=n / 12, random_state=42, verbose=True,
                       callbacks=mycallback, callbacks_every_iters=1)
    else:
        E = E.optimize(n_iter=1, exaggeration=exs[i], momentum=0.8, n_jobs=-1,
                       learning_rate=n / 12, random_state=42, verbose=True)

# final optimization without exaggeration
E = E.optimize(n_iter=2000, exaggeration=1, momentum=0.8, n_jobs=-1,
               learning_rate=n / 12, random_state=42, verbose=True,
               callbacks=mycallback, callbacks_every_iters=50)

Do you know whether .optimize() should be able to take a random state, or am I somehow misunderstanding the documentation?

ohickl commented 3 years ago

I am wondering the same thing. At first I assumed it might inherit the random state defined in TSNEEmbedding(), but re-running with the same input does not produce identical output.
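For reference, here is a minimal, self-contained sketch of the check I have in mind; the toy data is a stand-in for my real input, not taken from the thread:

import numpy as np
from openTSNE import TSNEEmbedding, affinity, initialization

# toy data as a stand-in for a real dataset
data = np.random.RandomState(0).normal(size=(500, 10))

# random_state is set everywhere the API accepts it
A = affinity.PerplexityBasedNN(data, perplexity=30, random_state=42)
I = initialization.pca(data, random_state=42)

# two runs from identical inputs
E1 = TSNEEmbedding(I, A, random_state=42).optimize(n_iter=250, exaggeration=12)
E2 = TSNEEmbedding(I, A, random_state=42).optimize(n_iter=250, exaggeration=12)

# if the optimization itself is deterministic, the two embeddings should match
print(np.allclose(E1, E2))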

pavlin-policar commented 3 years ago

Thanks for reporting this. The documentation was incorrect here -- the optimization procedure is deterministic and doesn't require any random_state for reproducible results. This was a mistake on my part.
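Concretely, your snippet should work once random_state is dropped from the .optimize() calls; it only belongs where randomness actually enters, i.e. the initialization and (depending on the method) the affinities. A trimmed sketch, reusing data, A, and mycallback from your example above:

from openTSNE import TSNEEmbedding, initialization

# data, A, and mycallback as defined in the snippet above
I = initialization.pca(data, random_state=42)
E = TSNEEmbedding(I, A, n_jobs=-1, verbose=True)

n = data.shape[0]

# early exaggeration -- no random_state needed here
E = E.optimize(n_iter=125, exaggeration=12, momentum=0.5, n_jobs=-1,
               learning_rate=n / 12, verbose=True,
               callbacks=mycallback, callbacks_every_iters=50)

# final optimization without exaggeration
E = E.optimize(n_iter=2000, exaggeration=1, momentum=0.8, n_jobs=-1,
               learning_rate=n / 12, verbose=True,
               callbacks=mycallback, callbacks_every_iters=50)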

ohickl commented 3 years ago

You are right, the cause was something else. I wrongly assumed openTSNE was the culprit because I thought it needed a randomization seed. Sorry for the false alarm!