pavlin-policar / openTSNE

Extensible, parallel implementations of t-SNE
https://opentsne.rtfd.io
BSD 3-Clause "New" or "Revised" License

Negative reported KL divergence for dof>1 #238

Closed: dkobak closed this issue 1 year ago

dkobak commented 1 year ago

I accidentally discovered a strange bug:

import numpy as np
from openTSNE import TSNE

np.random.seed(42)
X = np.random.uniform(-1, 1, size=(1000, 2))
Z = TSNE(verbose=True, dof=100).fit(X)

Prints this:

--------------------------------------------------------------------------------
TSNE(dof=100, early_exaggeration=12, verbose=True)
--------------------------------------------------------------------------------
===> Finding 90 nearest neighbors using Annoy approximate search using euclidean distance...
   --> Time elapsed: 0.57 seconds
===> Calculating affinity matrix...
   --> Time elapsed: 0.03 seconds
===> Calculating PCA-based initialization...
   --> Time elapsed: 0.00 seconds
===> Running optimization with exaggeration=12.00, lr=83.33 for 250 iterations...
Iteration   50, KL divergence 1.1914, 50 iterations in 0.4213 sec
Iteration  100, KL divergence 1.3487, 50 iterations in 0.3907 sec
Iteration  150, KL divergence 1.3523, 50 iterations in 0.3916 sec
Iteration  200, KL divergence 1.3521, 50 iterations in 0.4010 sec
Iteration  250, KL divergence 1.3525, 50 iterations in 0.3748 sec
   --> Time elapsed: 1.98 seconds
===> Running optimization with exaggeration=1.00, lr=1000.00 for 500 iterations...
Iteration   50, KL divergence -0.8881, 50 iterations in 0.4196 sec
Iteration  100, KL divergence -0.9152, 50 iterations in 0.4037 sec
Iteration  150, KL divergence -0.9160, 50 iterations in 0.4205 sec
Iteration  200, KL divergence -0.9152, 50 iterations in 0.4224 sec
Iteration  250, KL divergence -0.9163, 50 iterations in 0.4239 sec
Iteration  300, KL divergence -0.9164, 50 iterations in 0.4296 sec
Iteration  350, KL divergence -0.9162, 50 iterations in 0.4356 sec
Iteration  400, KL divergence -0.9164, 50 iterations in 0.4249 sec
Iteration  450, KL divergence -0.9168, 50 iterations in 0.4197 sec
Iteration  500, KL divergence -0.9166, 50 iterations in 0.4186 sec
   --> Time elapsed: 4.22 seconds

The final embedding looks fine, but the reported KL divergence is negative, which should be impossible: KL divergence is non-negative by construction. The same thing happens for some other values of dof, but not for the default dof=1.
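For reference, KL(P || Q) = sum_ij p_ij log(p_ij / q_ij) is non-negative whenever P and Q are proper probability distributions (Gibbs' inequality), so a negative report has to come from the estimation code rather than from the embedding itself. Below is a minimal brute-force sketch for double-checking the reported value, assuming the fitted embedding exposes its input affinities as Z.affinities.P (a scipy sparse matrix summing to 1) and using the dof-generalized kernel q_ij ∝ (1 + ||z_i - z_j||^2 / dof)^(-(dof + 1) / 2); the helper name is made up:

import numpy as np

def exact_kl(P, Z, dof=100):
    # P: sparse affinity matrix (symmetric, sums to 1); Z: (n, 2) embedding
    D2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    W = (1 + D2 / dof) ** (-(dof + 1) / 2)
    np.fill_diagonal(W, 0)  # self-similarities are excluded from the normalization
    P = P.tocoo()
    q = W[P.row, P.col] / W.sum()
    # Terms with p_ij = 0 contribute nothing, so summing over the sparse
    # support of P gives the full KL(P || Q), which must be >= 0
    return np.sum(P.data * (np.log(P.data) - np.log(q)))

kl = exact_kl(Z.affinities.P, np.asarray(Z), dof=100)  # should never be negative

This is O(n^2) in time and memory, so it is only a sanity check for small n like the 1000 points above.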

pavlin-policar commented 1 year ago

That's strange. Briefly going over the code, I don't see any obvious reason why this would happen.

dkobak commented 1 year ago

Just to clarify: this does not always happen with dof=100. It does not happen with the MNIST data, for example.

dkobak commented 1 year ago

Very weird issue... So far I have no idea what is causing it. But I would like to bump it -- I just stumbled upon it again, and it's really annoying. There was yet another Twitter discussion about embedding 2D manifolds, and I keep saying that TSNE(dof=100) works better than TSNE() for this. It does: I checked the kNN recall, and it is higher with dof=100. But I cannot check the KL loss, because openTSNE reports a negative value for dof=100.
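For context, kNN recall here means the fraction of each point's k nearest neighbors in the input space that are also among its k nearest neighbors in the embedding. A rough sketch of such a check (the helper name and the choice of k are arbitrary, not part of openTSNE):

import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_recall(X, Z, k=10):
    # k nearest neighbors in the input space and in the embedding (self excluded)
    idx_X = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(return_distance=False)
    idx_Z = NearestNeighbors(n_neighbors=k).fit(Z).kneighbors(return_distance=False)
    # Mean overlap between the two k-neighborhoods, as a fraction of k
    return np.mean([len(set(a) & set(b)) for a, b in zip(idx_X, idx_Z)]) / k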

dkobak commented 1 year ago

Hey, I found the bug!! Fixed it, and confirmed that it now works fine.