A student encountered an unusual situation with spectral initialization: due to the specifics of the dataset, some points were EXACTLY overlapping in the initialization. This made these points "stuck" to each other forever: they felt the same repulsive force from all other points, so they could not separate from each other, even though there should actually be repulsion between them. Adding a tiny amount of random noise to the initialization solved this problem, and the points spread over the embedding as expected.
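To illustrate why exactly overlapping points stay stuck, here is a minimal sketch using the exact (non-approximated) t-SNE repulsive term with the Cauchy kernel w_ij = 1 / (1 + ||y_i - y_j||^2). openTSNE itself approximates this term (Barnes-Hut / FFT), and the function name `repulsive_forces` is hypothetical, so treat this only as an illustration of the mechanism.

```python
import numpy as np

def repulsive_forces(Y):
    """Exact t-SNE repulsive force: F_i = 4/Z * sum_j w_ij^2 (y_i - y_j)."""
    diff = Y[:, None, :] - Y[None, :, :]          # pairwise y_i - y_j, shape (n, n, d)
    w = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))  # Cauchy kernel, shape (n, n)
    np.fill_diagonal(w, 0.0)                      # no self-interaction
    Z = w.sum()                                   # normalization constant
    return 4.0 / Z * np.sum((w ** 2)[:, :, None] * diff, axis=1)

rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 2))
Y[1] = Y[0]  # points 0 and 1 overlap exactly

F = repulsive_forces(Y)
print(np.allclose(F[0], F[1]))  # True
```

The pairwise term between points 0 and 1 vanishes because y_0 - y_1 = 0, and every other point pushes on both of them identically. So the repulsive part of each update moves them together; if their attractive forces also coincide, nothing ever separates them.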
This reminded me of another issue we discussed a while ago https://github.com/pavlin-policar/openTSNE/issues/180 (still open) where points exactly overlapping in the initialization were causing some problems.
My suggestion is to always add a tiny amount of noise to all initializations that we compute. Specifically, in the `rescale()` function here https://github.com/pavlin-policar/openTSNE/blob/master/openTSNE/initialization.py#L9, I would replace

with
This would affect PCA and spectral init, but would not affect a user-provided init.
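A minimal sketch of the idea, assuming a rescaling step that sets the standard deviation of the first embedding column to a small `target_std` (as openTSNE's `rescale()` does for PCA and spectral init). The function `rescale_with_jitter` and its `jitter_scale` and `random_state` parameters are hypothetical, chosen only for illustration; this is not openTSNE's actual code.

```python
import numpy as np

def rescale_with_jitter(x, target_std=1e-4, jitter_scale=1e-2, random_state=None):
    """Hypothetical variant of rescale(): scale, then add tiny Gaussian noise.

    The noise std is jitter_scale * target_std (1% of the embedding spread
    by default), so the global layout is essentially unchanged, but no two
    rows remain exactly identical.
    """
    rng = np.random.default_rng(random_state)
    x = np.array(x, dtype=float, copy=True)
    # Rescale so that the first column has standard deviation target_std
    x = x / np.std(x[:, 0]) * target_std
    # Break exact ties between overlapping points
    x += rng.normal(scale=jitter_scale * target_std, size=x.shape)
    return x

rng = np.random.default_rng(0)
init = rng.normal(size=(100, 2))
init[1] = init[0]  # two exactly overlapping points, as with the spectral init above

out = rescale_with_jitter(init, random_state=42)
print(np.array_equal(out[0], out[1]))  # False
```

Keeping the jitter proportional to `target_std` means the noise stays small relative to the initialization no matter how the data were scaled, and a user-provided init that bypasses this function is left untouched.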