pavlin-policar / openTSNE

Extensible, parallel implementations of t-SNE
https://opentsne.rtfd.io
BSD 3-Clause "New" or "Revised" License

Question on initialization #252

Closed: sbembenek18 closed this issue 8 months ago

sbembenek18 commented 8 months ago

The default initialization is PCA, is that correct? If so, does it use the top 50 PCs for the t-SNE embedding? If I wanted to just run my data as is, what initialization would allow for this?

thanks!

pavlin-policar commented 8 months ago

That's right, the default initialization is PCA. Since t-SNE embeds data into 2D, here we take the top 2 principal components of the data matrix and use those as the initialization for the embedding. Note, however, that this only determines the starting positions of the points in the 2D embedding, not the actual input to the t-SNE algorithm. openTSNE uses the full data matrix as input, so if you want to do any preprocessing, e.g., keeping only the top 50 PCs and using those as input, you'll have to do this yourself.

So, to answer your question, if you want to construct a t-SNE embedding for your data as is, openTSNE does this by default.
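As a rough illustration of the default PCA initialization described above, the following NumPy-only sketch computes 2D starting positions from a data matrix. It is a simplified stand-in, not openTSNE's own `openTSNE.initialization.pca` code: the helper name `pca_init`, the plain SVD approach, and the rescaling so the first coordinate has a standard deviation of `1e-4` (assumed to match openTSNE's default rescaling target) are all assumptions of this sketch. Only the starting layout comes from the PCs; the t-SNE input would still be the full `X`.

```python
import numpy as np


def pca_init(X, n_components=2, target_std=1e-4):
    """Simplified PCA initialization (illustrative, not openTSNE's code):
    project X onto its top principal axes, then rescale so the first
    coordinate has standard deviation target_std."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    init = X_centered @ Vt[:n_components].T
    # Shrink the starting layout to a small spread; relative distances are kept.
    return init / np.std(init[:, 0]) * target_std


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))  # synthetic stand-in: 500 points, 20 features
init = pca_init(X)              # starting positions only
print(init.shape)               # (500, 2)
```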

sbembenek18 commented 8 months ago

OK. So, given a data matrix with features, openTSNE's default initialization calculates the PCs and takes only the top 2 PCs for initialization. After initialization, the full data matrix with the original (non-PC) features is used to compute the embedding.

If I actually wanted to use, e.g., the first 50 PCs as the input features for the embedding, I would simply calculate them ahead of time and pass them to openTSNE. And to avoid having openTSNE calculate the PCs again, I would (as you showed in '04_large_data_sets') initialize with:

init = openTSNE.initialization.rescale(X[:, :2])

and then use:

openTSNE.TSNE(initialization=init, ...)
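The workflow above can be sketched with NumPy alone. Here `pca_scores` and `rescale` are hypothetical helpers written for illustration (the latter only imitates the idea behind `openTSNE.initialization.rescale`), and `X_raw` is synthetic data. The sketch also checks why slicing is enough: PCA scores are centered, uncorrelated, and sorted by decreasing variance, so running PCA again on the 50-PC matrix recovers its first two columns, up to sign.

```python
import numpy as np


def pca_scores(X, n_components):
    """Hypothetical helper: project centered X onto its top principal axes."""
    X_centered = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def rescale(x, target_std=1e-4):
    """Hypothetical helper: scale x so its first column has std target_std."""
    return x / np.std(x[:, 0]) * target_std


rng = np.random.default_rng(0)
X_raw = rng.normal(size=(300, 100)) @ rng.normal(size=(100, 100))  # correlated synthetic data
X_pcs = pca_scores(X_raw, 50)  # 50-PC matrix, used as the t-SNE input
init = rescale(X_pcs[:, :2])   # first two columns are already the top two PCs

# Recomputing PCA on X_pcs yields the same two axes (up to sign),
# so slicing avoids a redundant decomposition.
recomputed = rescale(pca_scores(X_pcs, 2))
signs = np.sign(np.sum(init * recomputed, axis=0))
print(np.allclose(init, recomputed * signs))  # True
```

With openTSNE installed, the arrays would then be used along the lines of `openTSNE.TSNE(initialization=init).fit(X_pcs)`: `X_pcs` as the t-SNE input and `init` only as the starting layout.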

To be sure: the parameter n_components is the dimension of the t-SNE embedding space, and the PCA initialization has to use that same number of PCs.
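The shape constraint can be made explicit with a small check. `check_initialization` is a hypothetical helper, not part of openTSNE (which performs its own input validation); it only encodes the rule that a precomputed initialization needs one row per sample and `n_components` columns.

```python
import numpy as np


def check_initialization(init, n_samples, n_components=2):
    """Hypothetical helper: verify a precomputed initialization matches the
    embedding t-SNE will optimize (one row per sample, one column per
    embedding dimension)."""
    init = np.asarray(init)
    expected = (n_samples, n_components)
    if init.shape != expected:
        raise ValueError(f"initialization has shape {init.shape}, expected {expected}")
    return init


X_pcs = np.random.default_rng(0).normal(size=(200, 50))  # synthetic 50-PC matrix
check_initialization(X_pcs[:, :2], n_samples=200, n_components=2)  # 2D embedding: 2 PCs
check_initialization(X_pcs[:, :3], n_samples=200, n_components=3)  # 3D embedding: 3 PCs
```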

Is this correct?

Thanks!

pavlin-policar commented 8 months ago

That's all spot on!

sbembenek18 commented 8 months ago

Thanks!