Closed igaloly closed 1 year ago
The purpose of the spectral initialization is a bit different from that of a random projection. The spectral initialization has the advantage of:
Some possible downsides of using a random projection:
At any rate, it's easy to try this with sklearn:
import sklearn.random_projection
# assume your data is in the variable x
# probably also want to set random_state for reproducibility
embedder = sklearn.random_projection.SparseRandomProjection(
    n_components=2
)
init_coords = embedder.fit_transform(x)
You can then run UMAP with init=init_coords
to use the random projection output as the initialization. In my experience, it didn't give particularly good results. I think I would prefer truncated SVD for this kind of initialization unless speed really was an issue: truncated SVD gave results I preferred the look of, although I did no quantitative evaluation.
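For comparison, a minimal sketch of the truncated SVD alternative mentioned above, using sklearn's TruncatedSVD; the data `x` here is a random stand-in for your own matrix:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# random stand-in for your data (100 samples, 50 features)
rng = np.random.default_rng(42)
x = rng.normal(size=(100, 50))

# 2-component truncated SVD, analogous to the random projection snippet above
svd = TruncatedSVD(n_components=2, random_state=42)
init_coords = svd.fit_transform(x)
```

As before, `init_coords` can then be passed to UMAP via init=init_coords.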
One way you could use random projections is if you had very high-dimensional data and wanted to reduce its dimensionality before applying UMAP. Shah and Silwal looked at this (in the context of t-SNE), but found that PCA did a better job, so truncated SVD would again be preferred for that. Oh well.
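A sketch of that pre-reduction idea, again with random stand-in data; the intermediate dimensionality of 50 is an arbitrary choice for illustration, and UMAP would then be run on `x_mid` instead of the raw data:

```python
import numpy as np
from sklearn.random_projection import SparseRandomProjection

# random stand-in for very high-dimensional data
rng = np.random.default_rng(0)
x_high = rng.normal(size=(500, 10000))

# project down to an intermediate dimensionality before running UMAP;
# dense_output=True ensures a plain ndarray even though the projection
# matrix itself is sparse
projector = SparseRandomProjection(
    n_components=50, dense_output=True, random_state=0
)
x_mid = projector.fit_transform(x_high)
```

Swapping `SparseRandomProjection` for `TruncatedSVD` here gives the PCA-flavored variant that Shah and Silwal found worked better.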
Given the guarantees of the JL lemma and the increased speed of a random projection, and since the main objective of the spectral embedding seems to be making the reduction deterministic, might it be worth using a JL-style random projection instead of the spectral embedding?
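One caveat on those guarantees: the JL lemma's distance-preservation bound requires a target dimensionality that grows like log(n)/eps², which is far above 2, so the guarantee does not apply to a 2-D initialization. sklearn can compute the minimum dimension directly:

```python
from sklearn.random_projection import johnson_lindenstrauss_min_dim

# minimum target dimensionality the JL lemma needs to preserve pairwise
# distances within 10% (eps=0.1) for 1,000 points -- far more than 2
min_dim = johnson_lindenstrauss_min_dim(n_samples=1000, eps=0.1)
print(min_dim)
```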