jlmelville / uwot

An R package implementing the UMAP dimensionality reduction method.
https://jlmelville.github.io/uwot/
GNU General Public License v3.0

How to input similarity_graph back into umap parameters? #98

Closed Chengwei94 closed 8 months ago

Chengwei94 commented 2 years ago

Hi there,

I am trying out similarity_graph to compute the connectivities graph, which I then use for clustering (similar to the scanpy workflow). How do I pass this connectivity information into umap so that I can skip recomputing it? Or is there a way to retrieve the similarity graph when running umap?

Chengwei94 commented 2 years ago

Looks like I can get the connectivity matrix through umap(mnist, ret_extra = c("fgraph"))

jlmelville commented 2 years ago

You are correct that the output of similarity_graph is the same as running umap with ret_extra = c("fgraph").
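As a rough illustration of the two routes, here is a minimal sketch. It assumes the built-in iris data as a stand-in for mnist, and the variable names (X, sg, res) are made up for the example:

```r
library(uwot)

# Numeric columns only; iris stands in for the mnist data mentioned above
X <- as.matrix(iris[, 1:4])

# Route 1: compute the similarity graph on its own
set.seed(42)
sg <- similarity_graph(X, n_neighbors = 15)

# Route 2: run umap and ask for the graph as an extra return value
set.seed(42)
res <- umap(X, n_neighbors = 15, ret_extra = c("fgraph"))

# With ret_extra set, umap returns a list: the coordinates are in
# res$embedding and the sparse similarity graph is in res$fgraph
dim(sg)
dim(res$fgraph)
```

Either sparse matrix can then be handed to whatever clustering code expects a weighted adjacency matrix.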

However, calling similarity_graph and then passing its output to umap to skip all the computation is not something you can do at the moment. A workaround would be to use the k-nearest neighbors output:

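# Compute the similarity graph and also return the k-nearest neighbor data
sg_res <- similarity_graph(iris, ret_extra = "nn")
# Reuse those neighbors so umap can skip the nearest neighbor search
umap_res <- umap(X = NULL, nn_method = sg_res$nn)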

This incurs the cost of similarity calculation and symmetrization, but that is quick compared to the nearest neighbor calculation itself.

Passing the result of similarity_graph back into umap seems like something that ought to be supported now that similarity_graph exists. It would allow users to supply either a modified version of the fuzzy simplicial set or a sparse similarity matrix created by an entirely different method outside of uwot, leaving uwot to just optimize the coordinates in the lower dimension. So @Chengwei94, if you don't mind, I would like to leave this issue open to remind me to support this in the next version of uwot.

This is not hard to implement, but the interface requires some thought. Some questions to myself (or anyone with an interest in this): How should the user pass the graph to umap? The X parameter already assumes that a sparse matrix passed to it is a distance matrix. Should it be X in combination with an is_similarity_graph parameter? nn_method instead? An entirely new parameter (and with it the need for ever more complex validation of which parameters are allowed together and which ones get ignored if both are set)? An entirely new function (probably safest)? While we're here, should the type of symmetrization also be specified by the user (e.g. fuzzy set union for UMAP vs. taking the mean in LargeVis)?

jlmelville commented 8 months ago

optimize_graph_layout has been added to uwot, and it will do this.
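A minimal sketch of that workflow, assuming a version of uwot recent enough to include optimize_graph_layout and using the built-in iris data (the variable names are only for illustration):

```r
library(uwot)

# Numeric columns of the built-in iris data
X <- as.matrix(iris[, 1:4])

set.seed(42)
# Build the sparse, symmetric similarity graph once. It can be used for
# clustering, or replaced by a graph created outside of uwot.
sg <- similarity_graph(X, n_neighbors = 15)

# Optimize 2D coordinates directly from the graph; no nearest neighbor
# search or similarity calculation is repeated here
coords <- optimize_graph_layout(sg, n_components = 2)

dim(coords)
```

Because only the graph is passed in, any modification made to sg beforehand carries through to the layout.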