Chengwei94 closed this issue 8 months ago.
Looks like I can get the connectivity matrix through umap(mnist, ret_extra = c("fgraph")).
You are correct that the output of similarity_graph is the same as running umap with ret_extra = c("fgraph").
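Here is a minimal sketch of that equivalence. It assumes that uwot and the Matrix package are installed and sets n_neighbors = 15 explicitly in both calls so the settings match. It retrieves the graph from umap and builds it directly, then compares the two. With small inputs like iris, uwot should use exact nearest neighbors, so the difference is expected to be at or near zero. The script only prints it and does not assert it.

```r
library(uwot)
library(Matrix)

set.seed(42)
# Ask umap() to also return the fuzzy graph it built, alongside the embedding
res <- umap(iris, n_neighbors = 15, ret_extra = c("fgraph"))
fg <- res$fgraph  # sparse, symmetric N x N matrix

# Build the same kind of graph directly, without optimizing an embedding
sg <- similarity_graph(iris, n_neighbors = 15)

# Both are 150 x 150; print the largest elementwise difference between them
print(dim(fg))
print(max(abs(fg - sg)))
```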
But calling similarity_graph, then passing its output to umap and skipping all the computation, is not something you can do at the moment. A workaround is to use the k-nearest neighbors output:
sg_res <- similarity_graph(iris, ret_extra = "nn")
umap_res <- umap(X = NULL, nn_method = sg_res$nn)
This repeats the similarity calculation and symmetrization, but those steps are quick compared to the nearest neighbor search itself.
Passing the result of similarity_graph back into umap seems like something that ought to be supported now that similarity_graph exists. In particular, it would let users supply either a modified version of the fuzzy simplicial set or a sparse similarity matrix created by an entirely different method outside of uwot. uwot would then only need to optimize the coordinates in the lower-dimensional space. So @Chengwei94, if you don't mind, I would like to leave this issue open as a reminder to support this in the next version of uwot.
This is not hard to implement, but the interface requires some thought. Here are some questions to myself (or anyone with an interest in this). How should the user pass this to umap? The X parameter already assumes that a sparse matrix passed to it is a distance matrix. Could X be used in combination with an is_similarity_graph parameter? Should nn_method be used instead? An entirely new parameter would bring the need for ever more complex validation of which parameters are allowed together and which ones get ignored if both are set. An entirely new function is probably safest. While we're here, should the user also be able to specify the type of symmetrization (e.g. fuzzy set union for UMAP vs. the mean in LargeVis)?
optimize_graph_layout has been added to uwot to do this.
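Here is a minimal sketch of the workflow from the original question. It assumes a uwot version recent enough to include optimize_graph_layout, and that the default spectral initialization, which needs only the graph, is acceptable. The graph is built once and reused for clustering. It is then laid out without recomputing nearest neighbors. The clustering step is left as a placeholder comment.

```r
library(uwot)

set.seed(42)
# Build the fuzzy simplicial set once (nearest neighbors, smoothing, symmetrization)
sg <- similarity_graph(iris, n_neighbors = 15)

# ... sg can be reused here, e.g. as the adjacency matrix for graph-based clustering ...

# Optimize a 2D layout directly from the graph; no nearest neighbor search is repeated
emb <- optimize_graph_layout(sg, n_components = 2)
print(dim(emb))
```

This appears to correspond to the "entirely new function" option from the interface questions above. Using a separate function avoids overloading the meaning of X or nn_method in umap.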
Hi there,
I am trying out similarity_graph to compute the connectivities graph, and I am using it for clustering (similar to the scanpy workflow). How do I pass this connectivity information into umap so that I can skip the recomputation? Alternatively, is there a way to retrieve the similarity graph when running umap?