jlmelville / uwot

An R package implementing the UMAP dimensionality reduction method.
https://jlmelville.github.io/uwot/
GNU General Public License v3.0

Return UMAP graph? #47

Closed twhiscock closed 4 years ago

twhiscock commented 4 years ago

Hello! Thank you for writing such a useful package. It is great not to have to switch between Python and R to use UMAP :).

I was wondering: is it possible to output the graph (i.e. the fuzzy simplicial set) that is an intermediate step in the UMAP projection?

In the original Python implementation, I obtained this using the function:

umap.umap_.fuzzy_simplicial_set

I have found that this graph has several nice properties, and can be used to cluster data directly using graph-based clustering methods.

Tom

jlmelville commented 4 years ago

Hello, right now the fuzzy simplicial set is not available for output. But it could be added as a new option in the next release.

twhiscock commented 4 years ago

OK, thanks! I think this would be a great feature for the next release :)

jlmelville commented 4 years ago

@twhiscock the GitHub version of uwot now has the option to return the fuzzy simplicial set by using:

res <- umap(X, ret_extra = c("fgraph"))

The coordinates will be in res$embedding and the graph in res$fgraph. It's a sparse dgCMatrix from the Matrix package.
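For the graph clustering use case, here is a minimal sketch of handing res$fgraph to igraph. It assumes the igraph package is installed. The iris data and the choice of Louvain community detection are only for illustration and are not something uwot prescribes.

```r
library(uwot)
library(igraph)

# iris is a small built-in dataset; the first four columns are numeric
X <- iris[, 1:4]

set.seed(42)
res <- umap(X, ret_extra = c("fgraph"))

# Treat the fuzzy simplicial set as a weighted, undirected adjacency matrix.
# mode = "max" keeps the larger weight if the two triangles ever disagree.
g <- graph_from_adjacency_matrix(
  res$fgraph,
  mode = "max",
  weighted = TRUE,
  diag = FALSE
)

# Louvain is an illustrative choice; any weighted community detection
# method in igraph could be substituted here.
cl <- cluster_louvain(g)
table(membership(cl))
```

The cluster labels from membership(cl) are in the same row order as X, so they can be used to color a plot of res$embedding.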

This will show up in the next CRAN version, whenever that is.

Note that the graph is further sparsified by dropping low-weight edges that would not be sampled during optimization. That is determined by the n_epochs parameter. If you only care about the graph and not the embedded coordinates, set n_epochs = 0 and no edges are removed.
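To make the sparsification concrete, here is a rough sketch of the kind of thresholding involved. It assumes the cutoff is the largest edge weight divided by n_epochs, which is the rule used in the Python reference implementation. sparsify_graph is a hypothetical helper written for this illustration and is not a function exported by uwot.

```r
library(Matrix)

# Hypothetical helper (not part of uwot): drop edges whose weight is
# below max(weight) / n_epochs. The cutoff rule is an assumption taken
# from the Python reference implementation.
sparsify_graph <- function(graph, n_epochs) {
  if (n_epochs <= 0) {
    return(graph)  # n_epochs = 0: nothing is removed
  }
  graph@x[graph@x < max(graph@x) / n_epochs] <- 0
  drop0(graph)  # remove the explicit zeros from the sparse structure
}

# A toy symmetric 3 x 3 graph with one very weak edge (weight 0.001)
g <- sparseMatrix(
  i = c(1, 2, 1, 3, 2, 3),
  j = c(2, 1, 3, 1, 3, 2),
  x = c(1.0, 1.0, 0.001, 0.001, 0.5, 0.5),
  dims = c(3, 3)
)

kept_200 <- nnzero(sparsify_graph(g, n_epochs = 200))  # cutoff is 1 / 200 = 0.005
kept_0 <- nnzero(sparsify_graph(g, n_epochs = 0))      # no cutoff applied
```

In this toy graph the weak edge falls below the cutoff for n_epochs = 200 and is dropped in both directions, while n_epochs = 0 leaves all six entries in place.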

The effect of the sparsification is small with default values. I looked at n_epochs = 200 and n_epochs = 500 with MNIST (a largeish dataset) and iris (a very small one), and the fraction of edges dropped was always < 1%.
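If you want to check this on your own data, something along these lines should work. It is only a sketch: it uses iris as a stand-in dataset and assumes both runs build the same nearest neighbor graph, which should hold for small datasets where the neighbor search is exact.

```r
library(uwot)
library(Matrix)

X <- iris[, 1:4]

# n_epochs = 0 returns the graph with no edges removed
full <- umap(X, n_epochs = 0, ret_extra = c("fgraph"))$fgraph

# n_epochs = 200 returns the graph after low-weight edges are dropped
sparse <- umap(X, n_epochs = 200, ret_extra = c("fgraph"))$fgraph

# Fraction of non-zero entries lost to sparsification
frac_dropped <- 1 - nnzero(sparse) / nnzero(full)
frac_dropped
```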

twhiscock commented 4 years ago

Awesome :)

jlmelville commented 4 years ago

uwot 0.1.8 is on CRAN with this feature available.