jlmelville / uwot

An R package implementing the UMAP dimensionality reduction method.
https://jlmelville.github.io/uwot/
GNU General Public License v3.0
321 stars 31 forks source link

`fuzzy_simplicial_set()` for BBKNN #96

Closed ycli1995 closed 2 years ago

ycli1995 commented 2 years ago

Hello, James Previously I use the unexported function fuzzy_simplicial_set() from uwot in the bbknnR package. This function is quite helpful to retrieve the connectivity matrix from KNN results, which is similar to the implementation in BBKNN:

https://github.com/Teichlab/bbknn/blob/d2d5a65638008a4837261eaaa464223f8db36fef/bbknn/matrix.py#L64

However fuzzy_simplicial_set() is totally native for R users, which means you don't need any extra API from python to implement the basic function of BBKNN in R. And the connectivity matrix is necessary for the downstream UMAP and graph-based clustering.

I do appreciate it if you could kindly make fuzzy_simplicial_set() exported. Thank you.

Yuchen Li

jlmelville commented 2 years ago

This feature would also be handy to enable approaches like waveMAP and TopOMetry.

I assume your only use-case currently is in reproducing the linked line in BBKNN?

I don't want to go straight to exporting the function as-is, as it was not designed with public API use in mind, but I will endeavor to avoid breaking bbknnR.

ycli1995 commented 2 years ago

After some cursory exploration, I found that maybe the returned fgraph from uwot::umap() could be used to represent the connectivity graph which I need. I should spend some extra time to figure it out so that fuzzy_simplicial_set() would not be used directly in my package.

jlmelville commented 2 years ago

Yes that will work. But you will want to set n_epochs = 0 to avoid spending any time optimizing the low-dimensional coordinates you have no use for. Also, you will still have to pay the price of spending time initializing and storing the low-dimensional coordinates. But it will solve your immediate problem of CRAN not being happy with your package's use of an internal function (for which I am responsible for drawing their attention to and which I feel bad about, sorry).

Possibly I could allow a combination of n_epochs = 0, ret_extra = c("fgraph"), init = NULL, or perhaps more helpfully some wrapper around umap that effectively does the same. I will start work on this as soon as I have the time.

jlmelville commented 2 years ago

I have added a new function to uwot's public API: similarity_graph (NB: this is not available on CRAN yet, only via github).

@ycli1995 this would be a fairly straightforward conversion from your current use of fuzzy_simplicial_set. Where your compute_connectivities_umap function currently runs:

  cnts <- fuzzy_simplicial_set(
    nn = list(idx = knn_index, dist = knn_dist),
    set_op_mix_ratio = set_op_mix_ratio,
    local_connectivity = local_connectivity
  )

you would instead use:

  cnts <- similarity_graph(
    nn_method = list(idx = knn_index, dist = knn_dist),
    set_op_mix_ratio = set_op_mix_ratio,
    local_connectivity = local_connectivity
  )

So the only change required (apart from the function change) is that the nn parameter should be renamed nn_method.

On my local install of bbknnR, when I run the example in your vignette with panc8_small, the cnts matrix is identical whether I use fuzzy_simplicial_set or similarity_graph and the final images are identical, but obviously you should run your own tests when you are able to.

We can continue this discussion at https://github.com/ycli1995/bbknnR/issues/1 in terms of what the timing will be of new submissions to CRAN and other changes (e.g. minimum version of uwot required, removing the call to getFromNamespace in your package and so on).

jlmelville commented 2 years ago

Looks like bbknnR was able to successfully submit to CRAN so I will call this done.