satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Add uwot.approx_pow option to RunUMAP for better reproducibility #9449

Open daskelly opened 3 weeks ago

Dear Seurat Team,

The issue of UMAP reproducibility with the uwot method has been raised before (https://github.com/satijalab/seurat/issues/6722) and, as noted there, was discussed in a uwot issue (https://github.com/jlmelville/uwot/issues/46). Specifically, OS differences in the underlying implementation of some C/C++ libraries cause very small numerical differences across OSs, which in turn produce different UMAP results. An option that corrects this, approx_pow, was added in uwot version 0.1.8, as discussed in the same uwot issue. Given that Seurat requires uwot >0.1.10, I thought it would be useful to expose this option in Seurat's RunUMAP functions.

I tested this option on two systems (Mac and CentOS). With uwot.approx_pow = FALSE the UMAPs differ between the two systems, but with uwot.approx_pow = TRUE the results are identical. I am not sure why https://github.com/satijalab/seurat/issues/6722 noted that approx_pow did not solve the issue; it does solve the problem for my example. Details of the test, a simple UMAP on pbmc_small, are below:

Running on Mac laptop

```r
install_github('daskelly/seurat@umap-approx-pow')
suppressPackageStartupMessages(library(Seurat))
Sys.info()['sysname']
```

```text
## sysname
## "Darwin"
```

```r
RunUMAP(pbmc_small, verbose = FALSE, dims = 1:5, uwot.approx_pow = FALSE) |>
  Embeddings(object = _, 'umap') |>
  head(3)
```

```text
##                  umap_1    umap_2
## ATGCCAGAACGACT 4.692863  1.759652
## CATGGCCTGTGCAT 5.494108  1.453728
## GAACCTGATGAACC 2.188469 -5.069190
```

```r
RunUMAP(pbmc_small, verbose = FALSE, dims = 1:5, uwot.approx_pow = TRUE) |>
  Embeddings(object = _, 'umap') |>
  head(3)
```

```text
##                   umap_1    umap_2
## ATGCCAGAACGACT 0.1691852 0.2062277
## CATGGCCTGTGCAT 1.4661827 0.7792036
## GAACCTGATGAACC 4.8212224 0.0872156
```

Running on CentOS

```r
install_github('daskelly/seurat@umap-approx-pow')
suppressPackageStartupMessages(library(Seurat))
Sys.info()['sysname']
```

```text
## sysname
## "Linux"
```

```r
RunUMAP(pbmc_small, verbose = FALSE, dims = 1:5, uwot.approx_pow = FALSE) |>
  Embeddings(object = _, 'umap') |>
  head(3)
```

```text
##                  umap_1    umap_2
## ATGCCAGAACGACT 2.765658  3.131156
## CATGGCCTGTGCAT 2.465478  2.309039
## GAACCTGATGAACC 2.971336 -2.502957
```

```r
RunUMAP(pbmc_small, verbose = FALSE, dims = 1:5, uwot.approx_pow = TRUE) |>
  Embeddings(object = _, 'umap') |>
  head(3)
```

```text
##                   umap_1    umap_2
## ATGCCAGAACGACT 0.1691852 0.2062277
## CATGGCCTGTGCAT 1.4661827 0.7792036
## GAACCTGATGAACC 4.8212224 0.0872156
```

The results above show that the UMAP results are identical across OSs only when uwot.approx_pow = TRUE.
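For anyone who wants to check this programmatically rather than by eye, here is a minimal base R sketch of the comparison. It only re-enters the three rows printed above for each system, so it is not a check of the full embedding. The helper names (`make_embedding`, `same_embedding`) and the tolerance are my own assumptions for illustration and are not part of Seurat or uwot.

```r
# Cell barcodes for the three rows printed above.
cells <- c("ATGCCAGAACGACT", "CATGGCCTGTGCAT", "GAACCTGATGAACC")

# Hypothetical helper: build a 3 x 2 embedding matrix from row-wise values.
make_embedding <- function(values) {
  matrix(values, ncol = 2, byrow = TRUE,
         dimnames = list(cells, c("umap_1", "umap_2")))
}

# Values copied from the output above (first three cells only).
mac_false   <- make_embedding(c(4.692863, 1.759652,
                                5.494108, 1.453728,
                                2.188469, -5.069190))
linux_false <- make_embedding(c(2.765658, 3.131156,
                                2.465478, 2.309039,
                                2.971336, -2.502957))
mac_true    <- make_embedding(c(0.1691852, 0.2062277,
                                1.4661827, 0.7792036,
                                4.8212224, 0.0872156))
linux_true  <- make_embedding(c(0.1691852, 0.2062277,
                                1.4661827, 0.7792036,
                                4.8212224, 0.0872156))

# Hypothetical helper: TRUE when two embeddings agree within a tolerance.
# all.equal() returns a description of the difference rather than FALSE,
# hence the isTRUE() wrapper.
same_embedding <- function(a, b, tol = 1e-6) {
  isTRUE(all.equal(a, b, tolerance = tol))
}

same_embedding(mac_false, linux_false)  # FALSE: embeddings differ across OSs
same_embedding(mac_true, linux_true)    # TRUE: embeddings match across OSs
```

To compare full embeddings, the same helper could be applied to the matrices returned by `Embeddings()` on each system, for example after transferring one of them with `saveRDS()` and `readRDS()`.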