ollieeknight opened 4 months ago
Hi, the problem arises in the initialisation phase of UMAP, which is "spectral" by default. As far as I understand the Seurat code, you don't have the option to change this. This is not a memory problem, as the data passed to uwot::umap come from the reduction slot, here "harmony_rna" (the log reports "Read 342124 rows and found 30 numeric columns"). I'm not familiar with this transform. If you have the chance to perform a PCA or sparse PCA instead, that is what I would try.
My two cents, Samuel
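One reading of this suggestion is to call uwot directly (bypassing Seurat) with a non-spectral initialisation. A minimal sketch, assuming uwot is installed: the random matrix `X` is a hypothetical stand-in for the Harmony embeddings (e.g. `Embeddings(alldata, 'harmony_rna')`), and `init = "pca"` is swapped in for the default spectral initialisation. uwot also accepts `init = "spca"`, a scaled variant of PCA initialisation.

```r
library(uwot)

set.seed(42)
# Hypothetical stand-in for the Harmony embeddings: 1000 cells x 30 dimensions
X <- matrix(rnorm(1000 * 30), nrow = 1000)

# init = "pca" starts from the leading principal components of X,
# so the spectral (graph Laplacian) eigendecomposition via RSpectra is skipped
emb_pca <- umap(X, n_neighbors = 30, metric = "cosine", min_dist = 0.3,
                init = "pca", verbose = TRUE)

dim(emb_pca)
```

If this completes on the real embeddings while the spectral initialisation crashes, that points at the eigendecomposition step rather than at the neighbour search or the optimisation.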
Thanks for your response, I appreciate it. I'm also trying to figure out which package update caused this issue, as it previously worked fine for me even with larger datasets.
Running this directly through uwot now seems to work well:
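library(Seurat)
harmony_emb <- Embeddings(alldata, reduction = 'harmony_rna')
alldata_umap <- uwot::umap2(harmony_emb,
                            n_neighbors = 30, n_components = 2,
                            metric = 'cosine', min_dist = 0.3, ret_model = TRUE,
                            n_threads = 32, verbose = TRUE)
# Name rows by cell and columns by key, as Seurat's RunUMAP does; 'umaprna_'
# avoids reusing 'harmonyrna_', which looks like the Harmony reduction's key
umap_emb <- alldata_umap$embedding
rownames(umap_emb) <- rownames(harmony_emb)
colnames(umap_emb) <- paste0('umaprna_', seq_len(ncol(umap_emb)))
alldata[['umap_rna']] <- CreateDimReducObject(embeddings = umap_emb,
                                              key = 'umaprna_',
                                              assay = 'RNA')
# Keep the uwot model alongside the embedding
alldata[['umap_rna']]@misc$model <- alldata_umap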
However, I can't figure out why it isn't working in Seurat. I'll open a bug report there, unless you have any suggestions first!
Great to know that you are able to dive into the code.
@SamGG you can read more about umap2 at https://jlmelville.github.io/uwot/articles/umap2.html.
@ollieeknight can you try repeating the call to umap2 that completed, but also add verbose = TRUE? Then repeat it with the umap function and see whether it segfaults. Either way, seeing the output for both cases might be helpful.
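A sketch of that comparison, assuming a uwot version that provides umap2: both calls receive the same matrix and the same arguments, so any difference in the verbose logs comes from the functions' differing defaults (for example the nearest neighbour method). The random `X` is a hypothetical stand-in for the real embeddings; on the full dataset you would pass the actual matrix and add `n_threads = 32`.

```r
library(uwot)

set.seed(42)
# Hypothetical stand-in; use the exact same object (same columns) for both calls
X <- matrix(rnorm(1000 * 30), nrow = 1000)

# umap2: the newer entry point, which uses different defaults
emb2 <- umap2(X, n_neighbors = 30, metric = "cosine", min_dist = 0.3,
              verbose = TRUE)

# umap: same input and arguments, to check whether this path crashes
emb1 <- umap(X, n_neighbors = 30, metric = "cosine", min_dist = 0.3,
             verbose = TRUE)
```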
@jlmelville thanks for picking this up; your input here is really appreciated, as is your ongoing development of this package. Thanks also @SamGG for your advice.
Here is the output of umap2:
11:22:24 Using HNSW for nearest neighbor search
11:22:24 UMAP embedding parameters a = 0.9922 b = 1.112
11:22:24 Read 342124 rows and found 50 numeric columns
11:22:24 Building HNSW index with metric 'cosine' ef = 200 M = 16 using 32 threads
11:22:30 Finished building index
11:22:30 Searching HNSW index with ef = 30 and 32 threads
11:22:31 Finished searching
11:22:33 Commencing smooth kNN distance calibration using 32 threads with target n_neighbors = 30
11:22:44 Initializing from normalized Laplacian + noise (using RSpectra)
11:27:00 Range-scaling initial input columns to 0-10
11:27:04 Commencing optimization for 500 epochs, with 15578248 positive edges
Using method 'umap'
Optimizing with Adam alpha = 1 beta1 = 0.5 beta2 = 0.9 eps = 1e-07
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
umap still aborts, but umap2 runs to completion successfully.
The first output says:
Read 342124 rows and found 30 numeric columns
Your second output, with umap2, says:
Read 342124 rows and found 50 numeric columns
so something differs between the two runs. @ollieeknight can you confirm that when you run umap it also reads 50 numeric columns?
Based on the current output, umap2 is using HNSW for nearest neighbor search rather than Annoy, so HNSW may be giving RSpectra a slightly different input graph. Unfortunately, without looking at the data directly it will be very hard for me to diagnose what's happening. Either the neighbor graph is causing a problem for RSpectra, or a failure of the nearest neighbor search means the input data is somehow completely wrong, in which case uwot needs to detect this and stop.
You could try running umap with nn_method = "hnsw", or, if you stay with Annoy, increase one or both of search_k and n_trees to see whether a better neighbor graph helps, but I don't know whether those parameters are exposed in Seurat.
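A minimal sketch of both options when calling uwot directly, assuming the random `X` stands in for the real embeddings. The HNSW path needs the RcppHNSW package, so it is guarded here; the Annoy values are illustrative, not tuned. uwot's default `search_k` is `2 * n_neighbors * n_trees`, so the value below is twice that default.

```r
library(uwot)

set.seed(42)
# Hypothetical stand-in for the Harmony embeddings
X <- matrix(rnorm(1000 * 30), nrow = 1000)

# Option 1: HNSW neighbours, as umap2 used (requires RcppHNSW)
if (requireNamespace("RcppHNSW", quietly = TRUE)) {
  emb_hnsw <- umap(X, n_neighbors = 30, metric = "cosine", min_dist = 0.3,
                   nn_method = "hnsw", verbose = TRUE)
}

# Option 2: keep Annoy but build more trees and search more nodes;
# larger values give a more accurate neighbour graph at extra run time
emb_annoy <- umap(X, n_neighbors = 30, metric = "cosine", min_dist = 0.3,
                  nn_method = "annoy", n_trees = 100,
                  search_k = 4 * 30 * 100, verbose = TRUE)
```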
I'm running into an issue where Seurat keeps crashing (segmentation fault) when trying to run UMAP on a relatively large Seurat object:
Could you help me troubleshoot? Any advice would be really great. To rule out a memory issue, the last environment I tested this in had 64 cores and 600 GB of RAM.
For privacy reasons I have hidden the names of the layers, but there are indeed 25 of them, and the Harmony integration (saved as 'harmony_rna') worked perfectly.
The error message I get is below (I cut off entry 1., which is a long run of numbers; I assume it is some kind of output matrix).