Closed erbon7 closed 2 years ago
Hi, @erbon7
500 GB of RAM is far more than your data needs, so I don't think this is a memory issue.
Could you check whether you can run FindNeighbors from pca and from apca?
Another minor point: since you have 170+ proteins, you had better set do.scale to FALSE when you run ScaleData.
DefaultAssay(CD3) <- 'ADT'
VariableFeatures(CD3) <- rownames(CD3[["ADT"]])
CD3 <- NormalizeData(CD3, normalization.method = 'CLR', margin = 2) %>% ScaleData(do.scale = FALSE) %>% RunPCA(reduction.name = 'apca')
Hi @yuhanH
Thanks for your message.
I checked the clustering with FindNeighbors and it runs fine for pca, but with apca I get the same error message.
Code used:
CD3 <- FindNeighbors(CD3, reduction = 'apca', dims = 1:10, verbose=TRUE)
I also tried the option nn.method = 'annoy', but the problem remains the same.
Error message:
Computing nearest neighbor graph
Computing SNN
Error in ComputeSNN(nn_ranked = nn.ranked, prune = prune.SNN) :
std::bad_alloc
Calls: FindNeighbors ... FindNeighbors -> FindNeighbors.default -> ComputeSNN
Execution halted
Hi @erbon7
I think the issue comes from the apca dimensional reduction.
Could you please check whether there are any NA or NaN values in the cell embeddings of apca?
You can also try running UMAP from apca.
DefaultAssay(CD3) <- 'ADT'
CD3 <- RunUMAP(CD3, reduction = 'apca', dims = 1:18, reduction.name = 'adt.umap', reduction.key = 'Uadt_')
If there is something wrong in the cell embeddings of apca, you need to check the ADT counts and data matrices.
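A check of the ADT count matrix could look roughly like the sketch below. It uses a small made-up matrix in place of the real data, so every value in it is an assumption. The commented-out GetAssayData line shows one way the real matrix might be pulled from the object, and the exact argument names may differ between Seurat versions.

```r
# Toy stand-in for an ADT count matrix (features x cells); all values are invented.
set.seed(1)
adt_counts <- matrix(
  rpois(5 * 6, lambda = 20),
  nrow = 5,
  dimnames = list(paste0("prot", 1:5), paste0("cell", 1:6))
)
adt_counts[, c(2, 5)] <- 0  # simulate two cells with no ADT signal at all

# With the real object this might instead be something like (assumption):
# adt_counts <- GetAssayData(CD3, assay = "ADT", slot = "counts")

# Total ADT counts per cell; cells with a total of 0 carry no protein signal.
per_cell_total <- colSums(adt_counts)
empty_cells <- names(per_cell_total)[per_cell_total == 0]

# Count NA / NaN / Inf entries (is.finite() is FALSE for all three).
n_nonfinite <- sum(!is.finite(adt_counts))

print(summary(per_cell_total))
print(empty_cells)
print(n_nonfinite)
```

The same idea applies to the normalized data matrix: cells whose totals are zero or close to zero are the ones worth inspecting first.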
Hi @yuhanH Thanks for your suggestions. It seems that there are no NA or NaN values in the cell embeddings:
> Embeddings(CD3, reduction="apca") %>% is.na() %>% sum
[1] 0
> Embeddings(CD3, reduction="apca") %>% is.nan() %>% sum
[1] 0
UMAP is running fine:
> DefaultAssay(CD3) <- 'ADT'
> CD3 <- RunUMAP(CD3, reduction = 'apca', dims = 1:18, reduction.name = 'adt.umap', reduction.key = 'Uadt_')
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
14:40:20 UMAP embedding parameters a = 0.9922 b = 1.112
14:40:20 Read 116593 rows and found 18 numeric columns
14:40:20 Using Annoy for neighbor search, n_neighbors = 30
14:40:20 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
14:40:30 Writing NN index file to temp file /tmp/RtmpYLrII8/file1cfb45b53950
14:40:30 Searching Annoy index using 1 thread, search_k = 3000
14:40:59 Annoy recall = 19.72%
14:41:00 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
14:41:00 93631 smooth knn distance failures
14:41:04 Found 2 connected components, falling back to 'spca' initialization with init_sdev = 1
14:41:04 Using 'irlba' for PCA
14:41:04 PCA: 2 components explained 96.42% variance
14:41:04 Scaling init to sdev = 1
14:41:04 Commencing optimization for 200 epochs, with 6617896 positive edges
Using method 'umap'
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
14:52:01 Optimization finished
>
So this means that the nearest-neighbor procedure implemented in RunUMAP (via the uwot package, if I'm not mistaken) runs fine with this dataset, right?
Hi @erbon7 It is weird that the ADT PCA embeddings can be used for UMAP but fail for FindNeighbors. Would you mind sending a reproducible example? yhao@nygenome.org
After @yuhanH investigated the dataset, it turned out that the error was caused by a large number of "empty" cells in the ADT (protein) assay. After removing the empty cells, the FindMultiModalNeighbors function works fine.
Code snippet for removing the empty cells:
CD3 <- subset(CD3, subset = nCount_ADT > 100)
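Before subsetting, it may help to see how many cells a given cutoff would drop. The sketch below does this on invented per-cell totals. The data frame `meta` is a hypothetical stand-in for the object's metadata, and the cutoff of 100 is simply the one used in the snippet above, not a general recommendation.

```r
# Toy metadata; the nCount_ADT values are invented for illustration.
meta <- data.frame(
  nCount_ADT = c(0, 3, 250, 1200, 80, 640),
  row.names = paste0("cell", 1:6)
)

# With the real object the totals could be read with CD3$nCount_ADT (assumption:
# the column was created automatically when the ADT assay was added).

threshold <- 100                       # same cutoff as in the snippet above
keep <- meta$nCount_ADT > threshold    # TRUE for cells that would be kept

print(table(keep))                     # number of cells kept vs. dropped
kept_cells <- rownames(meta)[keep]
print(kept_cells)
```

If a large share of cells falls below the cutoff, that is worth investigating on its own before running the WNN steps.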
Thanks @yuhanH for your help on this case. I now close this bug report.
Hi Seurat devs,
I'm trying to analyze CITE-seq (RNA+ADT) data from a rather large dataset, but I get a memory error when trying to compute the WNN graph.
Here is the code:
The error message I get is:
I tried on large-memory machines (RAM > 500 GB), but the error is the same. When I monitor memory usage, the process crashes with at most 34 GB used. I tried different versions/installations of R, but the problem is the same. If I run the same code on a smaller object (the bmcite object from SeuratData), the program does not crash.
session info: