Open KarlKaise opened 4 months ago
Hi, dear @KarlKaise
What is the scale of your data? It appears that MNMST needs significantly more GPU memory than your GPU provides when initializing the affinity graph. As we analyzed in the supplementary materials, MNMST-GPU can handle datasets of up to ~30k cells. We recommend changing the device to 'cpu' when executing `Z_gpu = MNMST_representation_gpu(C_gpu, spatia_init_tensor, device=device)`, or calling `torch.cuda.empty_cache()` once before running this code.
Hello,
This makes sense then. This is my dataset:

```
AnnData object with n_obs × n_vars = 61702 × 30039
    obs: 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt'
    var: 'mt', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts'
    obsm: 'X_spatial', 'spatial'
filtered out 15863 cells that have less than 50 counts
filtered out 42 cells that have more than 2500 counts
filtered out 12406 genes that are detected in less than 10 cells
extracting highly variable genes
    --> added
    'highly_variable', boolean vector (adata.var)
    'highly_variable_rank', float vector (adata.var)
    'means', float vector (adata.var)
    'variances', float vector (adata.var)
    'variances_norm', float vector (adata.var)
normalizing counts per cell
    finished (0:00:00)
AnnData object with n_obs × n_vars = 44974 × 3000
    obs: 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'n_counts'
    var: 'mt', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
    uns: 'hvg', 'log1p'
    obsm: 'X_spatial', 'spatial'
```
So it makes sense: ~44,000 cells is well over 30,000. I guess I would need to pre-select cells of interest more stringently. Thank you a lot for the update and your time.
Hi, dear @KarlKaise You are welcome. If you want to execute MNMST-GPU on a large-scale dataset, you can also try changing the torch data type to `torch.float16`. This will reduce the GPU memory requirement to some extent, but the trade-off is that the code will not run on the CPU, as CPUs usually do not support the float16 data format. For large-scale datasets, we recommend referring to algorithms like SpaGCN, GraphST, or STAGATE, which adopt sparsification for the adjacency graph. An extended version, MNMST-sparse, will be released in the coming months. Let me know if you need any more help!
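A minimal sketch of the dtype switch; whether MNMST's solver stays numerically stable in half precision is untested here:

```python
import torch

# Halving the element size halves the memory of every dense tensor,
# including the n-by-n affinity matrix that triggers the OOM.
x = torch.randn(4, 4)                 # stand-in for the float32 inputs
x_half = x.to(torch.float16)          # equivalently: x.half()

assert x.element_size() == 4          # float32: 4 bytes per element
assert x_half.element_size() == 2     # float16: 2 bytes per element

# On CUDA, convert the inputs before calling the model, e.g.:
#   C_gpu = C_gpu.to(device).half()
# On CPU, many float16 ops are not implemented (matching the caveat
# above), so keep float32 for the CPU fallback.
```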
Thank you so much. I will give it a try. I think I can also just "crop" out regions that I am not interested in. Otherwise, it works really nicely, at least compared to BANKSY, for example.
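For what it's worth, that crop can be a one-line mask over `obsm['spatial']`; this is a hypothetical helper, not part of MNMST, and the coordinate ranges are placeholders:

```python
import numpy as np

def spatial_mask(xy: np.ndarray, x_range, y_range) -> np.ndarray:
    """Boolean mask of cells whose (x, y) coordinates fall in a rectangle."""
    return ((xy[:, 0] >= x_range[0]) & (xy[:, 0] <= x_range[1]) &
            (xy[:, 1] >= y_range[0]) & (xy[:, 1] <= y_range[1]))

# Usage with the AnnData object from this thread:
#   keep = spatial_mask(adata.obsm['spatial'], x_range=(0, 4000), y_range=(0, 4000))
#   adata_roi = adata[keep].copy()   # ideally now under ~30k cells
```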
Hello,
I tried to execute the adapted Jupyter code via Slurm and got this error:

```
filtered out 15863 cells that have less than 50 counts
filtered out 12 cells that have more than 3000 counts
filtered out 12388 genes that are detected in less than 10 cells
extracting highly variable genes
    --> added
    'highly_variable', boolean vector (adata.var)
    'highly_variable_rank', float vector (adata.var)
    'means', float vector (adata.var)
    'variances', float vector (adata.var)
    'variances_norm', float vector (adata.var)
normalizing counts per cell
    finished (0:00:00)
computing PCA
    with n_comps=15
    finished (0:00:12)
Traceback (most recent call last):
  File "/mnmst_slideseq2.py", line 190, in <module>
    Z_gpu = MNMST_representation_gpu(C_gpu, spatia_init_tensor, device=device)
  File "/MNMST_gpu.py", line 105, in MNMST_representation_gpu
    Z = torch.zeros([n, n]).to(device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.55 GiB (GPU 0; 47.45 GiB total capacity; 45.42 GiB already allocated; 1.82 GiB free; 45.43 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
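For context, the 7.55 GiB request is roughly what a dense float32 n-by-n matrix costs at the ~45k cells left after filtering; this is my own back-of-the-envelope arithmetic, not from the MNMST docs:

```python
# torch.zeros([n, n]) in float32 needs n * n * 4 bytes.
def dense_matrix_gib(n_cells: int, bytes_per_elem: int = 4) -> float:
    """GiB needed for a dense n-by-n matrix at the given element size."""
    return n_cells * n_cells * bytes_per_elem / 2**30

gib = dense_matrix_gib(45_000)   # ~ the post-filtering cell count here
print(f"{gib:.2f} GiB")          # ~7.5 GiB, in line with the allocation above
```

The "45.42 GiB already allocated" then suggests several such matrices (or intermediates of similar size) are live at once during the factorization.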
Any idea how I can resolve this issue?