martinjzhang / scDRS

Single-cell disease relevance score (scDRS)
https://martinjzhang.github.io/scDRS/
MIT License

Preprocessing: killed running scdrs compute-score #90

Closed (schroeme closed 3 months ago)

schroeme commented 3 months ago

Hello, I'm running scdrs compute-score per the instructions here, and I get the following error:

******************************************************************************
* Single-cell disease relevance score (scDRS)
* Version 1.0.2
* Martin Jinye Zhang and Kangcheng Hou
* HSPH / Broad Institute / UCLA
* MIT License
******************************************************************************
Call: scdrs compute-score \
--h5ad-file adata.h5ad \
--h5ad-species human \
--cov-file None \
--gs-file magma_scz_top1000_zscore.gs \
--gs-species human \
--ctrl-match-opt mean_var \
--weight-opt vs \
--adj-prop None \
--flag-filter-data True \
--flag-raw-count False \
--n-ctrl 1000 \
--flag-return-ctrl-raw-score False \
--flag-return-ctrl-norm-score True \
--out-folder marm_out

Loading data:
--h5ad-file loaded: n_cell=881832, n_gene=22582 (sys_time=218.2s)
First 3 cells: ['ATCTTCACAAGGCTTT-1', 'TCAAGCACATACTGAC-1', 'CAACCTCGTCCTACAA-1']
First 5 genes: ['LOC118152095', 'SLITRK6', 'LOC118152108', 'LOC103791423', 'SLITRK5']
--gs-file loaded: n_trait=1 (sys_time=218.2s)
Print info for first 3 traits:
First 3 elements for 'SCZ': ['DNAH10', 'DDX55', 'SNRNP35'], [8.2586, 8.0577, 7.845]

Preprocessing:
Killed

My data is already log1p-transformed on normalized counts. Any idea what might be causing this error? Is the file too large, or am I violating any input requirements (I couldn't find any)?

Thanks!

martinjzhang commented 3 months ago

Hi,

The most likely reason is that your file is too large. I once applied scDRS to a dataset with 500K cells and it needed 96G of memory. I suggest:

schroeme commented 3 months ago

Hi @martinjzhang, thanks so much for the quick response! I have 256G RAM on my computer, so I should be able to allocate 128G easily. Is the memory allocation something I can change when calling compute-score? If not, how can I change it? Thanks!

martinjzhang commented 3 months ago

Hi,

Allocating RAM to software is done in the OS rather than within scDRS. Can you check whether your system already makes all of its memory available to this program, or whether it sets an upper limit?
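
One way to check for a per-process cap on Linux or macOS is to read the address-space limit from Python's standard `resource` module. This is only a sketch: it reports the same value as `ulimit -v` in the shell, and it does not see limits imposed elsewhere (for example by a container or a job scheduler). The helper names `describe_limit` and `memory_limits` are made up for illustration and are not part of scDRS.

```python
import resource  # Unix-only standard-library module


def describe_limit(value):
    """Format an rlimit value (bytes) in GiB, or 'unlimited' if no cap is set."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / 1024**3:.1f} GiB"


def memory_limits():
    """Return the (soft, hard) address-space limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return describe_limit(soft), describe_limit(hard)


if __name__ == "__main__":
    soft, hard = memory_limits()
    print(f"address-space limit: soft={soft}, hard={hard}")
```

If both values come back as "unlimited", the shell is not capping the process, and a bare `Killed` is more likely the kernel reclaiming memory when the machine runs out.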

Also, did you store your single-cell data in sparse format? Specifically, is adata.X a sparse matrix? If not, converting it to a sparse matrix will save further memory.
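
As a rough illustration of why this matters, the sketch below estimates the size of a dense matrix with the dimensions from the log above and converts a small dense array to SciPy's CSR format. It assumes 4-byte (float32) entries, which may not match the actual file; under that assumption the dense matrix alone comes to roughly 80 GB. The helpers `dense_nbytes`, `to_csr` and `csr_nbytes` are hypothetical names, not scDRS functions.

```python
import numpy as np
from scipy import sparse


def dense_nbytes(n_cell, n_gene, itemsize=4):
    """Bytes needed to hold an n_cell x n_gene dense matrix (float32 by default)."""
    return n_cell * n_gene * itemsize


def to_csr(X):
    """Return X as a CSR sparse matrix, whether it starts out dense or sparse."""
    if sparse.issparse(X):
        return X.tocsr()
    return sparse.csr_matrix(X)


def csr_nbytes(X):
    """Bytes used by the three arrays that back a CSR matrix."""
    return X.data.nbytes + X.indices.nbytes + X.indptr.nbytes


if __name__ == "__main__":
    # Dimensions taken from the log: n_cell=881832, n_gene=22582.
    print(f"dense float32: {dense_nbytes(881832, 22582) / 1e9:.1f} GB")

    # Small synthetic, mostly-zero matrix standing in for adata.X.
    rng = np.random.default_rng(0)
    X = rng.poisson(0.05, size=(1000, 200)).astype(np.float32)
    Xs = to_csr(X)
    print(f"dense: {X.nbytes} bytes, CSR: {csr_nbytes(Xs)} bytes")

    # With an AnnData object the same idea would be (not run here):
    #   adata.X = to_csr(adata.X)
    #   adata.write_h5ad("adata_sparse.h5ad")  # hypothetical output path
```

The saving depends on how many entries are exactly zero, so CSR only helps when the matrix really is mostly zeros.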

schroeme commented 3 months ago

Sparsifying adata.X (converting to raw counts) worked! Thank you!