Closed by Lualululu 3 months ago
Hi,
We recommend using scDRS with genesets containing >=10 genes. Applying scDRS to smaller genesets is probably fine, but the results need to be interpreted with caution.
There is a way to work around this check.
Comment out the following lines in ./bin/scdrs in your local scDRS folder:

    if len(gene_list) < 10:
        print(
            "trait=%s: skipped due to small size (n_gene=%d, sys_time=%0.1fs)"
            % (trait, len(gene_list), time.time() - sys_start_time)
        )
        continue

Then scDRS will skip this geneset size check.
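If removing the check outright feels too blunt, another option is to keep it but lower the cutoff. The following is only a minimal sketch of that idea: `should_skip_geneset`, `min_genes`, and `warn_below` are hypothetical names that do not exist in scDRS, and `G4`/`G5` are placeholder gene names. It mirrors the logic of the check above and would still have to be wired into your local copy by hand.

```python
import warnings


def should_skip_geneset(gene_list, min_genes=10, warn_below=10):
    """Return True if the gene set is too small to score.

    Hypothetical helper, not part of scDRS. `min_genes` is the hard cutoff;
    sets with at least `min_genes` but fewer than `warn_below` genes are
    kept, with a warning that results need cautious interpretation.
    """
    n_gene = len(gene_list)
    if n_gene < min_genes:
        return True
    if n_gene < warn_below:
        warnings.warn(
            "gene set has only %d genes; interpret results with caution" % n_gene
        )
    return False


# Placeholder gene names G4 and G5 are made up for illustration.
small_set = ["HLA-B", "ERAP1", "KIFAP3", "G4", "G5"]

# With the default cutoff of 10, a 5-gene set is skipped.
print(should_skip_geneset(small_set))

# With a lower cutoff the same set is kept, and a warning is emitted.
print(should_skip_geneset(small_set, min_genes=3))
```

Keeping a warning rather than dropping the check silently preserves the reminder that scores from very small genesets should be interpreted with caution.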
Hello,
I'm encountering an issue with using a custom .gs file for disease-related SNP analysis. My workflow involves generating a .gs file from disease-related SNP sites, and this particular file includes only 6 genes. However, when I attempt to compute scores using this file, I receive a message indicating that the gene set is too small.
    Call: scdrs compute-score \
    --h5ad-file scRNA_32.h5ad \
    --h5ad-species human \
    --cov-file None \
    --gs-file out_file.gs \
    --gs-species human \
    --ctrl-match-opt mean_var \
    --weight-opt vs \
    --adj-prop None \
    --flag-filter-data True \
    --flag-raw-count True \
    --n-ctrl 1000 \
    --flag-return-ctrl-raw-score False \
    --flag-return-ctrl-norm-score True \
    --out-folder out

    Loading data:
    --h5ad-file loaded: n_cell=184706, n_gene=23748 (sys_time=56.7s)
    First 3 cells: ['210203_A00268_0605_BHWCMWDSXY_AAACCCAAGAGGCCAT-1', '210203_A00268_0605_BHWCMWDSXY_AAACCCAAGATGCGAC-1', '210203_A00268_0605_BHWCMWDSXY_AAACCCAAGCTCTATG-1']
    First 5 genes: ['AL627309.1', 'AL627309.5', 'LINC01409', 'FAM87B', 'LINC01128']
    --gs-file loaded: n_trait=1 (sys_time=56.9s)
    Print info for first 3 traits:
    First 3 elements for 'z_score': ['HLA-B', 'ERAP1', 'KIFAP3'], [5.4383, 5.069, 4.8556]

    Preprocessing:

    Computing scDRS score:
    trait=z_score: skipped due to small size (n_gene=5, sys_time=116.1s)

Is there a minimum gene set size required for the analysis to proceed? If so, is there a workaround or recommendation for cases where the gene set is naturally small because the disease-related SNP sites being analyzed are so specific?
Any insights or suggestions on how to proceed with such small gene sets would be greatly appreciated.
Thank you for your time and assistance.
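To see in advance which traits would be skipped, the gene counts in a .gs file can be checked before running compute-score. The sketch below assumes the .gs file is tab-separated with one header line and two columns: the trait name, then a comma-separated gene list whose entries may carry a `:weight` suffix. `count_genes_per_trait` is a hypothetical helper, not part of scDRS. The count scDRS reports can be lower than the count in the file (the log above shows n_gene=5 for a file described as holding 6 genes), so the helper can optionally count only genes present in a given set of dataset gene names.

```python
def count_genes_per_trait(gs_text, data_genes=None):
    """Count genes per trait in .gs-style text (hypothetical helper).

    Assumes a header line followed by rows of the form
    'TRAIT<TAB>GENE1:w1,GENE2:w2,...'; the ':weight' part is optional.
    If `data_genes` is given, only genes contained in it are counted.
    """
    counts = {}
    for line in gs_text.splitlines()[1:]:  # skip the header line
        if not line.strip():
            continue
        trait, geneset = line.split("\t", 1)
        # Drop the optional ':weight' suffix from each entry.
        genes = [item.split(":")[0] for item in geneset.split(",") if item]
        if data_genes is not None:
            genes = [g for g in genes if g in data_genes]
        counts[trait] = len(genes)
    return counts


# Illustrative input: only the three genes and weights visible in the log.
example = "TRAIT\tGENESET\nz_score\tHLA-B:5.4383,ERAP1:5.069,KIFAP3:4.8556\n"

for trait, n_gene in count_genes_per_trait(example).items():
    if n_gene < 10:
        print("trait=%s has %d genes and would be skipped" % (trait, n_gene))
```

In practice `gs_text` would be read from the .gs file, and `data_genes` could be the set of gene names in the h5ad file.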