martinjzhang / scDRS

Single-cell disease relevance score (scDRS)
https://martinjzhang.github.io/scDRS/
MIT License

compute-score is not reading .gs file properly #72

Closed juliaapolonio closed 11 months ago

juliaapolonio commented 11 months ago

Hi, thank you for your amazing work on this tool!

I am trying to run compute-score with the .gs file generated from my custom gene set of two traits, but it reads the data as blank.

This was my call: `scdrs compute-score --h5ad-file data/local.h5ad --h5ad-species human --gs-file data/ad_rg_munge.gs --gs-species human --out-folder results`

This is my log:

    Loading data:
    --h5ad-file loaded: n_cell=1496, n_gene=16761 (sys_time=2.8s)
    First 3 cells: ['GGTAATTGTTATTGCC-L8XR_211007_02_F03-1135448413', 'CGTGTCTTCTTCGTAT-L8TX_210513_01_A10-1153814239', 'TTCCACGTCCCATTTA-L8TX_201023_01_F08-1142430227']
    First 5 genes: ['ENSG00000000003', 'ENSG00000000419', 'ENSG00000000457', 'ENSG00000000460', 'ENSG00000001036']
    --gs-file loaded: n_trait=2 (sys_time=2.9s)
    Print info for first 3 traits:
    First 3 elements for 'AD': [], []
    First 3 elements for 'RG': [], []

    Preprocessing:

    Computing scDRS score:
    trait=AD: skipped due to small size (n_gene=0, sys_time=4.0s)
    trait=RG: skipped due to small size (n_gene=0, sys_time=4.0s)

I installed scDRS according to this link. I also tested this call with my single-cell file and your magma_10kb_top1000_zscore.74_traits.rv1.gs file, and the gene sets are parsed as blank in the same way.

I am attaching my .gs file as a .txt.

Thank you in advance, and my apologies if this is a trivial issue; I am new to bioinformatics.

Attachment: ad_munge.txt

martinjzhang commented 11 months ago

Hi @juliaapolonio ,

  1. Can you try to see if you can reproduce our tutorial: https://martinjzhang.github.io/scDRS/notebooks/quickstart.html

  2. The gene names in your .h5ad file are ENSG IDs, but the gene names in your .gs file are gene symbols, so the two sets of names do not match. I suggest changing the gene names in your .h5ad file to gene symbols.

  3. Your .gs file looks fine to me. I am not sure why scDRS didn't read in any gene information.

Maybe you can address 1 and 2 first, and I can help you debug with your updates.
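The mismatch described in point 2 can be checked before running `compute-score` by counting how many genes of each trait also appear among the gene names of the dataset. The following is only a minimal sketch, not part of scDRS: it assumes the .gs file is a tab-separated table with `TRAIT` and `GENESET` columns, where `GENESET` is a comma-separated list of `gene` or `gene:weight` entries. The helper names `read_gs` and `overlap_report` are hypothetical, and the gene sets and weights are made-up toy values.

```python
import csv
import io


def read_gs(handle):
    """Parse a .gs-style table into {trait: [gene, ...]}.

    Assumes tab-separated columns TRAIT and GENESET; any ':weight'
    suffix on a gene entry is dropped.
    """
    gene_sets = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        entries = [item for item in row["GENESET"].split(",") if item]
        gene_sets[row["TRAIT"]] = [entry.split(":")[0] for entry in entries]
    return gene_sets


def overlap_report(gene_sets, data_genes):
    """Count, per trait, how many gene-set genes occur in the dataset."""
    data_genes = set(data_genes)
    return {
        trait: sum(gene in data_genes for gene in genes)
        for trait, genes in gene_sets.items()
    }


# Toy .gs content (hypothetical genes and weights, for illustration only).
gs_text = "TRAIT\tGENESET\nAD\tAPOE:5.2,BIN1:4.1,CLU:3.9\nRG\tTP53:2.0,APOE:1.5\n"
gene_sets = read_gs(io.StringIO(gs_text))

# Dataset gene names as ENSG IDs (as in the log above) versus gene symbols.
ensg_names = ["ENSG00000000003", "ENSG00000000419", "ENSG00000000457"]
symbol_names = ["APOE", "BIN1", "TSPAN6"]

print(overlap_report(gene_sets, ensg_names))    # no shared names
print(overlap_report(gene_sets, symbol_names))  # some shared names
```

An overlap of zero for every trait, as with the ENSG IDs here, would be consistent with the `n_gene=0` lines in the log above.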

juliaapolonio commented 11 months ago

Hi @martinjzhang,

Thank you for your answer, it worked after changing ENSG IDs to gene symbols!

Best regards, Julia
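For readers who hit the same problem, the fix that worked here (replacing ENSG IDs with gene symbols) could look roughly like the sketch below. It is an illustration under assumptions, not the code used in this thread: the function `ensg_to_symbols` is hypothetical, and the ID-to-symbol mapping has to come from elsewhere, for example an annotation column in the dataset or an Ensembl export. IDs without a mapping are left unchanged, and repeated symbols get a numeric suffix so that names stay unique.

```python
def ensg_to_symbols(ensg_ids, id_to_symbol):
    """Map ENSG IDs to gene symbols, keeping the result unique.

    IDs missing from `id_to_symbol` are kept as-is. When a name occurs
    more than once, later occurrences become 'NAME-1', 'NAME-2', ...
    """
    seen = {}
    names = []
    for gene_id in ensg_ids:
        name = id_to_symbol.get(gene_id, gene_id)
        count = seen.get(name, 0)
        names.append(name if count == 0 else f"{name}-{count}")
        seen[name] = count + 1
    return names


# The first three gene IDs from the log above, with a partial mapping.
ensg_ids = ["ENSG00000000003", "ENSG00000000419", "ENSG00000000457"]
id_to_symbol = {
    "ENSG00000000003": "TSPAN6",
    "ENSG00000000419": "DPM1",
}

new_names = ensg_to_symbols(ensg_ids, id_to_symbol)
print(new_names)

# With an AnnData object `adata` loaded from the .h5ad file, the result
# could then be assigned with: adata.var_names = new_names
```

After renaming, the modified object would need to be written back to an .h5ad file and passed to `--h5ad-file` in place of the original.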