HelloWorldLTY closed this issue 3 weeks ago.
Please use the `PASS_Alzheimers_Jansen2019.full_score.gz` file instead of the `PASS_Alzheimers_Jansen2019.score.gz` file as input for `perform-downstream`.
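A minimal sketch for checking that the right file is available before calling `perform-downstream`. It assumes that `scdrs compute-score` writes `<trait>.full_score.gz` next to `<trait>.score.gz` in `--out-folder` when `--flag-return-ctrl-norm-score True` is set. The helper `pick_downstream_score_file` is hypothetical and not part of scDRS.

```python
from pathlib import Path


def pick_downstream_score_file(out_folder: str, trait: str) -> Path:
    """Return <out_folder>/<trait>.full_score.gz, or raise a descriptive error.

    Hypothetical helper; assumes compute-score writes <trait>.full_score.gz
    alongside <trait>.score.gz when --flag-return-ctrl-norm-score True is set.
    """
    folder = Path(out_folder)
    full_score = folder / f"{trait}.full_score.gz"
    score = folder / f"{trait}.score.gz"
    if full_score.exists():
        return full_score
    if score.exists():
        # Only the plain score file is present: not a valid downstream input.
        raise FileNotFoundError(
            f"{score.name} exists but {full_score.name} does not; the trait may "
            "have been skipped, or compute-score was run without "
            "--flag-return-ctrl-norm-score True"
        )
    raise FileNotFoundError(f"no score files for trait {trait!r} in {folder}")
```

For example, `pick_downstream_score_file("data/", "PASS_Alzheimers_Jansen2019")` would fail loudly instead of silently handing the plain `.score.gz` file to the downstream step.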
After running `scdrs compute-score` with `data/gs_file/magma_10kb_top1000_zscore.74_traits.rv1.gs` as the gene set file, no `PASS_Alzheimers_Jansen2019.full_score.gz` was generated. Did I miss anything?
If I subset only the Alzheimer's-related gene set, this is the log:
```
******************************************************************************
* Single-cell disease relevance score (scDRS)
* Version 1.0.2
* Martin Jinye Zhang and Kangcheng Hou
* HSPH / Broad Institute / UCLA
* MIT License
******************************************************************************
Call: scdrs compute-score \
--h5ad-file ./integrated_data.h5ad \
--h5ad-species human \
--cov-file None \
--gs-file data/gs_file/alzeh.gs \
--gs-species human \
--ctrl-match-opt mean_var \
--weight-opt vs \
--adj-prop None \
--flag-filter-data True \
--flag-raw-count True \
--n-ctrl 1000 \
--flag-return-ctrl-raw-score False \
--flag-return-ctrl-norm-score True \
--out-folder data/
Loading data:
--h5ad-file loaded: n_cell=92596, n_gene=15890 (sys_time=17.1s)
First 3 cells: ['AAACAGCCAACATAAG-1-0-0-0-0-0-0-0', 'AAACAGCCAACTAACT-1-0-0-0-0-0-0-0', 'AAACAGCCAAGCCAGA-1-0-0-0-0-0-0-0']
First 5 genes: ['A1bg', 'A1bg-as1', 'A2m', 'A2ml1', 'A2ml1-as1']
--gs-file loaded: n_trait=0 (sys_time=17.1s)
Print info for first 3 traits:
Preprocessing:
Computing scDRS score:
```
Your run didn't read in the `.gs` file (`n_trait=0`). Can you double-check?
```python
import pandas as pd
df_gs = pd.read_csv("data/gs_file/magma_10kb_top1000_zscore.74_traits.rv1.gs", sep="\t", index_col=0)
# Select the Alzheimer's row and write it out as a tab-separated file
df_gs.loc['PASS_Alzheimers_Jansen2019'].to_csv('data/gs_file/alzeh.gs', index=False, header=False, sep='\t')
```
Is this the correct code to subset the gene set file? If so, I think there is no problem with my data input. Thanks.
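A minimal sketch of subsetting the multi-trait `.gs` file while keeping the two-column layout (`TRAIT` and `GENESET`, with a header row) described on the scDRS file-format page. In the snippet above, `df_gs.loc['PASS_Alzheimers_Jansen2019']` selects a single label and so returns a pandas Series; `to_csv(index=False, header=False)` then writes only the gene string, with no header row and no trait name, which is consistent with the `n_trait=0` in the log. Selecting with a list keeps a one-row DataFrame instead. The function name `subset_gs` is hypothetical.

```python
import pandas as pd


def subset_gs(gs_path: str, traits: list[str], out_path: str) -> pd.DataFrame:
    """Write a .gs file containing only `traits`, keeping the TRAIT/GENESET header."""
    # index_col=0 makes TRAIT the index (and the index name).
    df_gs = pd.read_csv(gs_path, sep="\t", index_col=0)
    # A list selector returns a DataFrame (one row per trait), not a Series.
    df_sub = df_gs.loc[traits]
    # With the default index=True, the TRAIT index is written as the first
    # column, and the header row is "TRAIT<tab>GENESET".
    df_sub.to_csv(out_path, sep="\t")
    return df_sub
```

Usage would look like `subset_gs("data/gs_file/magma_10kb_top1000_zscore.74_traits.rv1.gs", ["PASS_Alzheimers_Jansen2019"], "data/gs_file/alzeh.gs")`.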
Check whether your `data/gs_file/alzeh.gs` has the same format as https://martinjzhang.github.io/scDRS/file_format.html#gs
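A quick sketch for checking a `.gs` file against that format page, assuming (per the page) a tab-separated file with `TRAIT` and `GENESET` columns, where `GENESET` is a comma-separated list of genes, each optionally followed by `:weight`. The helper `check_gs_file` is hypothetical: it only inspects the file and is not part of scDRS.

```python
import pandas as pd


def check_gs_file(path: str) -> dict[str, int]:
    """Return {trait: n_genes} for a .gs file; raise ValueError if the layout looks wrong."""
    df = pd.read_csv(path, sep="\t")
    missing = {"TRAIT", "GENESET"} - set(df.columns)
    if missing:
        # A file without a header row ends up with its first data line as the header.
        raise ValueError(
            f"missing column(s) {sorted(missing)}; header read as {list(df.columns)}"
        )
    counts = {}
    for trait, geneset in zip(df["TRAIT"], df["GENESET"]):
        # Each entry is "GENE" or "GENE:weight"; keep only the gene symbol.
        genes = [g.split(":")[0] for g in str(geneset).split(",") if g.strip()]
        counts[trait] = len(genes)
    return counts
```

Running it on a file written by the single-label `.loc[...]` snippet earlier in the thread should raise, because that file has no `TRAIT`/`GENESET` header.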
Hi, the format looks good to me:
<img width="790" alt="image" src="https://github.com/user-attachments/assets/9dba44d6-c3fb-42ab-bf5f-81ba3d1e5a45">
I will try both the subset and the full set of traits again and get back to you later. Thanks.
Hi, I tried again, and I still cannot find the `full_score` file. Only the score file exists: `PASS_Alzheimers_Jansen2019.score.gz`.
Is it caused by this?
```
Preprocessing:
Computing scDRS score:
trait=PASS_Alzheimers_Jansen2019: skipped due to small size (n_gene=7, sys_time=22.5s)
```
In other words, is my sample size too small? Thanks.
Your gene set is too small (the requirement is >10 genes).
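A minimal sketch for checking how many genes of a gene set are actually present in the dataset, since the log reports `n_gene=7` for the trait. The cutoff is taken from the ">10 genes" requirement quoted above and is exposed as a parameter because the exact comparison scDRS uses is an assumption here. The sketch also counts case-insensitive matches: the first genes in the log (`A1bg`, `A2m`) are not written in the all-uppercase style typical of human symbols, so a case mismatch with the gene set is worth ruling out, though whether it explains this run is an assumption. `gene_overlap` and `size_cutoff` are hypothetical names.

```python
def gene_overlap(gene_set, dataset_genes, size_cutoff=10):
    """Summarise how many gene-set genes are found in the dataset.

    Passes the size check when more than `size_cutoff` genes match exactly
    (mirrors the ">10 genes" requirement quoted in this thread).
    """
    dataset = set(dataset_genes)
    exact = [g for g in gene_set if g in dataset]
    # Compare upper-cased symbols to spot capitalization mismatches.
    upper = {g.upper() for g in dataset}
    case_insensitive = [g for g in gene_set if g.upper() in upper]
    return {
        "n_exact": len(exact),
        "n_case_insensitive": len(case_insensitive),
        "passes_size_check": len(exact) > size_cutoff,
    }
```

With an AnnData object loaded via `anndata.read_h5ad`, the dataset genes would typically be `adata.var_names`. A large gap between `n_exact` and `n_case_insensitive` would point to a naming mismatch rather than a genuinely small gene set.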
I don't think scDRS has output anything: `PASS_Alzheimers_Jansen2019.score.gz` is a file that already existed before your run.
Thanks, that makes sense.
Hi, thanks for your great work. I downloaded the related files and tried to reproduce the score analysis for Alzheimer's disease with the provided score files. However, after unzipping the score file, the Alzheimer's GWAS score file is named like our own output, `PASS_Alzheimers_Jansen2019.score.gz`, which is not a `full_score` file. Therefore, when I run the scDRS analysis in this step:
I receive an error:
The `.gz` file is generated based on:
Did I miss anything? Thanks.