martinjzhang / scDRS

Single-cell disease relevance score (scDRS)
https://martinjzhang.github.io/scDRS/
MIT License

ValueError: too many values to unpack (expected 2) #88

Closed kayihui closed 4 months ago

kayihui commented 4 months ago

Hi scDRS team,

I got the following error message. Please let me know how to resolve this.

Thank you. Ka Yi

scdrs compute-score \
    --h5ad-file NAc.combinded.h5ad \
    --h5ad-species mouse \
    --gs-file processed_geneset.gs \
    --gs-species mouse \
    --cov-file None \
    --flag-filter-data False \
    --flag-raw-count False \
    --flag-return-ctrl-raw-score False \
    --flag-return-ctrl-norm-score False \
    --out-folder data/
******************************************************************************
* Single-cell disease relevance score (scDRS)
* Version 1.0.2
* Martin Jinye Zhang and Kangcheng Hou
* HSPH / Broad Institute / UCLA
* MIT License
******************************************************************************
Call: scdrs compute-score \
--h5ad-file NAc.combinded.h5ad \
--h5ad-species mouse \
--cov-file None \
--gs-file processed_geneset.gs \
--gs-species mouse \
--ctrl-match-opt mean_var \
--weight-opt vs \
--adj-prop None \
--flag-filter-data False \
--flag-raw-count False \
--n-ctrl 1000 \
--flag-return-ctrl-raw-score False \
--flag-return-ctrl-norm-score False \
--out-folder data/

Loading data:
--h5ad-file loaded: n_cell=3579, n_gene=55416 (sys_time=1.9s)
First 3 cells: ['N1_AAACCCACATCTGTTT', 'N1_AAAGAACAGCACCGAA', 'N1_AAAGAACCAAGGTACG']
First 5 genes: ['Gm37671', 'Gm19087', 'Gm8941', 'Gm38212', 'Gm7449']
Traceback (most recent call last):
  File "/opt/anaconda3/envs/scDRS_py3_8/bin/scdrs", line 740, in <module>
    fire.Fire()
  File "/opt/anaconda3/envs/scDRS_py3_8/lib/python3.8/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/anaconda3/envs/scDRS_py3_8/lib/python3.8/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/anaconda3/envs/scDRS_py3_8/lib/python3.8/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/opt/anaconda3/envs/scDRS_py3_8/bin/scdrs", line 186, in compute_score
    dict_gs = scdrs.util.load_gs(
  File "/Users/kayi/scDRS/scdrs/util.py", line 239, in load_gs
    for i, (trait, gs) in df_gs.iterrows():
ValueError: too many values to unpack (expected 2)

martinjzhang commented 4 months ago

Hi,

It seems your .gs file doesn't have the correct format. Can you double check it to make sure it follows the same format as the example .gs file? https://martinjzhang.github.io/scDRS/file_format.html

kayihui commented 4 months ago

Hi Martin,

Thank you for your quick reply. The .gs file is a subset of the .gs file from the paper. I visually inspected the file and did not see any difference in format.

I have rerun using the original magma_10kb_top1000_zscore.74_traits.rv1.gs file. It's working without error.

I attached the code for subsetting the .gs file.

# Define the subset and the rename dictionary
subset_values = ["PASS_MDD_Howard2019", "PASS_BIP_Mullins2021", "PASS_SleepDuration_Dashti2019", "UKB_460K.other_MORNINGPERSON"]
rename_dict = {
    "PASS_MDD_Howard2019": "MDD",
    "PASS_BIP_Mullins2021": "BP",
    "PASS_SleepDuration_Dashti2019": "SleepDuration",
    "UKB_460K.other_MORNINGPERSON": "MorningPerson"
}

# Subset the DataFrame (copy so the rename below does not modify a view)
subset_df = df_gs[df_gs['TRAIT'].isin(subset_values)].copy()

# Rename the values in the "TRAIT" column
subset_df['TRAIT'] = subset_df['TRAIT'].replace(rename_dict)

subset_df.to_csv("processed_geneset.gs", sep="\t")

Thank you very much, Ka Yi

martinjzhang commented 4 months ago

Can you remove the first indexing column from the gs file and try running the code again?
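
For readers hitting the same error, here is a minimal sketch of why the extra index column breaks loading. It assumes a two-column .gs layout (TRAIT and GENESET, per the file format page linked above) and uses made-up gene names and weights. `to_gs_text` and `unpack_rows` are hypothetical helpers for illustration only; `unpack_rows` just mimics the two-value unpacking shown in the traceback and is not the actual scDRS loader.

```python
import io

import pandas as pd

# Toy stand-in for the full .gs table; gene names and weights are made up.
df_gs = pd.DataFrame(
    {
        "TRAIT": ["PASS_MDD_Howard2019", "PASS_BIP_Mullins2021", "OTHER_TRAIT"],
        "GENESET": ["GENE1:1.5,GENE2:1.2", "GENE3:2.0", "GENE4:0.7"],
    }
)
rename_dict = {"PASS_MDD_Howard2019": "MDD", "PASS_BIP_Mullins2021": "BP"}

subset_df = df_gs[df_gs["TRAIT"].isin(list(rename_dict))].copy()
subset_df["TRAIT"] = subset_df["TRAIT"].replace(rename_dict)


def to_gs_text(df, index):
    """Serialize df as tab-separated text, with or without the pandas index."""
    buf = io.StringIO()
    df.to_csv(buf, sep="\t", index=index)
    return buf.getvalue()


def unpack_rows(df):
    """Mimic the two-value row unpacking seen in the traceback."""
    return [(trait, gs) for _, (trait, gs) in df.iterrows()]


# Default to_csv behaviour (index=True) adds an unnamed first column.
with_index = pd.read_csv(io.StringIO(to_gs_text(subset_df, True)), sep="\t")
# index=False writes only the TRAIT and GENESET columns.
without_index = pd.read_csv(io.StringIO(to_gs_text(subset_df, False)), sep="\t")

print(list(with_index.columns))
print(list(without_index.columns))

try:
    unpack_rows(with_index)  # three values per row, two names to unpack into
except ValueError as err:
    print("ValueError:", err)

print(unpack_rows(without_index))
```

In the subsetting script above, the equivalent change would be passing `index=False` to `to_csv`.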

kayihui commented 4 months ago

It worked without the error. However, here's the result:

Computing scDRS score:
trait=BP: skipped due to small size (n_gene=0, sys_time=2.8s)
trait=MDD: skipped due to small size (n_gene=3, sys_time=2.8s)
trait=SleepDuration: skipped due to small size (n_gene=0, sys_time=2.8s)
trait=MorningPerson: skipped due to small size (n_gene=1, sys_time=2.8s)

Why does the number of genes in each list look incorrect?

martinjzhang commented 4 months ago

Yes, it seems scDRS didn't recognize the genes. Did you specify the genes to be human genes?

kayihui commented 4 months ago

My dataset is mouse. Do I have to convert the gene names in the list?

martinjzhang commented 4 months ago

But I see all your genes are upper-case, which suggests the disease genes are human genes. Use the argument --gs-species human to specify this, and scDRS will do the auto-conversion.
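
A quick way to spot this kind of species mismatch before running is to compare the letter case of the gene-set symbols with the gene names in the dataset. The sketch below is only a rough heuristic built on the observation above (human symbols are typically all upper-case, mouse symbols typically capitalize only the first letter). `guess_species` is a hypothetical helper, not part of scDRS, and the gene-set symbols are made up; the dataset gene names are the ones printed in the log earlier in this thread.

```python
def guess_species(genes, threshold=0.5):
    """Guess 'human' or 'mouse' from symbol casing (heuristic, not authoritative)."""
    # Ignore entries with no letters, since casing is undefined for them.
    named = [g for g in genes if any(c.isalpha() for c in g)]
    if not named:
        return "unknown"
    frac_upper = sum(g == g.upper() for g in named) / len(named)
    return "human" if frac_upper > threshold else "mouse"


# Gene names from the "First 5 genes" line of the log in this thread.
data_genes = ["Gm37671", "Gm19087", "Gm8941", "Gm38212", "Gm7449"]
# Made-up upper-case symbols standing in for a human-derived gene set.
gs_genes = ["GENEA", "GENEB", "GENEC"]

print(guess_species(data_genes))
print(guess_species(gs_genes))

# An exact-match overlap between differently cased symbols is empty, which is
# consistent with the near-zero n_gene counts reported above.
overlap = set(gs_genes) & set(data_genes)
print(len(overlap))
```

If the two guesses disagree, that is a hint to check whether --gs-species matches the species the gene set was derived from, rather than the species of the dataset.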

kayihui commented 4 months ago

Great! Everything is working fine now. Thank you!