martinjzhang / scDRS

Single-cell disease relevance score (scDRS)
https://martinjzhang.github.io/scDRS/
MIT License

Questions about example dataset usage #98

Closed HelloWorldLTY closed 4 weeks ago

HelloWorldLTY commented 4 weeks ago

Hi, I have a simple question about the example that loads the gene set information before running the analysis:

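# imports needed by this snippet (display comes from IPython / Jupyter)
import pandas as pd
import scanpy as sc
from IPython.display import display

# load adata
adata = sc.read_h5ad("data/expr.h5ad")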

# subset gene sets
df_gs = pd.read_csv("data/geneset.gs", sep="\t", index_col=0)

df_gs = df_gs.loc[
    [
        "PASS_Schizophrenia_Pardinas2018",
        "spatial_dorsal",
        "UKB_460K.body_HEIGHTz",
    ],
    :,
].rename(
    {
        "PASS_Schizophrenia_Pardinas2018": "SCZ",
        "spatial_dorsal": "Dorsal",
        "UKB_460K.body_HEIGHTz": "Height",
    }
)
display(df_gs)

df_gs.to_csv("data/processed_geneset.gs", sep="\t")

It seems that the expression data are from mouse, but the UKB dataset, from my understanding, only includes GWAS results from human cohorts. Is it suitable to use human GWAS information to analyze a mouse dataset? Thanks.

martinjzhang commented 4 weeks ago

Hi, we discussed this point in the supplementary note:

Second, we primarily used mouse RNA-seq data (TMS FACS) to study human diseases and complex traits, but there are biological differences between human and mouse. Arguments in favor of using mouse RNA-seq data to study human diseases and complex traits include (1) it is easier to obtain high-quality atlas-level scRNA-seq data from mice, (2) our key findings were replicated in human data, (3) we evaluated only protein-coding genes with 1:1 orthologs between mice and humans, which are highly conserved, (4) we used a large number of genes to associate cells to diseases (1,000 MAGMA putative disease genes), minimizing potential bias due to individual genes differentially expressed across species (see Bryois et al. [30] and other studies [26, 28, 29, 38] for additional discussions). However, it is possible that some cell types are less conserved across species [30, 76] (e.g., our results for CA1 pyramidal neurons along the long and radial axes (Extended Data Figure 8) seem to indicate different disease association patterns between human and mouse), motivating follow-up analyses involving human scRNA-seq data (including those that we have performed here).
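For readers who want to see what point (3) means in practice, here is a minimal sketch of restricting a human gene set to genes with 1:1 mouse orthologs. It is only an illustration, not the scDRS implementation: the ortholog table, the `GENE_X` entries, and the helper names `one_to_one_map` and `map_gene_set` are all hypothetical, and a real analysis would use a curated ortholog resource.

```python
from collections import Counter

# Hypothetical toy ortholog table of (human symbol, mouse symbol) pairs.
# GENE_X is a made-up gene with two mouse orthologs (one-to-many).
ORTHOLOG_PAIRS = [
    ("DRD2", "Drd2"),
    ("GRIN2A", "Grin2a"),
    ("FOXP2", "Foxp2"),
    ("GENE_X", "Gene_x1"),
    ("GENE_X", "Gene_x2"),
]


def one_to_one_map(pairs):
    """Return {human: mouse} for pairs where both genes occur exactly once."""
    human_counts = Counter(h for h, _ in pairs)
    mouse_counts = Counter(m for _, m in pairs)
    return {
        h: m
        for h, m in pairs
        if human_counts[h] == 1 and mouse_counts[m] == 1
    }


def map_gene_set(human_genes, pairs, mouse_genes_in_data):
    """Map a human gene set to mouse symbols present in the expression data."""
    mapping = one_to_one_map(pairs)
    available = set(mouse_genes_in_data)
    return [
        mapping[g]
        for g in human_genes
        if g in mapping and mapping[g] in available
    ]


# Hypothetical inputs: a human GWAS-derived gene set and the mouse genes
# measured in the scRNA-seq data.
human_gene_set = ["DRD2", "GRIN2A", "GENE_X", "FOXP2", "GENE_Y"]
mouse_genes_in_data = ["Drd2", "Grin2a", "Foxp2", "Gene_x1"]

mouse_gene_set = map_gene_set(human_gene_set, ORTHOLOG_PAIRS, mouse_genes_in_data)
print(mouse_gene_set)
```

In this toy example `GENE_X` is dropped because it maps to two mouse genes, and `GENE_Y` is dropped because it has no ortholog entry at all, so only genes with an unambiguous mouse counterpart contribute to the score.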

HelloWorldLTY commented 4 weeks ago

Thanks a lot!