ylaboratory / seismic

Seismic R package. Discover cell type-trait associations in minutes for GWAS and single-cell RNA-sequencing data
BSD 3-Clause "New" or "Revised" License

convert mouse gene identifiers to human ones that match data in GWAS summary data #3

Open Jylab-Genetics opened 2 weeks ago

Jylab-Genetics commented 2 weeks ago

Why do we need to execute 'convert mouse gene identifiers to human ones that match data in GWAS summary data'? I don't quite understand. Are GWAS sources different from single-cell data sources?

VincentQLai commented 2 weeks ago

Yes, they are sometimes different.

A genome-wide association study (GWAS) collects genotype data from up to hundreds of thousands of individuals and runs a statistical test for each SNP, so its results are expressed in terms of the human genome. Single-cell RNA-seq, by contrast, can be performed on any organism. The gene IDs therefore need to be converted so that the two data sets can be matched.

For seismic specifically, one required input is the MAGMA gene-level Z-score file, in which SNP-level statistics are aggregated to the gene level and genes are identified by human Entrez ID. As a result, unless the genes in the scRNA-seq data are already encoded as human Entrez IDs, gene ID conversion is needed. Currently, seismic's built-in data structure can only handle conversions between the several listed gene ID types. We plan to implement more flexible conversion options in future updates.
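To make the matching step concrete, here is a minimal, language-neutral sketch in Python. It is not seismic's API or its actual conversion logic. The ortholog table, the gene names, the Entrez IDs, and the helper functions `one_to_one_map` and `convert_and_match` are all hypothetical, and keeping only one-to-one orthologs is an assumption made for this illustration.

```python
from collections import Counter

# Hypothetical ortholog table: (mouse gene symbol, human Entrez ID).
# Toy values for illustration only, not real gene mappings.
ORTHOLOGS = [
    ("GeneA", "1001"),
    ("GeneB", "1002"),
    ("GeneC", "1003"),
    ("GeneC", "1004"),  # one mouse gene maps to two human genes
    ("GeneD", "1005"),
    ("GeneE", "1005"),  # two mouse genes map to one human gene
]


def one_to_one_map(pairs):
    """Keep only pairs where both the mouse and the human gene appear once."""
    mouse_counts = Counter(mouse for mouse, _ in pairs)
    human_counts = Counter(human for _, human in pairs)
    return {
        mouse: human
        for mouse, human in pairs
        if mouse_counts[mouse] == 1 and human_counts[human] == 1
    }


def convert_and_match(mouse_scores, pairs, magma_z):
    """Re-key mouse per-gene scores by human Entrez ID and keep the genes
    that also have a gene-level Z-score."""
    mapping = one_to_one_map(pairs)
    matched = {}
    for gene, score in mouse_scores.items():
        entrez = mapping.get(gene)
        if entrez is not None and entrez in magma_z:
            matched[entrez] = (score, magma_z[entrez])
    return matched


# Toy per-gene scores from a mouse scRNA-seq data set (hypothetical).
mouse_scores = {"GeneA": 0.8, "GeneB": 0.1, "GeneC": 0.5, "GeneD": 0.3, "GeneF": 0.9}
# Toy gene-level Z-scores keyed by human Entrez ID (hypothetical).
magma_z = {"1001": 2.3, "1003": -0.4, "1005": 1.1, "1006": 0.7}

matched = convert_and_match(mouse_scores, ORTHOLOGS, magma_z)
print(matched)
```

In this toy example only `GeneA` survives: `GeneB` has no Z-score, `GeneC` and `GeneD` are dropped as ambiguous orthologs, and `GeneF` has no ortholog entry. Genes lost at this step are why the choice of ID type and ortholog table matters for the downstream association test.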

Hope this information helps address your question.

[Screenshot attached, 2024-11-04]

Jylab-Genetics commented 2 weeks ago

Thank you for your prompt response. This software demonstrates exceptional flexibility in integrating SNP and single-cell RNA-seq data, especially in gene ID conversion and data matching, offering unprecedented convenience for association analyses at the gene level and opening new avenues for research.

Regarding cross-species compatibility, I would like to confirm: does the software currently mainly support association data between humans and mice? Is it possible to extend this support to integrate human SNP information with single-cell data from non-human primates or other mammals? Such an expansion would be highly valuable for cross-species genetic association studies, helping to uncover the molecular mechanisms of specific traits across different species. I look forward to your further clarification.