Closed pvtodorov closed 2 years ago
Thanks for your interest!
I attach a guide to compute MAGMA gene sets from GWAS summary statistics:
# Step 1: download MAGMA software, gene location file, and reference data from
# https://ctg.cncr.nl/software/magma after this step, one should have a folder <MAGMA_DIR>
# with the following files:
# 1) <MAGMA_DIR>/magma 2) <MAGMA_DIR>/g1000_eur.(bed|bim|fam) 3) <MAGMA_DIR>/NCBI37.3.gene.loc
magma_dir=<MAGMA_DIR>
# Step 2: make gene annotation file for MAGMA using the following command, this only needs to be done
# once for different GWAS summary statistics, and the file will be saved to out/step1.genes.annot
mkdir -p out/step1
${magma_dir}/magma \
--annotate window=10,10 \
--snp-loc ${magma_dir}/g1000_eur.bim \
--gene-loc ${magma_dir}/NCBI37.3.gene.loc \
--out out/step1
# Step 3: run MAGMA using the following command, this takes a GWAS file ${trait}.pval,
# which at least has the columns: SNP, P, N, which corresponds to the SNP id
# (matched to the ${magma_dir}/g1000_eur.bim), p-value, sample size. For example,
# <trait>.pval file looks like
#
# CHR BP SNP P N
# 1 717587 rs144155419 0.453345 279949
# 1 740284 rs61770167 0.921906 282079
# 1 769223 rs60320384 0.059349 281744
#
# After this step, one should obtain a file out/step2/${trait}.gene.out, and the top genes with
# largest Z-scores can be input to scDRS.
mkdir -p out/step2
${magma_dir}/magma \
--bfile ${magma_dir}/g1000_eur \
--pval ${trait}.pval use='SNP,P' ncol='N' \
--gene-annot out/step1.genes.annot \
--out out/step2/${trait}
We will update our document correspondingly.
Wow! Awesome & thank you for quick response 🚀
Hi Kangcheng. Do you have any documentation on your .sumstats files? Specifically, are the CHISQ and Z columns from the GWAS, or an output of a step within scDRS?
Here is the first few lines of PASS_IBD_deLange2017.sumstats, for example:
SNP A1 A2 N CHISQ Z rs7899632 A G 59957 3.4299 -1.852 rs3750595 A C 59957 3.3124 1.82 rs10786405 T C 59957 3.4559 1.859 rs3793692 A G 59957 0.119 -0.345 rs737657 A G 59957 0.0912 -0.302 rs17524355 T C 59957 4.02 -2.005 rs7086391 T C 59957 3.7249 -1.93
Hi @jaredslosberg
Those are from GWAS and follow the LDSC format https://github.com/bulik/ldsc/wiki/Summary-Statistics-File-Format
Let us know if you have further questions/suggestions.
Best, Martin
Hi @martinjzhang! Thank you for this awesome package.
Do you have a recommended method for converting the Entrez IDs produced by MAGMA to gene symbols? It seems that the compute-score tool does not recognize the Entrez IDs (First 3 elements for 'TRAIT': [], []
) and returns a segmentation fault.
This is what my geneset looks like after following the tutorial above:
@twoneu MAGMA's NCBI37.3.gene.loc (e.g., see https://ctg.cncr.nl/software/MAGMA/aux_files/NCBI37.3.zip) has 1st column as entrez IDs and last column as gene symbol.
Also adding that the gene symbols in the .gs file should be consistent with var_names in AnnData.
Hi Kangcheng. Do you have any documentation on your .sumstats files? Specifically, are the CHISQ and Z columns from the GWAS, or an output of a step within scDRS?
Here is the first few lines of PASS_IBD_deLange2017.sumstats, for example:
SNP A1 A2 N CHISQ Z rs7899632 A G 59957 3.4299 -1.852 rs3750595 A C 59957 3.3124 1.82 rs10786405 T C 59957 3.4559 1.859 rs3793692 A G 59957 0.119 -0.345 rs737657 A G 59957 0.0912 -0.302 rs17524355 T C 59957 4.02 -2.005 rs7086391 T C 59957 3.7249 -1.93
Sorry, another question - how would you recommend that we generate genesets using MAGMA for summary statistics like these with no P-value column?
Thank you so much for your time!
@twoneu are there other columns containing information needed to derive a p-value? We can help if you provide us information about columns you have. Also see MungeSumstats which help standardize sumstats.
The only columns are SNP, A1, A2, N, Z, and CHISQ (see PASS_IBD_deLange2017.sumstats as an example). Mungesumstats seems to require a P value to run as well.
@twoneu You can use the following function to convert z-score to p-value.
import scipy.stats
import numpy as np
# zscore can be a numpy array
pval = 2 * (1 - scipy.stats.norm.cdf(np.abs(zscore)))
You need to manually add a column to the sumstats file.
UPDATE NOTE: My previous reply pval = scdrs.util.zsc2pval(zsc)
is for computing one-sided p-value. But in GWAS we typically do two-sided p-value (as in the updated reply).
@twoneu please see the updated reply above (i made a mistake earlier: we should compute two-sided p-value instead one-sided p-value to convert GWAS z-score to p-value)
Thanks for your interest!
I attach a guide to compute MAGMA gene sets from GWAS summary statistics:
# Step 1: download MAGMA software, gene location file, and reference data from # https://ctg.cncr.nl/software/magma after this step, one should have a folder <MAGMA_DIR> # with the following files: # 1) <MAGMA_DIR>/magma 2) <MAGMA_DIR>/g1000_eur.(bed|bim|fam) 3) <MAGMA_DIR>/NCBI37.3.gene.loc magma_dir=<MAGMA_DIR> # Step 2: make gene annotation file for MAGMA using the following command, this only needs to be done # once for different GWAS summary statistics, and the file will be saved to out/step1.genes.annot mkdir -p out/step1 ${magma_dir}/magma \ --annotate window=10,10 \ --snp-loc ${magma_dir}/g1000_eur.bim \ --gene-loc ${magma_dir}/NCBI37.3.gene.loc \ --out out/step1 # Step 3: run MAGMA using the following command, this takes a GWAS file ${trait}.pval, # which at least has the columns: SNP, P, N, which corresponds to the SNP id # (matched to the ${magma_dir}/g1000_eur.bim), p-value, sample size. For example, # <trait>.pval file looks like # # CHR BP SNP P N # 1 717587 rs144155419 0.453345 279949 # 1 740284 rs61770167 0.921906 282079 # 1 769223 rs60320384 0.059349 281744 # # After this step, one should obtain a file out/step2/${trait}.gene.out, and the top genes with # largest Z-scores can be input to scDRS. mkdir -p out/step2 ${magma_dir}/magma \ --bfile ${magma_dir}/g1000_eur \ --pval ${trait}.pval use='SNP,P' ncol='N' \ --gene-annot out/step1.genes.annot \ --out out/step2/${trait}
We will update our document correspondingly.
Thank you so much for this post! I got a question and appreciate it if you could help me with this; I'm wondering in the second script as I have three files (bim, fam, bed) separately how do I put it here --bfile ${magma_dir}/g1000_eur. I tried to combined them in one file using cat but I keep getting this error: ERROR - parsing arguments: file '/home/paria/scratch/als-project/analysis/ocu_med_sc/rpca/scDRS/MAGMA/g1000_eur.bed' specified in --bfile does not exist or is not a file. I would like to know how to make the g1000_eur including all three .bed, .bim, .fam. Many thanks, Paria
Hi,
You should keep the three files as is: g1000_eur.bed / g1000_eur.bim / g1000_eur.fam
--bfile
corresponds to a prefix to the three files
Hi,
You should keep the three files as is: g1000_eur.bed / g1000_eur.bim / g1000_eur.fam
--bfile
corresponds to a prefix to the three files
Perfect, thanks for the swift response! I have one more questions if you don't mind. First, based on step 2, the --snp-loc is ${magma_dir}/g1000_eur.bim \ but in MAGMA tutorial (https://ctg.cncr.nl/software/MAGMA/doc/manual_v1.10.pdf) it says the --snp-loc could be provided by user. I was wondering if I have a gwas should the --snp-loc in the step2 --bfile in step 3 be the gwas? Thanks, Paria
Hi,
--snp-loc should be --snp-loc ${magma_dir}/g1000_eur.bim
GWAS should be --pval in step 3
Perfect, thank you!
Hi!
I'm trying to figure out how to generate a custom geneset file using MAGMA that could plug into scDRS. I found the following text within the manuscript
Could you provide an example of the MAGMA inputs and commands that performs this?