perslab / CELLECT

CELLECT (CELL-type Expression-specific integration for Complex Traits)
GNU General Public License v3.0

No SNP rsID column in GWAS dataset #46

Closed AnjaMJ closed 4 years ago

AnjaMJ commented 4 years ago

Hi there, I have a question regarding the input for mtag_munge.py.

Is it possible to run mtag_munge.py, and then the rest of the CELLECT pipeline, with only chromosome and position from the GWAS and no rsIDs?

Example of GWAS:

CHROM POS REF ALT N POOLED_ALT_AF EFFECT_SIZE EFFECT_SIZE_SD H2 PVALUE CHISQ N_STUDIES
1 101776|A_AC A AC 47245 0.301005 0.0129254 0.010835 3.01214e-05 0.232896 1.42308274458787 46
1 102357|T_TA T TA 16601 0.00045178 0.0938598 0.248517 8.59239e-06 0.705668 0.142641876751047 4
1 103582|T_TA T TA 41589 0.384657 -0.00333119 0.0112643 2.10286e-06 0.767437 0.0874561573417535 43
1 105191|A_G G A 6775 0.000516605 -0.0872491 0.376068 7.94476e-06 0.816535 0.0538256335893012 2

bengnielsen commented 4 years ago

I think mtag_munge.py might interpret the file as containing no SNPs. The real problem, though, is probably that ldsc.py uses rsIDs to estimate sigma, so without them you might run into this issue: https://github.com/bulik/ldsc/issues/166

pascaltimshel commented 4 years ago

Hi @AnjaMJ ,

You need rsIDs; chromosome and position alone are not enough. Prompted by your question, we have updated the wiki: https://github.com/perslab/CELLECT/wiki/Input-&-Output#gwas_sumstats-munged-gwas-summary-statistics
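
For anyone landing here with a similar file, below is a rough sketch of how one might check for an rsID column and add one from a chromosome/position lookup before munging. It is not part of CELLECT or mtag_munge.py: the function names, the list of accepted column names, the tab-delimited layout, and the placeholder rsID are all assumptions made for illustration. A real lookup table would have to come from a reference matching your genome build, and this sketch ignores allele matching entirely.

```python
import csv
import io

# Hypothetical list of header names treated as an rsID column.
# This is NOT taken from mtag_munge.py; adjust it to your own files.
RSID_COLUMN_NAMES = {"snp", "rsid", "rs_id", "markername", "snpid"}


def find_rsid_column(header):
    """Return the first header name that looks like an rsID column, else None."""
    for name in header:
        if name.strip().lower() in RSID_COLUMN_NAMES:
            return name
    return None


def add_rsids(rows, lookup):
    """Split rows into (annotated, unmatched) using a (chrom, pos) -> rsID lookup.

    Assumes POS values look like '101776|A_AC', as in the example above,
    and keeps only the numeric part before the '|'.
    """
    annotated, unmatched = [], []
    for row in rows:
        pos = row["POS"].split("|")[0]
        rsid = lookup.get((row["CHROM"], pos))
        if rsid is None:
            unmatched.append(row)
        else:
            annotated.append({**row, "SNP": rsid})
    return annotated, unmatched


# Two rows from the example above, trimmed to a few columns and assumed tab-delimited.
SAMPLE = (
    "CHROM\tPOS\tREF\tALT\tPVALUE\n"
    "1\t101776|A_AC\tA\tAC\t0.232896\n"
    "1\t105191|A_G\tG\tA\t0.816535\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")
rows = list(reader)
header = reader.fieldnames

# Placeholder lookup; the rsID below is made up and does not refer to a real variant.
lookup = {("1", "105191"): "rsPLACEHOLDER1"}

annotated, unmatched = add_rsids(rows, lookup)
print("rsID column found:", find_rsid_column(header))
print("annotated:", len(annotated), "unmatched:", len(unmatched))
```

Rows that end up in `unmatched` have no rsID in the lookup, so they would need to be dropped or resolved some other way before the file is passed on.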