omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)
MIT License

input format for summary statistics for munging? #37

Closed: complexgenome closed this issue 3 years ago

complexgenome commented 3 years ago

hello there,

Thanks for this wonderful software and wiki. I was able to install it seamlessly using conda. I have GWAS summary statistics for an admixed minority population. The SNP identifiers follow the CHR:POS:Allele1:Allele2 format. The data were imputed in-house, so the SNP names follow this pattern rather than rsids.

I would like to create a parquet-format file from the summary stats file.

May I know which column headers munge_polyfun_sumstats.py needs? Also, does the script require the columns to be in a specific order, or require any particular columns?

The wiki says the input file should come "with SNP rsids, chromosome and base pair info, and either a p-value, an effect size estimate and its standard error, a Z-score or a p-value."

The sample data from BOLT-LMM has these columns: SNP, CHR, BP, INFO, BETA, SE

Thanks.

omerwe commented 3 years ago

Hi,

Please note the wiki text: "The script tries to be flexible and accommodate multiple file formats and column names. It generally requires only a sample size parameter (n) and a whitespace-delimited input file with SNP rsids, chromosome and base pair info, and either a p-value, an effect size estimate and its standard error, a Z-score or a p-value." I suggest you try applying the script to your file and see if it works. If not, it will hopefully give you an informative error message. Notably, the file doesn't require a "proper" rsid, so you can use your own custom-format rsids.
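
For readers with the same question, here is a minimal sketch of what a whitespace-delimited input file with custom CHR:POS:Allele1:Allele2 identifiers might look like. It is not taken from the PolyFun code. The helper names (`make_snp_id`, `write_sumstats`), the `A1`/`A2` column names, and the records are hypothetical, and whether the script accepts exactly these column names is an assumption; the `SNP`, `CHR`, `BP`, `BETA`, and `SE` headers mirror the BOLT-LMM sample columns mentioned in the question.

```python
import io


def make_snp_id(chrom, pos, a1, a2):
    """Build a custom CHR:POS:Allele1:Allele2 identifier (hypothetical helper)."""
    return f"{chrom}:{pos}:{a1}:{a2}"


def write_sumstats(records, handle):
    """Write records as a space-delimited table with a header row."""
    # Column names are an assumption; the script is said to accept several variants.
    columns = ["SNP", "CHR", "BP", "A1", "A2", "BETA", "SE"]
    handle.write(" ".join(columns) + "\n")
    for rec in records:
        row = dict(rec)
        # Use the custom identifier in place of an rsid.
        row["SNP"] = make_snp_id(rec["CHR"], rec["BP"], rec["A1"], rec["A2"])
        handle.write(" ".join(str(row[c]) for c in columns) + "\n")


# Made-up example records, not real GWAS results.
records = [
    {"CHR": 1, "BP": 10583, "A1": "A", "A2": "G", "BETA": 0.012, "SE": 0.004},
    {"CHR": 2, "BP": 20451, "A1": "T", "A2": "C", "BETA": -0.031, "SE": 0.010},
]

buffer = io.StringIO()
write_sumstats(records, buffer)
sumstats_text = buffer.getvalue()
print(sumstats_text)
```

Writing the same text to a file on disk would give something to pass to munge_polyfun_sumstats.py together with the sample size parameter (n) that the wiki mentions. The exact command-line flags are best checked against the script's own help output rather than assumed from this sketch.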