Closed complexgenome closed 3 years ago
Hi,
Please note the wiki text: "The script tries to be flexible and accommodate multiple file formats and column names. It generally requires only a sample size parameter (n) and a whitespace-delimited input file with SNP rsids, chromosome and base pair info, and either a p-value, an effect size estimate and its standard error, a Z-score or a p-value." I suggest you try applying the script to your file and see if it works. If not, it will hopefully give you an informative error message. Notably, the file doesn't require a "proper" rsid, so you can use your own custom-format SNP identifiers.
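If you want to sanity-check the header before running the script, here is a rough sketch of the requirements quoted above. It is not part of PolyFun and does not reproduce the script's own column-name handling. The names SNP, CHR, BP, BETA and SE are taken from the BOLT-LMM sample in your question, while "P" and "Z" are placeholder names I am assuming for a p-value and a Z-score column.

```python
# Hypothetical pre-check, not part of PolyFun. Column names are assumptions:
# SNP, CHR, BP, BETA, SE come from the BOLT-LMM sample in this thread;
# "P" and "Z" are placeholder names for a p-value / Z-score column.
POSITION_COLUMNS = {"SNP", "CHR", "BP"}
SIGNAL_OPTIONS = [{"P"}, {"BETA", "SE"}, {"Z"}]


def check_header(header_line):
    """Return a list of problems found in a whitespace-delimited header line."""
    columns = set(header_line.split())
    problems = []

    # SNP identifier, chromosome and base-pair position are always needed.
    missing = sorted(POSITION_COLUMNS - columns)
    if missing:
        problems.append("missing columns: " + ", ".join(missing))

    # At least one complete association signal must be present.
    if not any(option <= columns for option in SIGNAL_OPTIONS):
        problems.append(
            "need a p-value, an effect size with its standard error, or a Z-score"
        )
    return problems


# Header from the BOLT-LMM sample in the question.
print(check_header("SNP CHR BP INFO BETA SE"))  # prints []
```

An empty list only means the header matches the wiki description under these assumed names; the script itself remains the final check.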
Hello there,
Thanks for this wonderful software and wiki. I was able to install it seamlessly using conda. I have GWAS summary statistics for an admixed minority population. The SNP names follow the CHR:POS:Allele1:Allele2 format. The data were imputed in-house, so the SNP names follow this pattern rather than rsids.
I would like to create a parquet-format file from the summary statistics.
May I know which column headers munge_polyfun_sumstats.py needs? Also, does the script require the columns to be in a specific order, or are any particular columns mandatory?
Sample data from BOLT-LMM have these columns:
SNP, CHR, BP, INFO, BETA, SE
Thanks.