omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)
MIT License

Missing SNPs in merging summary statistics with reference panel #181

Closed Y-Isaac closed 6 months ago

Y-Isaac commented 6 months ago

I'm sorry, but I've run into another problem.

While I was following the "PolyFun approach 3: Computing prior causal probabilities non-parametrically", specifically on step 2 "Run PolyFun with L2-regularized S-LDSC", I noticed that some SNPs were removed during the "merging summary statistics with reference panel" process. Here is the content from the log file:

[INFO] Reading summary statistics from /public/home/P202306/polyfun_test/test/ALT_munged.parquet ...
[INFO] Read summary statistics for 13087753 SNPs.
[INFO] Reading reference panel LD Score from /public/home/P202306/polyfun_test/ldscore_new/chr[1-22] ...
[INFO] Read reference panel LD Scores for 13156088 SNPs.
[INFO] Reading regression weight LD Score from /public/home/P202306/databed/bed_43232_ukbsnp/ldscore/[1-22]
[INFO] Read regression weight LD Scores for 13284474 SNPs.
[INFO] After merging with reference panel LD, 12448072 SNPs remain.
[INFO] After merging with regression SNP LD, 12448072 SNPs remain.
[INFO] Removed 178 SNPs with chi^2 > 431.334 (12447894 SNPs remain)

Upon inspection, I have tentatively identified the cause. My summary statistics file came from a GWAS meta-analysis run with METAL, so for some SNPs the A1/A2 assignment is inconsistent with the genotype file (PLINK bfile), where the reference allele was treated as A1. After I modified A1 and A2 in the summary file to be consistent with the LD panel, the issue was resolved.
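The kind of fix described above can be sketched roughly as follows. This is a minimal illustration, not PolyFun's or METAL's own code: the `harmonize` function, the record layout (`SNP`, `A1`, `A2`, `Z`), and the toy data are all hypothetical. It also assumes that `Z` is signed with respect to `A1`, so swapping the alleles means negating `Z`, and it does not attempt to handle strand flips.

```python
def harmonize(sumstats, reference):
    """Align summary-statistic alleles to a reference panel.

    sumstats:  list of dicts with keys SNP, A1, A2, Z (hypothetical layout)
    reference: dict mapping SNP id -> (A1, A2) as coded in the reference panel
    Returns (aligned, unmatched).
    """
    aligned, unmatched = [], []
    for row in sumstats:
        ref = reference.get(row["SNP"])
        alleles = (row["A1"], row["A2"])
        if ref is None:
            # SNP is absent from the reference panel
            unmatched.append(row)
        elif alleles == ref:
            # Already consistent with the panel
            aligned.append(dict(row))
        elif alleles == ref[::-1]:
            # Same alleles in the opposite order: swap them and flip the sign of Z
            # (assumes Z is the effect of A1)
            aligned.append({**row, "A1": ref[0], "A2": ref[1], "Z": -row["Z"]})
        else:
            # Different alleles altogether; strand flips are not handled here
            unmatched.append(row)
    return aligned, unmatched


# Toy data (made up for illustration)
reference = {"rs1": ("A", "G"), "rs2": ("AT", "A"), "rs3": ("C", "T")}
sumstats = [
    {"SNP": "rs1", "A1": "A", "A2": "G", "Z": 1.5},
    {"SNP": "rs2", "A1": "A", "A2": "AT", "Z": -2.0},  # swapped indel
    {"SNP": "rs3", "A1": "C", "A2": "G", "Z": 0.3},    # alleles do not match
]
aligned, unmatched = harmonize(sumstats, reference)
```

In this toy example `rs2` is re-coded to match the panel with its `Z` sign flipped, and `rs3` is left in `unmatched` for manual inspection.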

Is this allele check working as expected? I don't fully understand why these SNPs were removed. If possible, could you explain this to me?

Thank you for your help!

omerwe commented 6 months ago

@Y-Isaac as mentioned in #180, please check if these are indels, in which case the order of the alleles makes a difference. In all other cases PolyFun should handle allele-flipping automatically, but not in this case.
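One way to run that check is sketched below: list the variants that are indels and whose allele order is reversed relative to the reference panel. This is only an illustration under stated assumptions. The function names and toy data are hypothetical, alleles are assumed to be spelled out as base strings, and any variant with a multi-base allele is treated as an indel, which is a simplification (other encodings such as `I`/`D` are not covered).

```python
def is_indel(a1, a2):
    """Treat any variant with a multi-base allele as an indel (simplifying assumption)."""
    return len(a1) > 1 or len(a2) > 1


def find_swapped_indels(sumstats, reference):
    """Return ids of indels whose (A1, A2) order is reversed relative to the reference.

    Both arguments are dicts mapping SNP id -> (A1, A2); this layout is hypothetical.
    """
    swapped = []
    for snp, (a1, a2) in sumstats.items():
        ref = reference.get(snp)
        if ref is not None and is_indel(a1, a2) and (a2, a1) == ref:
            swapped.append(snp)
    return swapped


# Toy data (made up for illustration)
reference = {"rs10": ("G", "GTC"), "rs11": ("A", "C"), "rs12": ("TA", "T")}
sumstats = {"rs10": ("GTC", "G"), "rs11": ("C", "A"), "rs12": ("TA", "T")}
swapped = find_swapped_indels(sumstats, reference)
```

Here only `rs10` is reported: `rs11` is also swapped but is a single-base SNP, and `rs12` is an indel that already matches the panel.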