omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)
MIT License

PIP=0 for all variants #58

Closed nickhir closed 3 years ago

nickhir commented 3 years ago

I am using the precomputed prior causal probabilities from the UK Biobank to annotate my GWAS SNPs, which come from a purely European population. All SNPs overlap, and extract_snpvar.py runs without a problem. The SNPVAR values range from 4.0733e-09 to 4.0733e-07.

Afterwards, I run SuSiE for the actual fine-mapping step with the following command:

python finemapper.py --geno full_dataset_2 --sumstats chr12_variants_with_prior.txt.gz --n 92 --chr 12 --start 18999966 --end 29047425 --method susie --max-num-causal 3 --cache-dir . --out finemap_result

full_dataset_2 is the PLINK-format dataset that I used to perform the initial GWAS.

In the finemap_result output, both the PIP and the CREDIBLE_SET columns contain only 0.

Do you have any insight into why this could be the case?

The significant SNPs for which I want to find the causal SNP are these:

   #CHROM      POS         ID REF ALT A1 TEST OBS_CT     BETA       SE  T_STAT           P
1:     12 23999769 rs78624193   T   G  G  ADD     92 0.982637 0.198682 4.94578 4.18463e-06
2:     12 24008435 rs11047132   T   G  G  ADD     92 1.095340 0.212070 5.16500 1.75800e-06
3:     12 24016182 rs10505909   T   C  C  ADD     92 1.095390 0.212063 5.16540 1.75519e-06
4:     12 24023714 rs11047141   T   C  C  ADD     92 1.095140 0.212019 5.16530 1.75589e-06
5:     12 24047496 rs12425715   G   C  C  ADD     92 1.266160 0.252284 5.01879 3.14072e-06

To identify the causal SNPs, I included all SNPs located within a 5 Mb window upstream and downstream of these variants, so I ended up with 21413 variants in total. Could the window size be a problem?

Or could the problem be that the p-values from my initial GWAS are comparatively large? I only have 92 individuals in my study, so I thought it unlikely to see extreme p-values such as 10^-20, and I just included the most significant SNPs.

Any help is much appreciated!

omerwe commented 3 years ago

Hi,

I'm not sure why this happens, but N=92 is a tiny number compared to most GWAS. The fact that you got some SNPs with small p-values (p~1e-6) at such a small sample size suggests that you have a SNP with a huge effect size. Also, using in-sample LD with N=92 and so many SNPs probably leads to severe noise in the LD estimates.
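To give a rough sense of how noisy in-sample LD is at this sample size, here is a minimal back-of-the-envelope sketch. It uses the Fisher z approximation for the standard error of a correlation estimate, which is only an approximation for genotype data, and the fact that a correlation matrix estimated from n individuals has rank at most n - 1. The function names are made up for illustration and are not part of PolyFun.

```python
import math

def fisher_z_se(n_samples):
    """Approximate standard error of a Fisher z-transformed correlation.

    Assumption: the 1/sqrt(n - 3) approximation, which is only a rough
    guide for genotype correlations.
    """
    return 1.0 / math.sqrt(n_samples - 3)

def max_ld_rank(n_samples, n_snps):
    """Upper bound on the rank of a sample correlation (LD) matrix.

    Centering the genotypes removes one degree of freedom, so the rank
    cannot exceed n_samples - 1 (or the number of SNPs, if smaller).
    """
    return min(n_samples - 1, n_snps)

n_samples = 92    # sample size reported in this thread
n_snps = 21413    # number of variants in the fine-mapping window

se = fisher_z_se(n_samples)
rank_bound = max_ld_rank(n_samples, n_snps)

print(f"approx. SE of each LD estimate (z scale): {se:.3f}")
print(f"LD matrix is {n_snps} x {n_snps} but has rank <= {rank_bound}")
```

With 92 individuals, each pairwise estimate carries an error of roughly 0.1 on the z scale, and the 21413 x 21413 matrix cannot have rank above 91, so most of its structure is not actually identified by the data.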

If this is a European sample, I suggest trying the UKB LD data that we published online instead of in-sample LD estimates (see the wiki for details). I'm not sure this will solve all problems, but it would be a step in the right direction.

The only other alternative I can think of is to significantly decrease the window size, so that you include only about 20 SNPs around the most promising position. It's not ideal, and it's a form of searching under the lamplight, but I think it should help address some of the technical limitations of having a very small sample size.
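One way to pick such a narrow window is to take the k SNPs closest to the lead position and use their span as the start and end coordinates. The sketch below is only an illustration: narrow_window is a hypothetical helper, not part of PolyFun, and it uses the five significant SNPs from the table in this thread with k=3 to stay small. A real run would pass every SNP position in the summary statistics file and something like k=20.

```python
def narrow_window(positions, lead_pos, k=20):
    """Return (start, end) spanning the k SNP positions closest to lead_pos."""
    nearest = sorted(positions, key=lambda pos: abs(pos - lead_pos))[:k]
    return min(nearest), max(nearest)

# Positions of the five significant chr12 SNPs listed earlier in this thread.
positions = [23999769, 24008435, 24016182, 24023714, 24047496]

# rs10505909 has the smallest p-value in that table.
lead_pos = 24016182

start, end = narrow_window(positions, lead_pos, k=3)
print(start, end)
```

The resulting coordinates could then be used as the --start and --end values of the fine-mapping command in place of the original 10 Mb region.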

Best,

Omer