omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)
MIT License

PolyLoc bfile suggestion? #164

Closed by teresa-sansan 1 year ago

teresa-sansan commented 1 year ago

Hi Omer,

I tried to run PolyLoc by following the steps on the wiki page, but I wasn't sure which bfile to use in stage 1. Based on my understanding (please correct me if I'm wrong), the bfile provides extra SNPs that are not found in the posterior file. If I am using the precomputed summary LD that you provided, what kind of plink file (or just a bim file) would you suggest I use to run PolyLoc? (I'm not sure whether I have access to the specific UKBB imputation plink files.)

Also, a sentence in the stage 2 section of the wiki says "You must provide a --bfile-chr flag even if you use precomputed summary LD information. In this case PolyLoc only requires a plink .bim file, so you can create such a file even if you don't have access to individual level genotypes". But I guess stage 1 still requires all the plink files (.bim, .bed, .fam)? (I got a "fam not found" error when providing only .bim files.)

Also, is there a reason why all the variants in the posterior files must also be found in the plink files, if the plink files are only used to find the SNPs missing from the posterior files?

Thanks, Teresa

omerwe commented 1 year ago

@teresa-sansan you're asking several questions, which exposed some design flaws in the code (which I've now fixed). I'll address each in turn:

  1. The bim files should include SNPs that you wish to include in polygenic localization. In the PolyFun paper we used all common SNPs in the UK Biobank, but this is a matter of choice. Of course, this choice will inform how to interpret your results. In the PolyFun paper, our estimates only show the proportion of common SNP heritability explained with increasing numbers of common SNPs.

  2. You're right that the design of the code was too constrained. I have now rewritten the code so that it no longer requires the --bfile-chr parameter in step 1, or in step 2 if you use precomputed summary LD information. After you git pull the latest code, you can drop the --bfile-chr flag.

  3. I also rewrote the code so that step 1 runs even if you only provide bim files (no need for fam files).

  4. I updated the Wiki page accordingly. I hope the updated page is clear; please let me know if not!

Hope it's clear, please let me know if not!
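For readers without individual-level genotypes, here is a minimal sketch of how a standalone plink .bim file could be written from a list of variants. It relies only on the standard plink .bim layout (six tab-separated columns: chromosome, variant ID, genetic distance in centimorgans, base-pair position, allele 1, allele 2). The function name write_bim and the example variants are hypothetical, and the file naming that --bfile-chr expects should be checked against the wiki.

```python
import csv
import io


def write_bim(variants, out):
    """Write variants as plink .bim rows to a text stream.

    Each variant is a tuple (chrom, snp_id, bp, allele1, allele2).
    The genetic distance column is set to 0, the usual placeholder
    when the position in centimorgans is unknown.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for chrom, snp_id, bp, allele1, allele2 in variants:
        writer.writerow([chrom, snp_id, 0, bp, allele1, allele2])


# Made-up variants for illustration only (not real SNPs).
example_variants = [
    (1, "snpA", 1000, "A", "G"),
    (1, "snpB", 2500, "C", "T"),
]

buffer = io.StringIO()
write_bim(example_variants, buffer)
print(buffer.getvalue(), end="")
```

To write to disk instead of an in-memory buffer, pass a file opened with open(path, "w", newline="") as the out argument.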

teresa-sansan commented 1 year ago

Hi Omer, Thanks for the quick fix! It's very helpful! I pulled the latest code and it ran perfectly (both with and without the bfile). I'll close this issue.

Best, Teresa

teresa-sansan commented 1 year ago

Hi Omer,

I just tested the new polyloc.py and found that the extra "not" on line 67 should be removed:

    if not not args.ld_ukb and args.bfile_chr is None:

That way it won't throw the following error in step 2:

    Traceback (most recent call last):
      File "polyloc_reclone/polyloc.py", line 351, in <module>
        args = check_args(args)
      File "polyloc_reclone/polyloc.py", line 67, in check_args
        raise ValueError('You must specify either --ld-ukb or --bfile-chr when using --compute-ldscores')
    ValueError: You must specify either --ld-ukb or --bfile-chr when using --compute-ldscores

Best, Teresa
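To see why the doubled "not" triggers this error, the sketch below isolates the condition quoted above. It is not the actual polyloc.py code: the two function names and the raises helper are hypothetical, and it assumes args.ld_ukb is a boolean flag and args.bfile_chr is None when the flag is omitted. In Python, "not" binds more tightly than "and", so "not not args.ld_ukb" is equivalent to bool(args.ld_ukb), which inverts the intended check.

```python
from argparse import Namespace

MESSAGE = ("You must specify either --ld-ukb or --bfile-chr "
           "when using --compute-ldscores")


def check_ld_source_buggy(args):
    # Parsed as (not (not args.ld_ukb)) and (args.bfile_chr is None):
    # raises exactly when --ld-ukb IS set and --bfile-chr is missing.
    if not not args.ld_ukb and args.bfile_chr is None:
        raise ValueError(MESSAGE)
    return args


def check_ld_source_fixed(args):
    # Raises only when neither LD source was provided.
    if not args.ld_ukb and args.bfile_chr is None:
        raise ValueError(MESSAGE)
    return args


def raises(check, **options):
    """Return True if the check rejects the given options."""
    try:
        check(Namespace(**options))
    except ValueError:
        return True
    return False


# --ld-ukb given, no --bfile-chr: should be accepted.
print(raises(check_ld_source_buggy, ld_ukb=True, bfile_chr=None))   # True
print(raises(check_ld_source_fixed, ld_ukb=True, bfile_chr=None))   # False

# Neither option given: should be rejected.
print(raises(check_ld_source_buggy, ld_ukb=False, bfile_chr=None))  # False
print(raises(check_ld_source_fixed, ld_ukb=False, bfile_chr=None))  # True
```

The buggy variant fails in both directions: it rejects a valid --ld-ukb run and silently accepts a run with no LD source at all.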

omerwe commented 1 year ago

Fixed, thanks for the bug report! Hope all works now, please reopen the issue if not