omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)
MIT License

A question about PolyPred+ #190

Closed Y-Isaac closed 5 months ago

Y-Isaac commented 5 months ago

Hi,

I’m sorry, I didn’t fully understand your article. You mentioned in the article that “PolyPred cannot use data from a fixed-effects meta-analysis of GWAS data from different populations”. So how should I prepare the input data when using PolyPred+?

For example, suppose I have large-sample data of European ancestry and of Japanese ancestry, and I want to use PolyPred+ to generate a PRS for Japanese ancestry. Should I do the preliminary work within each ancestry separately (i.e., genome-wide fine-mapping and estimating tagging SNP effects), and then pass the result files of both ancestries to the --betas parameter at the same time? In addition, I did not see any parameters unique to PolyPred+ in the wiki. Does the PolyPred.py script judge by itself based on the input files?

Thank you in advance for your help!

omerwe commented 5 months ago

@Y-Isaac I'm not sure I understand the question... What do you mean by whether the PolyPred.py script judges by itself? What are you referring to?

To answer some questions:

Hope this helps, if not I'm happy to answer more specific questions.

Y-Isaac commented 5 months ago

@omerwe Sorry, I am new to this field, so my questions may be a little silly, but I will try my best to express myself clearly.

Assume that I currently have individual-level data from both European and Japanese ancestries, and I want to build a PRS targeting the Japanese population. Then I need to use the European data to run PolyFun + SuSiE to obtain causal effects for each locus (without needing to run PolyFun for the Japanese population), and I also need to use BOLT-LMM to obtain tagging effects for the European and Japanese populations separately. Finally, I use --betas to input these three files.

May I ask whether my understanding of the PolyPred+ workflow is accurate? Thank you very much for your help!

omerwe commented 5 months ago

@Y-Isaac your understanding is correct!
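For readers following along, the final step of the workflow confirmed above (combining the three predictors into one PRS) can be pictured with a small sketch. As I understand the PolyPred paper, the predictors are combined linearly, with mixing weights estimated in a small training sample from the target population. The code below is only an illustration of that idea using plain ordinary least squares; it is not the polypred.py implementation, it applies none of the constraints the real tool may apply, and every function name and number in it is hypothetical.

```python
# Illustrative sketch only: NOT the polypred.py implementation.
# All names and toy values below are hypothetical.


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot estimate weights")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def estimate_mixing_weights(scores, phenotype):
    """Regress the phenotype on K per-individual scores (plus an intercept).

    scores: list of K lists, each holding one predictor's score per individual.
    Returns (intercept, [weight_1, ..., weight_K]) from ordinary least squares.
    """
    n = len(phenotype)
    columns = [[1.0] * n] + [list(s) for s in scores]
    # Normal equations: (X^T X) coef = X^T y
    xtx = [[sum(ci[t] * cj[t] for t in range(n)) for cj in columns] for ci in columns]
    xty = [sum(ci[t] * phenotype[t] for t in range(n)) for ci in columns]
    coef = solve_linear_system(xtx, xty)
    return coef[0], coef[1:]


def combine_scores(scores, intercept, weights):
    """Form the final PRS as intercept + sum_k weight_k * score_k."""
    n = len(scores[0])
    return [intercept + sum(w * s[i] for w, s in zip(weights, scores)) for i in range(n)]


# Toy per-individual scores for a small Japanese training sample (made-up numbers).
eur_finemap = [0.2, -1.1, 0.7, 1.5, -0.4, 0.9]   # European PolyFun + SuSiE effects
eur_tagging = [0.1, -0.8, 1.2, 1.0, -0.9, 0.3]   # European BOLT-LMM effects
jpn_tagging = [-0.5, 0.4, 0.6, 1.8, -1.2, 0.0]   # Japanese BOLT-LMM effects
phenotype = [0.3, -0.6, 1.0, 1.9, -1.1, 0.5]

predictors = [eur_finemap, eur_tagging, jpn_tagging]
intercept, weights = estimate_mixing_weights(predictors, phenotype)
combined_prs = combine_scores(predictors, intercept, weights)
print("mixing weights:", weights)
```

In this picture, each file passed to --betas contributes one score per individual, and the weights decide how much each predictor counts in the target population.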

Since you have individual-level data for both of your populations, in principle you could run joint fine-mapping using both of them. Whether this is worthwhile depends on the sample size in each population: as a rule of thumb, fine-mapping is only beneficial if you have >100K individuals in a population.

When we wrote the paper we assumed that most people would not have access to individual-level data from two separate populations with N>100K for each. So I wouldn't rush to do this, but it is possible in principle.

Y-Isaac commented 5 months ago

Thanks! Now I am going to close this issue.

Y-Isaac commented 5 months ago

@omerwe Hi,

Sorry to reopen this issue. I have one more question that I would like you to confirm.

As I understand it, even within the same ancestry there are some differences in LD patterns between cohorts or populations, and these differences can bias PRSs built from lead SNPs. So I think that building PRSs from causal SNPs can better avoid this bias (which is also mentioned in your article).

So, what I want to confirm is: can I still use PolyPred even if the training set and the target population are of the same ancestry, or would this violate some of the software's assumptions?

Thanks for your help in advance!

omerwe commented 5 months ago

@Y-Isaac you're absolutely right --- PolyPred improves PRS accuracy even within the same ancestry (there are many results demonstrating this in the PolyPred paper).

Y-Isaac commented 5 months ago

Thanks!