kauwelab / PolyRiskScore

PRSKB is a website and command-line interface tool for calculating polygenic risk scores using GWA studies from the NHGRI-EBI Catalog.

P-Value Threshold for Associations (-c) #425

Closed: erskck1 closed this issue 1 year ago

erskck1 commented 1 year ago

Hi,

If I understand correctly, PRSKB uses only the SNPs that are significantly associated (p-value < 5x10^-8) with the outcome in a GWA study when calculating the PRS. That would mean setting the -c parameter to a value higher than 5x10^-8 makes no sense, because the parameter only sets the threshold among the associated SNPs; all other SNPs are not used in the PRS calculation at all.

I took the following text from your publication:

The data are filtered to include only associations that contain both a beta value (or odds ratio) and the respective risk allele.

In the study you used in the publication to compare PRSKB with PRSice-2 (GCST002245, Lambert et al.), there were 33 associated SNPs, so only those 33 SNPs, or SNPs in LD with them, would be used in the PRS calculation.

Is my understanding correct?

Thanks and best regards, Ersoy

mpage21 commented 1 year ago

Hi @erskck1,

Thank you for looking into our tool!

To address your first paragraph, we don't require the SNPs to be significantly associated with the outcome of a GWA study. We take the SNPs from the GWAS Catalog, so if the Catalog applies that threshold, we use it indirectly, but I don't believe it is a requirement there.

As for the quotation from our paper, we require every SNP in our database to have an association value (a beta value or an odds ratio) and a designated risk allele for that association. Some SNPs in the GWAS Catalog have a '?' where the risk allele should be. Since we wouldn't know which allele the association refers to, we filter those variants out. Likewise, some SNPs have neither an odds ratio nor a beta value reported; we filter those out too, since we wouldn't know what value to use in the PRS calculation.
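To make the filtering rule concrete, here is a minimal Python sketch. The `Association` record and the functions `is_usable`, `effect_weight`, and `simple_prs` are hypothetical names for illustration, not PRSKB's code. The scoring step assumes the common textbook weighted-sum convention (beta used directly, odds ratio on the log scale); that is not necessarily how PRSKB combines effects.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Association:
    """One catalog association (hypothetical record layout, for illustration)."""
    rsid: str
    risk_allele: str              # '?' when the catalog does not name the allele
    beta: Optional[float]         # None when no beta value is reported
    odds_ratio: Optional[float]   # None when no odds ratio is reported


def is_usable(a: Association) -> bool:
    """Keep an association only if it has a known risk allele and an effect size."""
    has_risk_allele = a.risk_allele not in ("", "?")
    has_effect = a.beta is not None or a.odds_ratio is not None
    return has_risk_allele and has_effect


def effect_weight(a: Association) -> float:
    """Assumed generic convention: beta as-is, otherwise log(odds ratio)."""
    if a.beta is not None:
        return a.beta
    return math.log(a.odds_ratio)


def simple_prs(assocs: List[Association], dosage: Dict[str, int]) -> float:
    """Weighted sum over usable associations only.

    `dosage` maps rsid -> number of risk-allele copies carried (0, 1 or 2).
    Filtered-out associations never contribute to the score.
    """
    return sum(
        effect_weight(a) * dosage.get(a.rsid, 0)
        for a in assocs
        if is_usable(a)
    )
```

With this layout, an entry whose risk allele is '?' or that has neither a beta nor an odds ratio never reaches the sum, because there would be no allele to count or no weight to apply.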

As for that specific example, I will let @MattCloward respond.

Hope this helps, Maddy

MattCloward commented 1 year ago

@erskck1 I apologize for my delayed reply.

Some p-values in the GWAS Catalog are higher than 5x10^-8, but typically not higher than 1x10^-5. Setting -c higher than that would include all associations without filtering any of them out. We don't restrict the -c parameter to any particular p-value threshold.
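A minimal sketch of how a p-value cutoff like -c behaves, assuming (not confirmed in this thread) that an association is kept when its p-value is strictly below the cutoff. The function `apply_p_cutoff` and the p-values are hypothetical illustrations, not PRSKB code or real catalog data.

```python
from typing import Dict


def apply_p_cutoff(p_values: Dict[str, float], cutoff: float) -> Dict[str, float]:
    """Keep associations whose p-value is below `cutoff`.

    Assumption: the comparison is strict (<); the real tool may use <=.
    """
    return {rsid: p for rsid, p in p_values.items() if p < cutoff}


# Hypothetical catalog-style p-values: one below 5x10^-8, the rest
# between 5x10^-8 and 1x10^-5.
example = {"rs_a": 3e-12, "rs_b": 2e-7, "rs_c": 4e-6}

print(sorted(apply_p_cutoff(example, 5e-8)))  # ['rs_a']
print(sorted(apply_p_cutoff(example, 1e-5)))  # ['rs_a', 'rs_b', 'rs_c']
print(sorted(apply_p_cutoff(example, 0.05)))  # ['rs_a', 'rs_b', 'rs_c']
```

Once the cutoff exceeds the largest p-value present, raising it further changes nothing, which is why a -c value above roughly 1x10^-5 typically keeps every association.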

For your second question, it appears that we may have only used 32 SNPs, filtering out the "chr17:61538148" entry. I will investigate this further and get back to you.

MattCloward commented 1 year ago

I have confirmed that, for our comparison, both PRSKB and PRSice-2 used only 32 SNPs in this calculation and filtered out "chr17:61538148".

erskck1 commented 1 year ago

Hi Matthew, Thank you very much! It is clear to me now, you can close the issue. Best, Ersoy