More stringent GWAS p-value threshold

addramir commented 2 months ago

It is time to discuss the genome-wide significance p-value threshold for GWAS and molQTL studies. We started this discussion with @Daniel-Considine and @d0choa.

Background

For now, we use the standard 5e-8 threshold for clumping and fine-mapping (FM), but at the harmonic averaging stage we use 1e-8. Our data are growing and now include non-European ancestries, so 5e-8 no longer works. It is difficult to estimate the effective number of independent tests. As a rule of thumb, however, we can assume that the new threshold should be at least one order of magnitude stricter than the previous one: if it was 1e-8, the new one should be 1e-9. Using the new threshold will also reduce the computational effort. @DSuveges what do you think?
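To put the rule of thumb in context, here is a minimal sketch that reads each threshold as a Bonferroni correction. It assumes a family-wise alpha of 0.05; the function names are hypothetical, and the resulting test counts are what each threshold would imply, not estimates for our data.

```python
def bonferroni_threshold(n_independent_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    if n_independent_tests <= 0:
        raise ValueError("n_independent_tests must be positive")
    return alpha / n_independent_tests


def implied_independent_tests(threshold: float, alpha: float = 0.05) -> float:
    """Number of independent tests a given threshold would correct for (assumed alpha)."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return alpha / threshold


if __name__ == "__main__":
    for p in (5e-8, 1e-8, 1e-9):
        n = implied_independent_tests(p)
        print(f"{p:.0e} -> {n:,.0f} implied independent tests")
```

Under this reading, 5e-8 corresponds to about one million independent tests and 1e-9 to about fifty million.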

For molQTLs I suggest using the default study-wise p-value threshold. For example, UKBB-PPP uses 1.7e-11.
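A study-wise threshold of this kind is typically the genome-wide threshold divided by the number of molecular traits tested. The sketch below assumes that derivation; the count of 2,923 proteins for UKBB-PPP is our assumption and should be checked against the study's own documentation. The function name is hypothetical.

```python
GENOME_WIDE_P = 5e-8  # conventional single-trait GWAS threshold


def study_wise_threshold(n_molecular_traits: int, base: float = GENOME_WIDE_P) -> float:
    """Divide the genome-wide threshold by the number of traits tested in the study."""
    if n_molecular_traits <= 0:
        raise ValueError("n_molecular_traits must be positive")
    return base / n_molecular_traits


if __name__ == "__main__":
    # 2923 is an assumed protein count for UKBB-PPP, used for illustration only.
    print(f"{study_wise_threshold(2923):.1e}")
```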

Tasks

[ ] We need to update the p-value threshold for GWAS clumping to 1e-9.
[ ] We need to filter GWAS Catalog curated CSs by lead variant p-value using 1e-9.
[ ] We need to filter SuSiE FinnGen CSs by lead variant p-value using 1e-9.
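The two filtering tasks could look roughly like the sketch below. It assumes each credible set carries its lead variant p-value as a mantissa and an exponent, and compares on the -log10 scale so that very small p-values do not underflow. The `CredibleSet` class, its field names, and `filter_by_lead_p` are hypothetical and only illustrate the logic, not the pipeline's actual schema.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CredibleSet:
    study_id: str
    lead_variant_id: str
    p_value_mantissa: float  # e.g. 3.2 for p = 3.2e-12
    p_value_exponent: int    # e.g. -12 for p = 3.2e-12


def neg_log10_p(cs: CredibleSet) -> float:
    """-log10(p) computed from mantissa and exponent, safe for tiny p-values."""
    return -(math.log10(cs.p_value_mantissa) + cs.p_value_exponent)


def filter_by_lead_p(credible_sets, threshold: float = 1e-9):
    """Keep credible sets whose lead variant p-value is at or below the threshold."""
    cutoff = -math.log10(threshold)
    return [cs for cs in credible_sets if neg_log10_p(cs) >= cutoff]


if __name__ == "__main__":
    example = [
        CredibleSet("study_a", "variant_1", 3.2, -12),
        CredibleSet("study_b", "variant_2", 4.0, -8),
        CredibleSet("study_c", "variant_3", 1.0, -400),  # would underflow as a float
    ]
    for cs in filter_by_lead_p(example):
        print(cs.study_id, cs.lead_variant_id)
```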

Acceptance tests

It would be good to check how many fewer CSs we will have.

d0choa commented 2 months ago

@DSuveges loves multiple testing problems...

addramir commented 2 months ago

@Daniel-Considine what is the threshold for eQTL Catalogue?

DSuveges commented 2 months ago

I have no problem with lowering the p-value threshold; however, setting one single value across the entire data lake sounds quite drastic, and it is hard to imagine the consequences, especially given the curated datasets from GWAS Catalog. I would carefully benchmark the effect. I can imagine there would be entire diseases where we would lose all the knowledge we had.

Let's assume there's a disease with only one GWAS study that could identify 3 significant loci given the 5e-8 threshold. This knowledge comes with the standard pinch of salt: a 5% chance that these signals are false. (If I interpret the stats here correctly.)

After the p-value threshold adjustment, we can claim that the overall reliability of our full dataset improved, but we would no longer be able to say anything about this disease, not even with the 5% FDR. Is it worth it?

I tend to sympathise with this action for traits/diseases where there are a bunch of studies with hugely varying sample sizes, but such a systemic cut might drop a lot of rare stuff.
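One way to benchmark this concern is to count, per trait, how many credible sets survive each threshold and to list the traits that would lose everything. This is a minimal sketch on toy records; the function name, the `(trait_id, lead_p_value)` input shape, and the example values are all hypothetical.

```python
from collections import defaultdict


def summarise_threshold_change(records, old_threshold=5e-8, new_threshold=1e-9):
    """records: iterable of (trait_id, lead_variant_p_value) pairs, one per credible set."""
    kept_old = defaultdict(int)
    kept_new = defaultdict(int)
    for trait, p in records:
        if p <= old_threshold:
            kept_old[trait] += 1
        if p <= new_threshold:
            kept_new[trait] += 1
    # Traits with signals under the old threshold but none under the new one.
    traits_lost = sorted(t for t in kept_old if kept_new.get(t, 0) == 0)
    return {
        "n_old": sum(kept_old.values()),
        "n_new": sum(kept_new.values()),
        "traits_lost": traits_lost,
    }


if __name__ == "__main__":
    toy_records = [  # made-up values, for illustration only
        ("rare_disease", 2e-8),
        ("rare_disease", 4e-9),
        ("rare_disease", 1e-8),
        ("height", 1e-30),
        ("height", 3e-8),
    ]
    print(summarise_threshold_change(toy_records))
```

Run on the real credible sets, the `traits_lost` list would show directly which diseases end up with no signal at all after the cut.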

addramir commented 2 months ago

Let's assume there's a disease with only one GWAS study that could identify 3 significant loci given the 5e-8 threshold. This knowledge comes with the standard pinch of salt: a 5% chance that these signals are false. (If I interpret the stats here correctly.)

The math holds if you have only one study. If you have more studies using the same p-value threshold, your FDR will be much higher.
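To illustrate how the error accumulates, here is a small sketch that takes the 5% figure at face value as a per-study error rate and assumes the studies are independent. Note that it computes the probability of at least one false positive across studies (a family-wise error rate), which is related to but not the same as the FDR; the function name is hypothetical.

```python
def prob_any_false_positive(alpha: float, n_studies: int) -> float:
    """P(at least one false positive) across n independent studies, each at level alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if n_studies < 0:
        raise ValueError("n_studies must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_studies


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(prob_any_false_positive(0.05, n), 3))
```

With one study the probability is 5%; with ten independent studies at the same level it is already about 40%.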

The proposed threshold doesn't really solve the problem of multiple testing, as it is too liberal; however, it can probably fix some problems arising from high-density genotyping panels and non-European ancestries. I agree that we need to benchmark it somehow, and I will create a separate ticket for it.
