rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

Questions about p value of interaction testing #332

Closed: WZo0o closed this issue 2 years ago

WZo0o commented 2 years ago

Hello, after running the GxE analysis, I got a total of three p-values: ADD-INT_SNPxTD=1, ADD-INT_SNPxTD=2 and ADD-INT_SNPxTD. When I visualized these three kinds of p-values, I noticed that the p-values of ADD-INT_SNPxTD=1 and ADD-INT_SNPxTD=2 are much larger than those of ADD-INT_SNPxTD. Are there any mistakes in my analysis? Sincerely, WangZheng

[Figure: Manhattan plot of p-values for ADD-INT_SNPxTD (Rectangular-Manhattan LOG10P_new_TD)]
[Figure: Manhattan plot of p-values for ADD-INT_SNPxTD=1 (Rectangular-Manhattan LOG10P_new_T1D)]
[Figure: Manhattan plot of p-values for ADD-INT_SNPxTD=2 (Rectangular-Manhattan LOG10P_new_T2D)]

This is my script:

```shell
# TD[0] is quoted so the shell cannot expand the brackets as a glob pattern.
regenie --step 2 \
  --interaction 'TD[0]' \
  --covarColList TD,PS_SCORE \
  --catCovarList TD \
  --pgen UKB_auto_merge \
  --keep qc_pass.id \
  --extract qc_pass.snplist \
  --ref-first \
  --phenoFile phe.txt \
  --covarFile cov.txt \
  --bt --firth --approx \
  --pThresh 0.01 \
  --pred ukb_step1_BT_pred.list \
  --bsize 400 \
  --threads 8 \
  --chr 21 \
  --out ukb_step2_BT_chr21
```

This is my output:

```text
CHROM GENPOS ID ALLELE0 ALLELE1 A1FREQ INFO N TEST BETA SE CHISQ LOG10P EXTRA
21 9413985 21:9413985_A_T A T 0.00540667 0.775698 323697 ADD-CONDTL -0.434759 0.23297 3.48255 1.20748 NA
21 9413985 21:9413985_A_T A T 0.00540667 0.775698 323697 ADD-INT_SNP -0.494672 0.310922 2.53124 0.952286 NA
21 9413985 21:9413985_A_T A T 0.00540667 0.775698 323697 ADD-INT_SNPxTD=2 -0.501612 0.860496 0.339812 0.25186 NA
21 9413985 21:9413985_A_T A T 0.00540667 0.775698 323697 ADD-INT_SNPxTD=1 -12.9373 52.3614 0.0610474 0.0942861 NA
21 9413985 21:9413985_A_T A T 0.00540667 0.775698 323697 ADD-INT_SNPxTD NA NA 0.400329 0.0869302 NA
21 9413985 21:9413985_A_T A T 0.00540667 0.775698 323697 ADD-INT_3DF NA NA 4.13859 0.607509 NA
```
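The CHISQ and LOG10P columns in the output above are tied together by the chi-square upper tail: LOG10P = -log10(P(X > CHISQ)). The minimal Python sketch below, using only the standard library, recomputes LOG10P for four of the rows. The degrees of freedom (1 for the single-term rows, 2 for the joint ADD-INT_SNPxTD test, 3 for ADD-INT_3DF) are an assumption inferred from the numbers themselves rather than taken from the regenie documentation, and the helpers `chi2_sf` and `log10p` are hypothetical names.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Uses the recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1),
    starting from the closed forms for df = 1 (erfc) and df = 2 (exp).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1  # closed form for df = 1
    else:
        q, k = math.exp(-x / 2), 2  # closed form for df = 2
    while k < df:
        # Next recurrence term, computed on the log scale for numerical stability.
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def log10p(chisq: float, df: int) -> float:
    """-log10 of the chi-square p-value, i.e. the quantity in the LOG10P column."""
    return -math.log10(chi2_sf(chisq, df))


# (TEST, CHISQ, reported LOG10P, assumed df) copied from the output rows above.
rows = [
    ("ADD-CONDTL", 3.48255, 1.20748, 1),
    ("ADD-INT_SNPxTD=1", 0.0610474, 0.0942861, 1),
    ("ADD-INT_SNPxTD", 0.400329, 0.0869302, 2),
    ("ADD-INT_3DF", 4.13859, 0.607509, 3),
]

for test, chisq, reported, df in rows:
    print(f"{test:18s} df={df} recomputed={log10p(chisq, df):.4f} reported={reported:.4f}")
```

With SciPy installed, `scipy.stats.chi2.sf(chisq, df)` should give the same tail probabilities; the hand-written recurrence only keeps the sketch dependency-free.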

joellembatchou commented 2 years ago

Hi,

The Manhattan plots for the interaction tests (SNPxTD=1 and SNPxTD=2) look off (there seems to be a threshold at P=0.01). Could you re-run using --pThresh 0.05 for the Firth test and include the resulting Manhattan plots?

WZo0o commented 2 years ago

Hi,

Thanks for your reply. I will take your advice, but could you tell me why --pThresh should be set to 0.05?

joellembatchou commented 2 years ago

Hi,

In the first run, --pThresh was set to 0.01, which matches where the ceiling is in the Manhattan plots, so I am wondering whether the lack of points above that level (i.e. p-values below 0.01) indicates an issue with fitting penalized Firth for the models with interaction.

WZo0o commented 2 years ago

Hi, after setting --pThresh to 0.05, the maximum value in the Manhattan plots again corresponds to the threshold. Are there some issues with fitting penalized Firth for the models with interaction?

[Figure: Manhattan plot of p-values for ADD-INT_SNPxTD=1 (Rectangular-Manhattan LOG10P_new_T1D)]

[Figure: Manhattan plot of p-values for ADD-INT_SNPxTD=2 (Rectangular-Manhattan LOG10P_new_T2D)]

joellembatchou commented 2 years ago

The end of the REGENIE log should report how many Firth tests failed. Did all of them fail?

WZo0o commented 2 years ago

Not all of the Firth tests failed; only roughly 39% of them did.

```text
Association results stored separately for each trait in files :

Number of tests with Firth correction : 340446
Number of failed tests : (132056/340446)
Number of ignored tests due to low MAC : 0
```

```text
Association results stored separately for each trait in files :

Number of tests with Firth correction : 304347
Number of failed tests : (118968/304347)
Number of ignored tests due to low MAC : 0
```
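The failure fraction can be read straight off these log lines. The small Python sketch below extracts the counts; it assumes the log wording matches the two excerpts above exactly (the regular expression may need adjusting for other regenie versions), and `firth_failure_rate` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Matches lines such as "Number of failed tests : (132056/340446)".
FAILED_RE = re.compile(r"Number of failed tests\s*:\s*\((\d+)/(\d+)\)")


def firth_failure_rate(log_text: str) -> List[Tuple[int, int, float]]:
    """Return (failed, total, fraction) for every 'failed tests' line in a log."""
    results = []
    for match in FAILED_RE.finditer(log_text):
        failed, total = int(match.group(1)), int(match.group(2))
        results.append((failed, total, failed / total))
    return results


# The two log excerpts above, condensed to the relevant lines.
log_excerpt = """
Number of tests with Firth correction : 340446
Number of failed tests : (132056/340446)
Number of tests with Firth correction : 304347
Number of failed tests : (118968/304347)
"""

for failed, total, frac in firth_failure_rate(log_excerpt):
    print(f"{failed}/{total} Firth tests failed ({frac:.1%})")
```

For both runs this comes out at roughly 39% (38.8% and 39.1%), so a large minority of the Firth fits fail rather than all of them.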

joellembatchou commented 2 years ago

Would it be possible for you to send a snippet of your data for one variant for which Firth failed, so I can investigate further?

WZo0o commented 2 years ago

Hi, I collected some variants from chromosome 1 whose p-value is NA for you to investigate further. It is inconvenient to send the data through this site; could you please provide an email address I can send them to?

joellembatchou commented 2 years ago

Hi,

Can you send it to joelle.mbatchou@regeneron.com? Thanks!

WZo0o commented 2 years ago

Hi, I have sent it to joelle.mbatchou@regeneron.com from my email (wangzhengqaq@foxmail.com).

joellembatchou commented 2 years ago

Hi,

To reduce the computational burden of the Firth LRT when using categorical interaction variables, only the effect sizes for each interaction parameter are reported when Firth is used. The interaction p-value is only computed for the joint test of the interaction terms (i.e. the betas for TD=1 and TD=2 both equal to 0), as the LRT requires fitting the null model for each hypothesis. When Firth is not used (i.e. none of the interaction terms have p-values below the threshold), the score test p-value for each term is reported, which is why the Manhattan plots for the interaction terms are cut off at -log10(pThresh).
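This reporting rule explains the ceiling in the per-term Manhattan plots: a per-term p-value only reaches the output when no interaction term falls below pThresh, so its -log10 value can never exceed -log10(pThresh), i.e. 2 for --pThresh 0.01 and about 1.30 for --pThresh 0.05. The Python sketch below is a simplified, hypothetical model of the rule as stated in this comment, not regenie's actual code; it ignores the joint Firth LRT, and both the helper names and the example p-values are made up for illustration.

```python
import math
from typing import Dict, List, Optional


def reported_term_pvalues(score_p: Dict[str, float], p_thresh: float) -> Dict[str, Optional[float]]:
    """Simplified model of per-term reporting for a categorical interaction.

    If any interaction term's score-test p-value is below p_thresh, Firth is used:
    only effect sizes are reported, so every per-term p-value becomes None (NA).
    Otherwise the score-test p-value of each term is reported unchanged.
    """
    if any(p < p_thresh for p in score_p.values()):
        return {term: None for term in score_p}
    return dict(score_p)


def max_reported_log10p(variants: List[Dict[str, float]], p_thresh: float) -> float:
    """Largest -log10(p) among the per-term p-values that would be reported."""
    best = 0.0
    for score_p in variants:
        for p in reported_term_pvalues(score_p, p_thresh).values():
            if p is not None:
                best = max(best, -math.log10(p))
    return best


# Hypothetical score-test p-values for the SNPxTD=1 and SNPxTD=2 terms at three variants.
variants = [
    {"SNPxTD=1": 0.80, "SNPxTD=2": 0.56},
    {"SNPxTD=1": 0.02, "SNPxTD=2": 0.30},
    {"SNPxTD=1": 1e-6, "SNPxTD=2": 0.04},
]

for p_thresh in (0.01, 0.05):
    ceiling = -math.log10(p_thresh)
    print(f"pThresh={p_thresh}: max reported -log10(p)="
          f"{max_reported_log10p(variants, p_thresh):.3f}, ceiling={ceiling:.3f}")
```

Under this model, raising --pThresh from 0.01 to 0.05 only moves the ceiling from 2 down to about 1.30; the strongest per-term signals are exactly the ones routed to Firth, where no per-term p-value is reported.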

Sorry about the confusion; I will make this clearer in the documentation.

Cheers, Joelle