zhanxw / rvtests

Rare variant test software for next generation sequencing data

huge EFFECTS difference between SCORE and WALD test on binary trait #19

Closed jielab closed 7 years ago

jielab commented 7 years ago

Dear Xiaowei:

Below is a message from my colleague, who identified very different EFFECTS between the SCORE and WALD tests. "The --single wald test took much longer, i.e. 73439 seconds (20.4 hours), but the effect size, SE, and p-values are all quite consistent with what I got from SNPTEST. Then I looked at the rvtests wiki page (http://genome.sph.umich.edu/wiki/Rvtests), which says the "score" option only fits the null model and the "wald" option fits the alternative model. So I think I will need to use the "wald" option anyway for future GWAS analyses. It will take about 20.4 hours for 150 jobs, maybe 4 days for 550 jobs.

Then I compared the two tests using both continuous and binary traits. For the continuous trait, the EFFECTS and PVALUES are exactly the same, but for the binary trait, with covariates including 10 PCs, they are very different. Please see below. Also, I noticed some of the EFFECTS are huge, >5000. Is EFFECTS here the BETA, and does that look reasonable/possible?"

Thank you very much & best regards, Jie

[Image: picture2]

zhanxw commented 7 years ago

Hi Jie,

SCORE vs. WALD

The advantage of the SCORE test is its speed, as you observed in your test. In practice, it is not a bad idea to first run SCORE tests to identify "hits" and then run WALD tests on those "hits". The WALD test, in contrast, is slower but more versatile.
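
The two-stage idea above (SCORE first, WALD only on the hits) could be sketched roughly as follows. This is only an illustration: the tab-separated layout with `CHROM`, `POS`, and `PVALUE` columns, the `select_hits` helper, the `1e-4` threshold, and the demo rows are all assumptions made up for the example, so check them against the actual output file before relying on it.

```python
import csv
import io

def select_hits(score_file, p_threshold=1e-4):
    """Return (chrom, pos) pairs whose score-test p-value is below the threshold."""
    hits = []
    for row in csv.DictReader(score_file, delimiter="\t"):
        try:
            p = float(row["PVALUE"])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with a missing or non-numeric p-value (e.g. "NA")
        if p < p_threshold:
            hits.append((row["CHROM"], int(row["POS"])))
    return hits

# Made-up score-test results, used only to exercise the helper.
demo = io.StringIO(
    "CHROM\tPOS\tPVALUE\n"
    "1\t1000\t0.52\n"
    "1\t2000\t3.1e-07\n"
    "2\t1500\tNA\n"
    "2\t3000\t8.0e-05\n"
)
hits = select_hits(demo)

# One "chrom:start-end" range per hit, to restrict the slower second pass.
ranges = [f"{chrom}:{pos}-{pos}" for chrom, pos in hits]
print(ranges)
```

The resulting ranges would then be used to restrict the WALD run to the selected variants, so the expensive model fit is done for only a small fraction of sites.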

The EFFECT/PVALUE calculation is exact for continuous traits, which is why you observed the same results. For binary traits the calculation is approximate, so I do expect to observe differences. For EFFECTS estimation, when fitting the logistic regression under the alternative hypothesis, convergence problems can lead to huge effect estimates, so I am not surprised to see estimates >5000. Judging from the figure of p-values, both methods give comparable p-values when the p-values are small.
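
To make the convergence point concrete, here is a minimal pure-Python sketch of the two tests for a binary trait. It is not the rvtests implementation. It assumes an intercept-only null model (no covariates or PCs), a 0/1 carrier genotype, a plain Newton-Raphson fit capped at `max_iter` iterations, and made-up case/control counts. The score test uses only the null fit, while the Wald test fits the full logistic model for each variant.

```python
import math

def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def score_test(g, y):
    """Score test: only the (intercept-only) null model is fitted."""
    n = len(y)
    p0 = sum(y) / n                      # fitted case probability under the null
    g_bar = sum(g) / n
    u = sum(gi * (yi - p0) for gi, yi in zip(g, y))           # score U
    v = p0 * (1.0 - p0) * sum((gi - g_bar) ** 2 for gi in g)  # variance of U
    # U/V is a one-step approximation of the effect, not the MLE.
    return u / v, 1.0 / math.sqrt(v), chi2_1df_pvalue(u * u / v)

def wald_test(g, y, max_iter=20, tol=1e-8):
    """Wald test: logistic model y ~ b0 + b1*g fitted by Newton-Raphson."""
    b0 = b1 = 0.0
    converged = False
    for _ in range(max_iter):
        s0 = s1 = i00 = i01 = i11 = 0.0  # score vector and information matrix
        for gi, yi in zip(g, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * gi)))
            w = p * (1.0 - p)
            s0 += yi - p
            s1 += (yi - p) * gi
            i00 += w
            i01 += w * gi
            i11 += w * gi * gi
        det = i00 * i11 - i01 * i01
        d0 = (i11 * s0 - i01 * s1) / det  # Newton step = inverse information * score
        d1 = (i00 * s1 - i01 * s0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            converged = True
            break
    se = math.sqrt(i00 / det)            # from the information of the last iteration
    return b1, se, chi2_1df_pvalue((b1 / se) ** 2), converged

def make_data(n0, cases0, n1, cases1):
    """Genotypes and phenotypes for n0 non-carriers and n1 carriers."""
    g = [0] * n0 + [1] * n1
    y = [1] * cases0 + [0] * (n0 - cases0) + [1] * cases1 + [0] * (n1 - cases1)
    return g, y

# Made-up counts. A: common variant, cases and controls in both genotype groups.
g_a, y_a = make_data(100, 40, 100, 60)
# B: rare variant whose 10 carriers are all cases, so the MLE does not exist.
g_b, y_b = make_data(190, 90, 10, 10)

score_a, wald_a = score_test(g_a, y_a), wald_test(g_a, y_a)
score_b, wald_b = score_test(g_b, y_b), wald_test(g_b, y_b)
for name, res in [("score A", score_a), ("wald A", wald_a),
                  ("score B", score_b), ("wald B", wald_b)]:
    print(name, res)
```

In data set A the two tests give similar effects and p-values. In data set B the Wald fit never converges: the estimate keeps growing with each iteration and its standard error explodes, while the score test still returns a finite effect and a small p-value. How large the reported effect gets in practice depends on the optimizer and its stopping rule.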

I am copying @dajiangliu here in case he has comments on this.