Closed Ojami closed 2 years ago
Hi Oveis,
Yes, if there are variants with MAC<=10, these would be aggregated together in REGENIE prior to running the SKAT tests which will cause discrepancy with R SKAT (unless you also do it in R). Regenie uses Davies' exact method by default to obatin p-values and if that fails, it uses Kuonen's SPA, and only if that also fails does it uses the modified Liu's method (which is what is used as fallback in R SKAT). We decided to prioritize Kuonen's method over Liu's since previous studies have found that Liu can give underestimate the very small p-values (e.g. Baolin Wu et al. Ann Hum Genet 2016).
For your BT run, you are running Regenie with SPA which may cause differences in the results since SKAT R would not use SPA correction. FYI we have released a new Regenie version (v3.1) which has a couple bug fixes for Firth/SPA in SKAT tests.
Note: the Burden results in REGENIE (i.e. ADD) correspond to the burden mask (by default it collapse variants by taking maximum number of ALT alleles across sites). So not the same as Burden test in R SKAT which uses the Beta(1,25) weights when aggregating variants.
Cheers, Joelle
Hi Joelle,
Thanks for the response!
For your BT run, you are running Regenie with SPA which may cause differences in the results since SKAT R would not use SPA correction. FYI we have released a new Regenie version (v3.1) which has a couple bug fixes for Firth/SPA in SKAT tests.
In my example above all p-values for the binary trait are > 0.05 so SPA doesn't affect those p-values (regardless, SKAT function SKATBinary_Robust
from Zhao et al. AJHG 2020 also applies ER and SPA).
Also,
Note: the Burden results in REGENIE (i.e. ADD) correspond to the burden mask (by default it collapse variants by taking maximum number of ALT alleles across sites). So not the same as Burden test in R SKAT which uses the Beta(1,25) weights when aggregating variants.
Yes, sorry I forgot to mention that. I just reportred burden p-value for the sake of completeness, otherwise they're different approaches.
Overall, it looks that the only difference is because of REGENIE's behavior in aggregating ultra-rare variants. Do you think in near future REGENIE can output also the method used to calculate p-value (with --debug
flag)? SKAT does that (either Davies or Liu in case Davies fails), and may help in diagnostics.--debug
already does this, but for some reasons, --debug
doesn't work with bash; e.g. something like this:
#!/bin/bash
regenie chr1 ... &
regenie chr2 ... &
wait
[PS] There is also some difference for SKAT-O p-values between v.3.0.1
and v.3.1
(latest) for the quantitative trait (above):
version//P | v.3.1 | v.3.0.1 |
---|---|---|
ADD | 1.22E-57 | 1.22E-57 |
SKAT beta(1,25) | 2.20E-40 | 2.20E-40 |
SKAT-O | 1.41E-57 | 1.24E-56 |
What could possibly cause this difference (becaue SKAT is intact)?
Best/ Oveis
Hi Oveis,
In v3.0.1, in the SKATO numerical integration step computing tail probabilities from a mixture of chisq1 distribution, Davies' method was used as default and if it failed, Kuonen's SPA was used (and Liu was then used if SPA failed). In v3.1, Kuonen's SPA is used whenever Davies' p-value is below 10^{-5} and if SPA fails, we retry Davies with more stringent parameters (following Baolin Wu et al. Ann Hum Genet 2016). Liu is only applied if all of these fail. Furthermore, following the same paper, we have used their formula in Eq(3) to compute the SKATO integral (instead of the original one from SKATO 2012 paper) which can give better accuracy for smaller p-values.
For the --debug
option, it prints to stderr so that may be why you may not see the output if you only check stdout.
Cheers, Joelle
Thanks for the detailed explanation!
Best/Oveis
Hi Joelle,
I was checking the gene-best tests for one gene, and ran into something strange. So, the gene is GKAP1 and the trait is HCC (C22.0) which has been also reported here using UKBB WES 450K.
In v3.1, the first choice is Kuonen's SPA when P is < 1E-5 (which is the case in this example), while in v3.0.1 Davies' (same as SKAT R package) is used. Also, to be comparable with SKAT R package,vc-MACthr
was set to 1 (other arguments: firth
, pThresh 0.05
, ignore-pred
):
version/P | v.3.0.1 | v.3.1 | SKAT package |
---|---|---|---|
ADD | 4.46E-6 | 4.46E-6 | 9.6E-6 |
ADD-SKAT(1,25) | 8.1E-6 | 0.259 | 4.4E-5 |
ADD-SKAT-O | 6.5E-6 | 0.065 | 1.5E-5 |
ADD-ACATO | 1.7E-6 | 0.013 | - |
So, the question would be, is Kuonen more reliable than Davies? I understand the logic why Kuonen should be favored over Liu, but not for Davies. At least in this case, v.3.0.1 is more accurate than v3.1. Am I missing something here?
Thanks/Oveis
Hi,
v3.0.1 had bugs on how it was applying Firth/SPA correction; could you run both version without firth/spa to determine if the difference is due to the Firth/SPA correction implementation or the different use of Davies/Kuonen methods?
Hi Joelle,
This is what happens without spa/firth
version/P | v.3.0.1 | v.3.1 |
---|---|---|
ADD | 3.79E-37 | 3.79E-37 |
ADD-SKAT(1,25) | 2.28E-28 | 2.28E-28 |
ADD-SKAT-O | 1.94E-36 | 1.94E-36 |
And as you suspected it's only because of Firth/SPA. My bad!
Thanks! Best/Oveis
Hi Joelle,
I was comparing the results from gene-based tests in step 2 (no
pred
file) to SKAT R package for different binary/quantitative traits using a single gene with the same number of variants (all conditions are the same). However, the p-values are not exactly the same and sometimes (especially for SKAT test) very different. Is there any difference between implementation of SKAT and REGENIE or is this difference only due to the way REGENIE builds masks for ultra-rare variants (i.e. similar to SAIGE-GENE+ by aggreagating those below some MAC cutoff into a new score)?Log for a quantitative trait:
For a binary trait:
Thanks! Best/Oveis