Gene-based tests are inconsistent with SKAT package

Ojami commented 2 years ago

Hi Joelle,

I was comparing the results from gene-based tests in step 2 (no pred file) to SKAT R package for different binary/quantitative traits using a single gene with the same number of variants (all conditions are the same). However, the p-values are not exactly the same and sometimes (especially for SKAT test) very different. Is there any difference between implementation of SKAT and REGENIE or is this difference only due to the way REGENIE builds masks for ultra-rare variants (i.e. similar to SAIGE-GENE+ by aggreagating those below some MAC cutoff into a new score)?

Log for a quantitative trait:

Start time: Wed Apr 27 16:59:24 2022

              |=============================|
              |      REGENIE v3.0.1.gz      |
              |=============================|

Copyright (c) 2020-2022 Joelle Mbatchou, Andrey Ziyatdinov and Jonathan Marchini.
Distributed under the MIT License.
Compiled with Boost Iostream library.
Using Intel MKL with Eigen.

Log of output saved in file : Triglycerides.vxHIy.1.chunk1.log

Options in effect:
  --bed /mnt/e/WES/chr8 \
  --phenoFile Triglycerides.vxHIy.phenoFile.txt \
  --covarFile Triglycerides.vxHIy.covarFile.txt \
  --lowmem \
  --ignore-pred \
  --minMAC 5 \
  --minINFO 0 \
  --threads 64 \
  --step 2 \
  --bsize 5000 \
  --verbose \
  --out Triglycerides.vxHIy.1.chunk1 \
  --af-cc \
  --debug \
  --anno-file Triglycerides.vxHIy.ann.txt \
  --set-list Triglycerides.vxHIy.set.txt \
  --mask-def Triglycerides.vxHIy.mask.txt \
  --extract-sets Triglycerides.vxHIy.set.tmp1.txt \
  --aaf-bins 0.005,0.01 \
  --build-mask max \
  --singleton-carrier \
  --vc-tests skato,acato-full,skato-acat \
  --vc-maxAAF 0.01 \
  --vc-MACthr 10 \
  --joint acat \
  --write-mask-snplist \
  --check-burden-files 

WARNING: disabling option --af-cc (only for BTs in step 2 in native output format split by trait).
Association testing mode (joint tests) with fast multithreading using OpenMP
 * bim              : [/mnt/e/WES/chr8.bim] n_snps = 283490
 * fam              : [/mnt/e/WES/chr8.fam] n_samples = 454756
 * bed              : [/mnt/e/WES/chr8.bed]
WARNING: Dosages are not present in the genotype file. Option --minINFO is skipped.
 * phenotypes       : [Triglycerides.vxHIy.phenoFile.txt] n_pheno = 1
   -dropping observations with missing values at any of the phenotypes
   -number of phenotyped individuals with no missing data = 323437
 * covariates       : [Triglycerides.vxHIy.covarFile.txt] n_cov = 13
   -number of individuals with covariate data = 323437
 * number of individuals used in analysis = 323437
 * no step 1 predictions given. Simple linear regression will be performed
   -residualizing and scaling phenotypes...done (8ms) 
 * annotations      : [Triglycerides.vxHIy.ann.txt] 
   +number of annotations categories = 2
 * masks            : [Triglycerides.vxHIy.mask.txt] n_masks = 1
 * aaf cutoffs      : [ 2 : 0.005 0.01 ] + singletons
 * set file         : [Triglycerides.vxHIy.set.txt] n_sets = 1
     +report on burden input files written to [Triglycerides.vxHIy.1.chunk1_masks_report.txt]
   -keeping only specified sets
     +number of sets remaining in the analysis = 1
 * # threads        : [64]
 * # tested sets    : [1]
 * max block size   : [5000]
 * rule used to build masks : max
 * computing gene-based tests for each set of variants included in a mask (rho=[0,0.01,0.04,0.09,0.16,0.25,0.5,1])
  -variants with MAC <= 10 are collapsed into a mask
  -weights are obtained from Beta(MAF,1,25)
 * approximate memory usage : 18GB
 * writing list of variants for each mask in file [Triglycerides.vxHIy.1.chunk1_masks.snplist]
 * using minimum MAC of 5 (masks with lower MAC are ignored)
 * using minimum imputation info score of 0 (variants with lower info score are ignored)

Chromosome 8 [1 sets in total]
 set [1/1] : 41 variants...1 chunks
     -reading in genotypes, computing gene-based tests and building masks...done (219ms) 
     -computing association tests...done (11ms) 
     -computing joint association tests...done (0ms) 

Association results stored separately for each trait in files : 
* [Triglycerides.vxHIy.1.chunk1_Triglycerides.regenie]

Number of ignored tests due to low MAC or info score : 0

Elapsed time : 3.67958s
End time: Wed Apr 27 16:59:28 2022

tool//P	SKAT	REGENIE
Burden (ADD in REGENIE)	1.3E-57	1.22E-57
SKAT beta(1,25)	2.84E-38	2.20E-40
SKAT-O	9.06E-57	1.24E-56

For a binary trait:

Options in effect:
  --bed /mnt/e/WES/TGAcutechr8 \
  --phenoFile TGAcute.GiIaV.phenoFile.txt \
  --covarFile TGAcute.GiIaV.covarFile.txt \
  --bt \
  --lowmem \
  --ignore-pred \
  --spa \
  --minMAC 5 \
  --minINFO 0 \
  --threads 64 \
  --step 2 \
  --bsize 5000 \
  --verbose \
  --out TGAcute.GiIaV.1.chunk1 \
  --af-cc \
  --debug \
  --anno-file TGAcute.GiIaV.ann.txt \
  --set-list TGAcute.GiIaV.set.txt \
  --mask-def TGAcute.GiIaV.mask.txt \
  --extract-sets TGAcute.GiIaV.set.tmp1.txt \
  --aaf-bins 0.005,0.01 \
  --build-mask max \
  --singleton-carrier \
  --vc-tests skato,acato-full,skato-acat \
  --vc-maxAAF 0.01 \
  --vc-MACthr 10 \
  --joint acat \
  --write-mask-snplist \
  --check-burden-files 

Association testing mode (joint tests) with fast multithreading using OpenMP
 * bim              : [/mnt/e/WES/chr8.bim] n_snps = 283490
 * fam              : [/mnt/e/WES/TGAcutechr8.fam] n_samples = 454756
 * bed              : [/mnt/e/WES/TGAcutechr8.bed]
WARNING: Dosages are not present in the genotype file. Option --minINFO is skipped.
 * phenotypes       : [TGAcute.GiIaV.phenoFile.txt] n_pheno = 1
   -dropping observations with missing values at any of the phenotypes
   -number of phenotyped individuals with no missing data = 339265
 * covariates       : [TGAcute.GiIaV.covarFile.txt] n_cov = 13
   -number of individuals with covariate data = 339265
 * number of individuals used in analysis = 339265
 * case-control counts for each trait:
   - 'TGAcute': 1096 cases and 338169 controls
 * no step 1 predictions given. Simple logistic regression will be performed
 * annotations      : [TGAcute.GiIaV.ann.txt] 
   +number of annotations categories = 2
 * masks            : [TGAcute.GiIaV.mask.txt] n_masks = 1
 * aaf cutoffs      : [ 2 : 0.005 0.01 ] + singletons
 * set file         : [TGAcute.GiIaV.set.txt] n_sets = 1
     +report on burden input files written to [TGAcute.GiIaV.1.chunk1_masks_report.txt]
   -keeping only specified sets
     +number of sets remaining in the analysis = 1
 * # threads        : [64]
 * # tested sets    : [1]
 * max block size   : [5000]
 * rule used to build masks : max
 * computing gene-based tests for each set of variants included in a mask (rho=[0,0.01,0.04,0.09,0.16,0.25,0.5,1])
  -variants with MAC <= 10 are collapsed into a mask
  -weights are obtained from Beta(MAF,1,25)
 * approximate memory usage : 44GB
 * writing list of variants for each mask in file [TGAcute.GiIaV.1.chunk1_masks.snplist]
 * using minimum MAC of 5 (masks with lower MAC are ignored)
 * using minimum imputation info score of 0 (variants with lower info score are ignored)
 * using SPA correction for logistic regression p-values less than 0.05

Chromosome 8 [1 sets in total]
   -fitting null logistic regression on binary phenotypes...done (403ms) 
 set [1/1] : 41 variants...1 chunks
     -reading in genotypes, computing gene-based tests and building masks...done (476ms) 
     -computing association tests...done (13ms) 
     -computing joint association tests...done (0ms) 

Association results stored separately for each trait in files : 
* [TGAcute.GiIaV.1.chunk1_TGAcute.regenie]

Number of tests with SPA correction : 1
Number of failed tests : (0/1)
Number of ignored tests due to low MAC or info score : 0

Elapsed time : 4.27713s
End time: Wed Apr 27 17:38:24 2022

tool//P	SKAT	REGENIE
Burden (ADD in REGENIE)	0.102	0.093
SKAT beta(1,25)	0.063	0.088
SKAT-O	0.0993	0.123

Thanks! Best/Oveis

joellembatchou commented 2 years ago

Hi Oveis,

Yes, if there are variants with MAC<=10, these would be aggregated together in REGENIE prior to running the SKAT tests which will cause discrepancy with R SKAT (unless you also do it in R). Regenie uses Davies' exact method by default to obatin p-values and if that fails, it uses Kuonen's SPA, and only if that also fails does it uses the modified Liu's method (which is what is used as fallback in R SKAT). We decided to prioritize Kuonen's method over Liu's since previous studies have found that Liu can give underestimate the very small p-values (e.g. Baolin Wu et al. Ann Hum Genet 2016).

For your BT run, you are running Regenie with SPA which may cause differences in the results since SKAT R would not use SPA correction. FYI we have released a new Regenie version (v3.1) which has a couple bug fixes for Firth/SPA in SKAT tests.

Note: the Burden results in REGENIE (i.e. ADD) correspond to the burden mask (by default it collapse variants by taking maximum number of ALT alleles across sites). So not the same as Burden test in R SKAT which uses the Beta(1,25) weights when aggregating variants.

Cheers, Joelle

Ojami commented 2 years ago

Hi Joelle,

Thanks for the response!

For your BT run, you are running Regenie with SPA which may cause differences in the results since SKAT R would not use SPA correction. FYI we have released a new Regenie version (v3.1) which has a couple bug fixes for Firth/SPA in SKAT tests.

In my example above all p-values for the binary trait are > 0.05 so SPA doesn't affect those p-values (regardless, SKAT function SKATBinary_Robust from Zhao et al. AJHG 2020 also applies ER and SPA). Also,

Note: the Burden results in REGENIE (i.e. ADD) correspond to the burden mask (by default it collapse variants by taking maximum number of ALT alleles across sites). So not the same as Burden test in R SKAT which uses the Beta(1,25) weights when aggregating variants.

Yes, sorry I forgot to mention that. I just reportred burden p-value for the sake of completeness, otherwise they're different approaches.

Overall, it looks that the only difference is because of REGENIE's behavior in aggregating ultra-rare variants. ~~Do you think in near future REGENIE can output also the method used to calculate p-value (with --debug flag)? SKAT does that (either Davies or Liu in case Davies fails), and may help in diagnostics.~~ --debug already does this, but for some reasons, --debug doesn't work with bash; e.g. something like this:

#!/bin/bash
regenie chr1 ... &
regenie chr2 ... &
wait

[PS] There is also some difference for SKAT-O p-values between v.3.0.1 and v.3.1 (latest) for the quantitative trait (above):

version//P	v.3.1	v.3.0.1
ADD	1.22E-57	1.22E-57
SKAT beta(1,25)	2.20E-40	2.20E-40
SKAT-O	1.41E-57	1.24E-56

What could possibly cause this difference (becaue SKAT is intact)?

Best/ Oveis

joellembatchou commented 2 years ago

Hi Oveis,

In v3.0.1, in the SKATO numerical integration step computing tail probabilities from a mixture of chisq1 distribution, Davies' method was used as default and if it failed, Kuonen's SPA was used (and Liu was then used if SPA failed). In v3.1, Kuonen's SPA is used whenever Davies' p-value is below 10^{-5} and if SPA fails, we retry Davies with more stringent parameters (following Baolin Wu et al. Ann Hum Genet 2016). Liu is only applied if all of these fail. Furthermore, following the same paper, we have used their formula in Eq(3) to compute the SKATO integral (instead of the original one from SKATO 2012 paper) which can give better accuracy for smaller p-values.

For the --debug option, it prints to stderr so that may be why you may not see the output if you only check stdout.

Cheers, Joelle

Ojami commented 2 years ago

Thanks for the detailed explanation!

Best/Oveis

Ojami commented 2 years ago

Hi Joelle,

I was checking the gene-best tests for one gene, and ran into something strange. So, the gene is GKAP1 and the trait is HCC (C22.0) which has been also reported here using UKBB WES 450K. In v3.1, the first choice is Kuonen's SPA when P is < 1E-5 (which is the case in this example), while in v3.0.1 Davies' (same as SKAT R package) is used. Also, to be comparable with SKAT R package,vc-MACthr was set to 1 (other arguments: firth, pThresh 0.05, ignore-pred):

version/P	v.3.0.1	v.3.1	SKAT package
ADD	4.46E-6	4.46E-6	9.6E-6
ADD-SKAT(1,25)	8.1E-6	0.259	4.4E-5
ADD-SKAT-O	6.5E-6	0.065	1.5E-5
ADD-ACATO	1.7E-6	0.013	-

So, the question would be, is Kuonen more reliable than Davies? I understand the logic why Kuonen should be favored over Liu, but not for Davies. At least in this case, v.3.0.1 is more accurate than v3.1. Am I missing something here?

Thanks/Oveis

joellembatchou commented 2 years ago

Hi,

v3.0.1 had bugs on how it was applying Firth/SPA correction; could you run both version without firth/spa to determine if the difference is due to the Firth/SPA correction implementation or the different use of Davies/Kuonen methods?

Ojami commented 2 years ago

Hi Joelle,

This is what happens without spa/firth

version/P	v.3.0.1	v.3.1
ADD	3.79E-37	3.79E-37
ADD-SKAT(1,25)	2.28E-28	2.28E-28
ADD-SKAT-O	1.94E-36	1.94E-36

And as you suspected it's only because of Firth/SPA. My bad!

Thanks! Best/Oveis

rgcgithub / regenie

Gene-based tests are inconsistent with SKAT package #291