rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

slight difference between 2.2.4 and 3.0 version #265

Closed Shicheng-Guo closed 2 years ago

Shicheng-Guo commented 2 years ago

Dear Joelle,

I used exactly the same input data and parameters (including the step 1 input) for versions 2.2.4 and 3.0, and noticed a very small difference in BETA and P-value (around the sixth digit after the decimal point). Is there any difference between these two versions in how BETA and SE are estimated?

## single-phecode matrix as input, pheWAS run with version 3.0
(base) [sguo2@login01 KDR]$ head /home/sguo2/janssen4/bin/regescript/regenie3/bin/KDR/UKB.phecode.v2021Q4.IMP38.chr4.KDR.additive.SNV_X334.2.regenie
CHROM GENPOS ID ALLELE0 ALLELE1 A1FREQ N TEST BETA SE CHISQ LOG10P EXTRA
4 54078619 chr4_54078619_G_A G A 0.369045 388301 ADD -0.0232063 0.0636538 0.132912 0.145432 NA
4 54078667 chr4_54078667_T_C T C 8.9047e-06 393051 ADD -1.00204 11.9844 0.00699093 0.0299484 NA
4 54078858 chr4_54078858_G_A G A 0.000372951 392813 ADD -1.00394 1.63199 0.378425 0.268857 NA
4 54078886 chr4_54078886_T_C T C 8.90467e-06 393052 ADD -1.00112 14.1737 0.00498891 0.0251704 NA
4 54078925 chr4_54078925_C_T C T 2.41709e-05 393035 ADD -1.00281 6.59982 0.0230871 0.0558971 NA
4 54078927 chr4_54078927_C_A C A 7.88774e-05 393015 ADD -1.00338 3.69162 0.0738745 0.104701 NA
4 54078989 chr4_54078989_G_T G T 4.1983e-05 393016 ADD -1.00421 4.50188 0.0497576 0.084344 NA
4 54079024 chr4_54079024_C_T C T 0.00645383 380007 ADD -0.151149 0.37956 0.158581 0.160857 NA
4 54079090 chr4_54079090_G_T G T 4.70732e-05 393005 ADD -1.00963 4.26734 0.0559766 0.0899244 NA

## 1600-phecode matrix as input, pheWAS run with version 3.0
(base) [sguo2@login01 KDR]$ head UKB.phecode.v2021Q4.IMP38.chr4.KDR.additive.SNV_X334.2.regenie
CHROM GENPOS ID ALLELE0 ALLELE1 A1FREQ N TEST BETA SE CHISQ LOG10P EXTRA
4 54078619 chr4_54078619_G_A G A 0.369045 388301 ADD -0.0232063 0.0636538 0.132912 0.145432 NA
4 54078667 chr4_54078667_T_C T C 8.9047e-06 393051 ADD -1.00204 11.9844 0.00699093 0.0299484 NA
4 54078858 chr4_54078858_G_A G A 0.000372951 392813 ADD -1.00394 1.63199 0.378425 0.268857 NA
4 54078886 chr4_54078886_T_C T C 8.90467e-06 393052 ADD -1.00112 14.1737 0.00498891 0.0251704 NA
4 54078925 chr4_54078925_C_T C T 2.41709e-05 393035 ADD -1.00281 6.59982 0.0230871 0.0558971 NA
4 54078927 chr4_54078927_C_A C A 7.88774e-05 393015 ADD -1.00338 3.69162 0.0738745 0.104701 NA
4 54078989 chr4_54078989_G_T G T 4.1983e-05 393016 ADD -1.00421 4.50188 0.0497576 0.084344 NA
4 54079024 chr4_54079024_C_T C T 0.00645383 380007 ADD -0.151149 0.37956 0.158581 0.160857 NA
4 54079090 chr4_54079090_G_T G T 4.70732e-05 393005 ADD -1.00963 4.26734 0.0559766 0.0899244 NA

## single-phecode matrix as input, pheWAS run with version 2.2.4
(base) [sguo2@login01 KDR]$ head /home/sguo2/janssen4/baklava/r3/IMP381M/KDR/UKB.phecode.v2021Q4.IMP38.chr4.KDR.additive.SNV_X334.2.regenie
CHROM GENPOS ID ALLELE0 ALLELE1 A1FREQ N TEST BETA SE CHISQ LOG10P EXTRA
4 54078619 chr4_54078619_G_A G A 0.369045 388301 ADD -0.0232068 0.0636539 0.132917 0.145435 NA
4 54078667 chr4_54078667_T_C T C 8.9047e-06 393051 ADD -1.00204 11.9843 0.00699111 0.0299488 NA
4 54078858 chr4_54078858_G_A G A 0.000372951 392813 ADD -1.00391 1.63199 0.378399 0.268845 NA
4 54078886 chr4_54078886_T_C T C 8.90467e-06 393052 ADD -1.00112 14.1732 0.00498925 0.0251713 NA
4 54078925 chr4_54078925_C_T C T 2.41709e-05 393035 ADD -1.00281 6.59971 0.0230879 0.0558981 NA
4 54078927 chr4_54078927_C_A C A 7.88774e-05 393015 ADD -1.00338 3.69153 0.073878 0.104703 NA
4 54078989 chr4_54078989_G_T G T 4.1983e-05 393016 ADD -1.00421 4.50179 0.0497598 0.0843461 NA
4 54079024 chr4_54079024_C_T C T 0.00645383 380007 ADD -0.151155 0.379559 0.158594 0.160865 NA
4 54079090 chr4_54079090_G_T G T 4.70732e-05 393005 ADD -1.00963 4.26717 0.0559813 0.0899285 NA

## 1600-phecode matrix as input, pheWAS run with version 2.2.4
(base) [sguo2@login01 KDR]$ head /home/sguo2/janssen4/baklava/phewas/2022Q1/IMP381M/KDR/UKB.phecode.v2021Q4.IMP38.chr4.KDR.additive.SNV_X334.2.regenie
CHROM GENPOS ID ALLELE0 ALLELE1 A1FREQ N TEST BETA SE CHISQ LOG10P EXTRA
4 54078619 chr4_54078619_G_A G A 0.368902 343220 ADD -0.0837615 0.0711331 1.38658 0.621633 NA
4 54078667 chr4_54078667_T_C T C 1.00743e-05 347420 ADD -1.00162 13.2241 0.0057369 0.0270458 NA
4 54078858 chr4_54078858_G_A G A 0.000371525 347218 ADD -1.00192 1.87972 0.284107 0.226197 NA
4 54078886 chr4_54078886_T_C T C 1.00742e-05 347421 ADD -1.00093 15.3567 0.00424831 0.0231773 NA
4 54078925 chr4_54078925_C_T C T 2.59062e-05 347407 ADD -1.00247 7.74376 0.0167586 0.0472087 NA
4 54078927 chr4_54078927_C_A C A 8.20411e-05 347387 ADD -1.00339 4.02447 0.0621612 0.0952238 NA
4 54078989 chr4_54078989_G_T G T 4.46184e-05 347390 ADD -1.00368 4.87675 0.0423578 0.0773061 NA
4 54079024 chr4_54079024_C_T C T 0.00636498 335979 ADD -0.11373 0.424464 0.0717912 0.103063 NA
4 54079090 chr4_54079090_G_T G T 5.03771e-05 347380 ADD -1.00753 4.68863 0.0461765 0.0809976 NA
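
To put a number on "slight difference", the outputs can be matched by variant ID and compared column by column. The following is a minimal Python sketch, not part of regenie: the helper names `read_regenie` and `max_relative_diff` are made up for illustration, and the embedded text is just the first three rows of the single-phecode 3.0 and 2.2.4 outputs shown above. It assumes the numeric columns contain no `NA` values.

```python
NUMERIC_COLS = ("BETA", "SE", "CHISQ", "LOG10P")

# First three rows of the single-phecode runs pasted above.
V3_0 = """CHROM GENPOS ID ALLELE0 ALLELE1 A1FREQ N TEST BETA SE CHISQ LOG10P EXTRA
4 54078619 chr4_54078619_G_A G A 0.369045 388301 ADD -0.0232063 0.0636538 0.132912 0.145432 NA
4 54078667 chr4_54078667_T_C T C 8.9047e-06 393051 ADD -1.00204 11.9844 0.00699093 0.0299484 NA
4 54078858 chr4_54078858_G_A G A 0.000372951 392813 ADD -1.00394 1.63199 0.378425 0.268857 NA
"""

V2_2_4 = """CHROM GENPOS ID ALLELE0 ALLELE1 A1FREQ N TEST BETA SE CHISQ LOG10P EXTRA
4 54078619 chr4_54078619_G_A G A 0.369045 388301 ADD -0.0232068 0.0636539 0.132917 0.145435 NA
4 54078667 chr4_54078667_T_C T C 8.9047e-06 393051 ADD -1.00204 11.9843 0.00699111 0.0299488 NA
4 54078858 chr4_54078858_G_A G A 0.000372951 392813 ADD -1.00391 1.63199 0.378399 0.268845 NA
"""


def read_regenie(text):
    """Parse whitespace-delimited step 2 output into {variant ID: row dict}."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    id_idx = header.index("ID")
    return {row[id_idx]: dict(zip(header, row)) for row in rows}


def max_relative_diff(a, b, cols=NUMERIC_COLS):
    """Largest relative difference per column over variants present in both."""
    worst = {c: 0.0 for c in cols}
    for vid in a.keys() & b.keys():
        for c in cols:
            x, y = float(a[vid][c]), float(b[vid][c])
            scale = max(abs(x), abs(y))
            if scale > 0:
                worst[c] = max(worst[c], abs(x - y) / scale)
    return worst


diffs = max_relative_diff(read_regenie(V3_0), read_regenie(V2_2_4))
for col, d in diffs.items():
    print(f"{col}: max relative difference {d:.2e}")
```

On these three rows every column agrees to a relative difference below 1e-4. To compare full result files, pass the contents of each file, for example `read_regenie(open(path).read())`.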

Here is my script:

Options in effect:
  --step 2 \
  --pred /home/sguo2/janssen4/regenie/2021Q4/binStep1/pred.bt.2021Q4.txt \
  --bed /home/sguo2/janssen4/baklava/phewas/2021Q4/WGS1M/VAF/SCAP/ukb23196_chr3_SCAP.VAF \
  --keep /home/sguo2/janssen4/ukb/analytic/step1/qc_pass.id \
  --phenoFile /home/sguo2/data/PHENO/WES450K/UKB_PHECODE_Guo.vRegenie.IMP.2021Q4.1627.txt \
  --phenoCol X334.2 \
  --covarFile /home/sguo2/janssen4/ukb/WES-hg38/UKB.cov.r2.Guo.txt \
  --covarCol PC{1:20},Year,iSex,YearSquare \
  --test additive \
  --range 3:46413681-48477126 \
  --bt \
  --spa \
  --pThresh 0.01 \
  --approx \
  --threads 6 \
  --minCaseCount 100 \
  --bsize 1500 \
  --out /home/sguo2/janssen4/baklava/c23/WGS1M/SCAP/UKB.phecode.v2021Q4.WGS.VAF.chr3.SCAP.additive.SNV

joellembatchou commented 2 years ago

Hi Shicheng,

For binary traits (BTs), we slightly changed which output from the null logistic regression is stored, to reduce the number of matrix operations performed when testing each variant. Some of these operations involve matrix inversions, which are most likely the source of the small changes. For multi-trait runs, we fixed a bug where missingness in the phenotype was not properly tracked when testing each variant.

Cheers, Joelle
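
The multi-trait fix concerns per-phenotype missingness. The toy Python sketch below is not regenie's code and does not reproduce the bug. It only illustrates, under made-up data, why the sample count N for a trait depends on whether missingness is tracked per trait or pooled across all traits in the run. The trait names, values, and both helper functions are hypothetical.

```python
# Hypothetical toy data: None marks a missing phenotype value for a sample.
phenotypes = {
    "traitA": [1, 0, None, 1, 0, 1],
    "traitB": [0, None, None, 1, 1, 0],
}


def per_trait_n(pheno):
    """Count non-missing samples separately for each trait."""
    return {
        name: sum(v is not None for v in values)
        for name, values in pheno.items()
    }


def complete_case_n(pheno):
    """Count samples that are non-missing in every trait (one pooled mask)."""
    return sum(
        all(v is not None for v in sample)
        for sample in zip(*pheno.values())
    )


n_by_trait = per_trait_n(phenotypes)
n_pooled = complete_case_n(phenotypes)
print("per-trait N:", n_by_trait)
print("pooled N:", n_pooled)
```

Here traitA has more usable samples under its own mask than under the pooled one, so any statistic computed for traitA would change with the choice of mask. A mismatch in N between a single-trait and a multi-trait run of the same phenotype is therefore worth checking before comparing BETA or SE.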

Shicheng-Guo commented 2 years ago

Great. Thank you, Joelle, for the confirmation. Now it is clear!