weizhouUMICH / SAIGE

GNU Lesser General Public License v3.0

Singular matrix error #21

Closed zd1 closed 6 years ago

zd1 commented 6 years ago

Hi there,

I'm getting a singular-matrix error that (I think) comes from this line:

https://github.com/weizhouUMICH/SAIGE/blob/9820dc26eae028e7f940c61bee9c20b9a4e44589/src/SAIGE_fitGLMM_fast.cpp#L889

I am running it on a UK Biobank phenotype. The detailed error message is shown at the bottom. Any help would be appreciated.

Thanks, Zhihao

```
 486801 samples have genotypes
 formula is hypertension_medicated~body_mass_index+pc3+chip+pc2+pc4+pc1+age+pc5+sex+blood_pressure_medication
 91215 samples have non-missing phenotypes
 395626 samples in geno file do not have phenotypes
 91175 samples will be used for analysis
 colnames(data.new) is Y 1 body_mass_index pc3 chip pc2 pc4 pc1 age pc5 sex blood_pressure_medication
 out.transform$Param.transform$qrr: 11 11
 hypertension_medicated is a binary trait

 Call:  glm(formula = formula.new, family = binomial, data = data.new)

 Coefficients:
       (Intercept)            body_mass_index
         -24.40748                   -2.13629
           pc3                       chip
           0.14096                   -0.04922
           pc2                        pc4
          -0.03856                   -0.09343
           pc1                        age
           0.09213                    2.14034
           pc5                        sex
          -0.09750                    1.65620
 blood_pressure_medication
          10.33080

 Degrees of Freedom: 91174 Total (i.e. Null);  91164 Residual
 Null Deviance:      62940
 Residual Deviance: 3940         AIC: 3962
 [1] "Start reading genotype plink file here"

 M: 10000, N: 486801
 0.0356183 0.00524815 0.0573787 0.0422704 0.0161503 0.056419 0.0380532 0.0499534 0.0246614 0.173924 0.0590019 0.0328434 0.0568632 0.0194681 0.0357774 0.000433233 0.0169783 0.196808 0.0223252 0.0615794 0.151456 0.0164409 0.0504086 0.0905895 0.0220126 0.0583932 0.00920208 0.391917 0.0197148 0.0419468 0.0351412 0.0826981 0.101892 0.2047 5.48396e-06 0.0134302 0.0203565 5.48396e-06 0.236501 0.0108253 0.0596874 0.00540718 5.48396e-06 0.0576638 0.221843 0.314116 0.0199452 0.0222704 0.0188539 0.0898821 0.151429 0.0376309 0.0303482 0.202764 0.280011 0.0270304 0.0276885 0.0379874 0.143855 0.0341431 0.0151522 0.388188 0.0336551 0.12164 0.050244 0.0571703 0.0311544 0.195585 0.0843323 0.0181135 0.0305292 0.0310995 0.126882 0.0520976 0.0198684 0.00971758 0.00109679 0.0340609 8.22594e-05 5.48396e-06 0.045994 0.0153825 5.48396e-05 0.0845462 0.159644 0.147584 0.229092 0.0432904 0.338207 0.496052 0.172218 7.12915e-05 0.0821223 0.3498 0.0433507 0.314176 0.0531615 0.0511818 0.0248643 5.48396e-06
 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 M = 10000
 N = 486801
 time: 28979.6
 [1] "Genotype reading is done"
 iGet_Coef:  1
 iter from getPCG1ofSigmaAndVector 5
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4

 error: inv_sympd(): matrix is singular or not positive definite
 Timing stopped at: 254.244 14.74 50.065

 Stderr:
 Warning message:
 package ‘SAIGE’ was built under R version 3.4.3
 Loading required package: optparse
 Error in getCoefficients(Y, X, W, tau, maxiter = maxiterPCG, tol = tolPCG) :
   inv_sympd(): matrix is singular or not positive definite
 Calls: fitNULLGLMM ... glmmkin.ai_PCG_Rcpp_Binary -> Get_Coef -> getCoefficients -> .Call
 In addition: Warning messages:
 1: glm.fit: fitted probabilities numerically 0 or 1 occurred
 2: glm.fit: fitted probabilities numerically 0 or 1 occurred
 Execution halted
```

zd1 commented 6 years ago

I guess I could add a preprocessing step to manually remove SNPs in high LD after dropping the samples with missing data. But I was wondering whether SAIGE makes any attempt to do that itself.

Thanks, Zhihao

weizhouUMICH commented 6 years ago

Hi Zhihao,

Thank you for your feedback! The error is likely due to perfect separation, similar to this issue:

https://github.com/weizhouUMICH/SAIGE/issues/17

Could you please also try SAIGE version 0.26.6 and check the output for "mu"? https://www.dropbox.com/s/gv872855j30ixux/SAIGE_0.26.6_R_x86_64-pc-linux-gnu.tar.gz?dl=0

Thanks, Wei
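For readers hitting the same error, here is a minimal numerical sketch of why perfect separation and the `inv_sympd()` failure tend to go together. Armadillo's `inv_sympd()` requires a symmetric positive-definite matrix. As a simplification, the sketch assumes the matrix being inverted behaves like the weighted cross-product X^T W X from an ordinary logistic-regression (IRLS) step, with weights w_i = mu_i(1 - mu_i). SAIGE's `getCoefficients` works with the GLMM covariance instead, so this is an analogy, not SAIGE's code. When a covariate separates the outcome, the coefficients diverge, the fitted probabilities round to 0 or 1, the weights vanish, and the matrix stops being positive definite. The glm output above is consistent with near-separation: a `blood_pressure_medication` coefficient of about 10.3, a residual deviance of 3940 against a null deviance of 62940, and the "fitted probabilities numerically 0 or 1" warnings. The toy data, coefficient values, and helper names are hypothetical, and a Cholesky factorisation stands in for the positive-definiteness check.

```python
import numpy as np

def logistic_weights(X, beta):
    """IRLS weights w_i = mu_i * (1 - mu_i) for a logistic model."""
    eta = X @ beta                      # linear predictor
    mu = 1.0 / (1.0 + np.exp(-eta))     # fitted probabilities
    return mu * (1.0 - mu)

def weighted_crossprod(X, w):
    """Return X^T W X with W = diag(w)."""
    return X.T @ (X * w[:, None])

def is_positive_definite(A):
    """True if a Cholesky factorisation of symmetric A succeeds."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

# Hypothetical toy data: an intercept plus one binary covariate z.
z = np.array([0.0] * 50 + [1.0] * 50)
X = np.column_stack([np.ones_like(z), z])

# Moderate coefficients: every sample carries weight, so the matrix is PD.
A_ok = weighted_crossprod(X, logistic_weights(X, np.array([-0.5, 1.0])))

# Coefficients diverging as they do under perfect separation (z predicts
# y exactly): mu is ~4e-18 for z == 0 and rounds to exactly 1.0 for
# z == 1, so the weights vanish and the matrix becomes singular.
A_sep = weighted_crossprod(X, logistic_weights(X, np.array([-40.0, 80.0])))

print(is_positive_definite(A_ok))    # True
print(is_positive_definite(A_sep))   # False
```

With the moderate coefficients the factorisation succeeds. With the diverging coefficients the weights underflow or round to zero and it fails. This is the same kind of failure that `inv_sympd()` reports.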

zd1 commented 6 years ago

Hi Wei,

Thanks very much for your quick reply. Yes, I'll give that version a go.

Zhihao

zd1 commented 6 years ago

Just to let you know that the new version does run for me. I am only getting warnings like these: `Warning messages: 1: glm.fit: fitted probabilities numerically 0 or 1 occurred 2: glm.fit: fitted probabilities numerically 0 or 1 occurred`, which suggest that perfect separation has occurred. I think the problem has more to do with the input specification than with SAIGE. At least I am getting some results out of it, even though the model is not well specified. Thank you for your efforts.

Zhihao
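One quick way to look for the kind of input problem described above is to cross-tabulate the binary outcome against each binary covariate before fitting (for example, `hypertension_medicated` against `blood_pressure_medication`). The sketch below uses a hypothetical helper that is not part of SAIGE. It reports the empty cells of the 2x2 table. For a binary covariate in a logistic model, any empty cell means the covariate (quasi-)separates the outcome, so its maximum-likelihood coefficient is infinite. The check cannot detect separation caused by a combination of covariates, and the thread does not confirm which covariate was responsible here.

```python
from collections import Counter

def separation_cells(y, z):
    """Return the empty (z, y) cells of the 2x2 table of a binary outcome y
    against a binary covariate z. Any empty cell means z (quasi-)separates y,
    so its logistic-regression coefficient has no finite MLE."""
    counts = Counter(zip(z, y))
    return [(zi, yi) for zi in (0, 1) for yi in (0, 1) if counts[(zi, yi)] == 0]

# Hypothetical example: everyone with z == 1 has y == 1.
y = [0, 0, 1, 1, 1, 1, 0, 1]
z = [0, 0, 0, 1, 1, 1, 0, 1]
print(separation_cells(y, z))  # [(1, 0)]
```

An empty list means no single binary covariate separates the outcome. Separation from continuous covariates or from combinations of covariates still has to be checked another way, for instance by looking for fitted probabilities at 0 or 1.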

medinacarolina commented 3 years ago

Hi! I get this warning too, but I am not sure how this misspecification will affect my results, i.e. whether it is worth trying to find the problems with my model before continuing to the second step. I know I lose half of my sample due to missingness in covariates or phenotypes, and I'm not sure whether that is causing any problems.