hanchenphd / GMMAT

Generalized linear Mixed Model Association Tests

Error: inv_sympd(): matrix is singular or not positive definite #62

Open Jia21 opened 8 months ago

Jia21 commented 8 months ago

Hi,

I ran into the error "Error: inv_sympd(): matrix is singular or not positive definite" when using glmmkin and glmm.wald in a GWAS analysis. We included age, sex, the top 10 PCs, and recruitment site as covariates, and checked that no covariates were highly correlated. Do you have any suggestions on how to resolve this?

Thank you so much!

hanchenphd commented 8 months ago

Can you turn on verbose = TRUE and see if there are any numerical or convergence issues?

Jia21 commented 8 months ago

Yes,

Here is the log information.

I specified two SNPs as examples in glmm.wald. After the first SNP failed, it moved on to the second SNP, which also failed, at iteration 38.

Iteration 45 :
Variance component estimates:
[1] 1.0000000 0.1816159
Fixed-effect coefficients:
 [1] -36.446245482   1.042957748   0.045823648  -0.011587857  31.941050586
 [6]  31.307497606  31.713237746  31.767771860  31.818582502  31.811060479
[11]  31.736427430  31.884654203  31.912535230  32.019469464  31.632941611
[16]  32.201665511  31.829010417  32.261927793  31.777219632  31.396711665
[21]  31.963521855  31.207691174  31.354248461  32.058397332   0.004579815
[26]  -0.005586743  -0.011035033  -0.018517951   0.015043698   0.014192346
[31]   0.009828516   0.005838099  -0.004300693   0.012409819   0.007344330

Iteration 46 :
Error : inv_sympd(): matrix is singular or not positive definite
|=================================== | 50%

Analyze SNP 2
Fixed-effect coefficients:
  (Intercept)           sex           age             e       center1
-1.641089e+01  1.043083e+00  4.583311e-02 -1.158868e-02  1.211724e+01
      center2       center3       center4       center5       center6
 1.155037e+01  1.189920e+01  1.194016e+01  1.193149e+01  1.189653e+01
      center7       center8       center9      center10      center11
 1.185279e+01  1.206266e+01  1.202170e+01  1.221095e+01  1.159504e+01
     center12      center13      center14      center15      center16
 1.219916e+01  1.178182e+01  1.225588e+01  1.188455e+01  1.162023e+01
     center17      center18      center19      center20           pc1
 1.216823e+01  1.134672e+01  1.144921e+01  1.214742e+01  4.714811e-03
          pc2           pc3           pc4           pc5           pc6
-5.624653e-03 -1.110306e-02 -1.856835e-02  1.493352e-02  1.421669e-02
          pc7           pc8           pc9          pc10         SNP__
 9.596273e-03  5.813233e-03 -4.387833e-03  1.259384e-02 -7.193398e-04

Iteration 1 :
...
Iteration 37 :
Variance component estimates:
[1] 1.0000000 0.6798453
Fixed-effect coefficients:
 [1] -3.603040e+01  1.127197e+00  5.029376e-02 -1.269754e-02  3.139293e+01
 [6]  3.074900e+01  3.078641e+01  3.115123e+01  3.101003e+01  3.103395e+01
[11]  3.103080e+01  3.108741e+01  3.120485e+01 -3.506452e+15  3.081809e+01
[16]  3.150742e+01  3.112558e+01  3.138106e+01  3.107799e+01  3.061680e+01
[21]  3.141249e+01  3.057467e+01  3.064083e+01  3.140293e+01  4.851674e-03
[26] -4.277461e-03 -1.279973e-02 -1.964385e-02  1.664745e-02  1.507645e-02
[31]  9.978521e-03  6.433225e-03 -4.875108e-03  1.390258e-02  1.037610e-03

Iteration 38 :
Error: inv_sympd(): matrix is singular or not positive definite
|======================================================================| 100%

Thank you!

hanchenphd commented 8 months ago

The problem comes from your 20 dummy variables for center. In a logistic model, a regression coefficient of 30 corresponds to an odds ratio of exp(30) ≈ 10^13, and that is what is causing the numerical issue.
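The arithmetic behind that odds ratio can be checked directly. The following is a minimal Python sketch (the thread itself uses R); the two coefficient values are rounded from the verbose log above, and the function name odds_ratio is only illustrative.

```python
import math


def odds_ratio(beta):
    """Convert a logistic regression coefficient (log odds ratio) to an odds ratio."""
    return math.exp(beta)


# Rounded magnitudes of the center dummy coefficients seen in the log:
# about 30 at the late iterations, about 12 in the initial fit for SNP 2.
for beta in (30.0, 12.0):
    # log10 of the odds ratio is beta / ln(10)
    power_of_ten = beta / math.log(10)
    print(f"beta = {beta:4.1f} -> odds ratio = {odds_ratio(beta):.3e} (about 10^{power_of_ten:.1f})")
```

Even the smaller coefficient of about 12 implies an odds ratio above 100,000, which is already implausible for a recruitment-site effect.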

Jia21 commented 8 months ago

I see your point, but how should I handle the center (recruitment site) covariate? We have to include it as a covariate in our analysis. Thank you for your prompt reply!

Best

hanchenphd commented 8 months ago

I would suggest that you talk to the collaborator who created these variables and see what your options are. This problem is not specific to GLMMs. If you fit a logistic regression model (without random effects), for your second SNP you would still get an intercept of -16.41 and large positive (>10) coefficients for all 20 dummy variables for center. This is unrealistic: the numerical issue suggests that your reference group (with all 20 dummy variables for center equal to 0) consisted almost entirely of controls, so the odds ratio for each of the 20 centers relative to the reference group is nearly infinite.
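One way to check whether a reference group consists almost entirely of controls is to tabulate cases and controls per center before fitting any model. The following is a minimal Python sketch under stated assumptions: the data are made up for illustration, and the function names tabulate_by_center and sparse_centers are hypothetical, not part of GMMAT.

```python
from collections import Counter


def tabulate_by_center(centers, outcomes):
    """Return {center: (n_controls, n_cases)} for a binary outcome coded 0/1."""
    counts = Counter(zip(centers, outcomes))
    return {c: (counts[(c, 0)], counts[(c, 1)]) for c in sorted(set(centers))}


def sparse_centers(table, min_count=1):
    """List centers with fewer than min_count controls or cases."""
    return [c for c, (n_ctrl, n_case) in table.items()
            if n_ctrl < min_count or n_case < min_count]


# Made-up toy data: the group labeled "ref" contains controls only.
centers = ["ref"] * 6 + ["c1"] * 6 + ["c2"] * 6
outcomes = [0, 0, 0, 0, 0, 0,
            0, 1, 0, 1, 1, 0,
            1, 0, 1, 1, 0, 0]

table = tabulate_by_center(centers, outcomes)
for center, (n_ctrl, n_case) in table.items():
    print(f"{center}: controls = {n_ctrl}, cases = {n_case}")
print("centers with no cases or no controls:", sparse_centers(table))
```

A center flagged this way cannot support a finite odds ratio estimate against the other groups, which matches the diverging coefficients in the log. How to handle such a center remains a study-design decision.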

Jia21 commented 8 months ago

Thank you, I see your point and will consider that. I appreciate your suggestions and comments!