melff / mclogit

mclogit: Multinomial Logit Models, with or without Random Effects or Overdispersion
http://melff.github.io/mclogit/

Frequent error in mblogit "Error in solve.default(Information) : system is computationally singular: reciprocal condition number" with models that run OK in nnet::multinom #27

Closed tomwenseleers closed 1 year ago

tomwenseleers commented 2 years ago

I am frequently encountering fitting errors in mblogit: it returns "Error in solve.default(Information) : system is computationally singular: reciprocal condition number" for models that run fine in nnet::multinom. Would you happen to have any idea how to resolve these? A reproducible example is given below. It could be that there is some problem with complete separation or a Hauck-Donner effect, but since it happens so frequently, even in situations where the same fit runs OK in nnet::multinom, I have the feeling there might be a bug in mblogit. Either way, any advice on how to get rid of this error would be welcome! For logistic regression I recall that one can add a small amount of noise to zero counts, or add some ridge regularisation (e.g. by row-augmenting the covariate matrix with a diagonal matrix of sqrt(lambda) values, with lambda the ridge penalty), to get rid of this sort of problem. I don't know whether either approach could also be a solution for multinomial regression?
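For what it's worth, the pseudo-count idea could be sketched as follows (a minimal illustration only; the helper name and the eps value are my own, and the column names variant, collection_date_num and count match the dataset below):

```r
# Hypothetical helper: complete the variant x date grid and add a small
# pseudo-count eps to every cell, so no variant has an all-zero profile.
add_pseudocounts <- function(dat, eps = 0.5) {
  full <- expand.grid(variant = levels(dat$variant),
                      collection_date_num = unique(dat$collection_date_num))
  out <- merge(full, dat, all.x = TRUE)   # missing cells get NA counts
  out$count[is.na(out$count)] <- 0
  out$count <- out$count + eps            # shift zero counts away from zero
  out
}
```

Whether this is statistically defensible here is a separate question; it merely removes the empty cells that can cause separation.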

# data=SARS-CoV2 coronavirus variants (variant) through time (collection_date_num)
# in India, count=actual count (nr of sequenced genomes)

dat = read.csv("https://www.dropbox.com/s/u27cn44p5srievq/dat.csv?dl=1")
dat$variant = factor(dat$variant)

# 1. nnet::multinom MULTINOMIAL FIT ####
library(nnet)
library(splines)
set.seed(1)
fit_nnet = nnet::multinom(variant ~ ns(collection_date_num, df=2), 
                          weights=count, data=dat)
summary(fit_nnet)
# Coefficients:
#   (Intercept) ns(collection_date_num, df = 2)1
# Beta                 3.798225                        4.9204017
# Delta              -10.043973                       49.4947796
# Omicron (BA.1)    -265.410841                      487.1385690
# Omicron (BA.2)    -467.537309                      831.1140917
# Omicron (BA.2.74)   -4.616715                       15.4357149
# Omicron (BA.2.75)   11.365092                      -38.7658668
# Omicron (BA.2.76)    5.437985                        0.3403524
# Omicron (BA.3)     -15.924342                       42.5896877
# Omicron (BA.4)      -7.440555                       23.8097760
# Omicron (BA.5)      -2.940342                       15.3461333
# Other               19.322779                       -8.4319599
# ns(collection_date_num, df = 2)2
# Beta                                      35.17926
# Delta                                     55.21257
# Omicron (BA.1)                           195.27695
# Omicron (BA.2)                           312.90170
# Omicron (BA.2.74)                         75.53290
# Omicron (BA.2.75)                         79.80443
# Omicron (BA.2.76)                         70.73509
# Omicron (BA.3)                            75.60153
# Omicron (BA.4)                            73.26847
# Omicron (BA.5)                            74.45010
# Other                                     54.14069
# 
# Std. Errors:
#   (Intercept) ns(collection_date_num, df = 2)1
# Beta                1.8548783                        1.9750676
# Delta               0.8914081                        0.7919794
# Omicron (BA.1)      8.9955175                       15.6207548
# Omicron (BA.2)      7.0504753                       12.1069603
# Omicron (BA.2.74)   1.1513095                        0.7721499
# Omicron (BA.2.75)   2.5900436                        5.7317343
# Omicron (BA.2.76)  15.9802663                       27.2148132
# Omicron (BA.3)     27.1955306                       46.8288796
# Omicron (BA.4)      1.0868226                        0.6370319
# Omicron (BA.5)      1.3766617                        1.5280076
# Other               0.8118766                        0.6288743
# ns(collection_date_num, df = 2)2
# Beta                                      5.998582
# Delta                                     3.133279
# Omicron (BA.1)                            5.388977
# Omicron (BA.2)                            4.888832
# Omicron (BA.2.74)                         3.177189
# Omicron (BA.2.75)                         3.862689
# Omicron (BA.2.76)                         9.112225
# Omicron (BA.3)                           14.632091
# Omicron (BA.4)                            3.116746
# Omicron (BA.5)                            3.081921
# Other                                     3.128327
# 
# Residual Deviance: 128392.2 
# AIC: 128458.2 

# predicted proportions at last day calculated using emmeans:
library(emmeans)
multinom_emmeans = emmeans(fit_nnet, ~ variant,  
                       mode = "prob",
                       at=list(collection_date_num = 
                                 max(dat$collection_date_num)))
multinom_emmeans
# variant               prob       SE df lower.CL upper.CL
# Alpha             0.00e+00 0.00e+00 33 0.00e+00 0.00e+00
# Beta              0.00e+00 0.00e+00 33 0.00e+00 0.00e+00
# Delta             7.73e-06 1.17e-06 33 5.34e-06 1.01e-05
# Omicron (BA.1)    1.82e-04 6.42e-05 33 5.14e-05 3.13e-04
# Omicron (BA.2)    1.76e-01 7.45e-03 33 1.61e-01 1.91e-01
# Omicron (BA.2.74) 9.03e-02 7.98e-03 33 7.41e-02 1.07e-01
# Omicron (BA.2.75) 1.68e-01 1.90e-02 33 1.30e-01 2.07e-01
# Omicron (BA.2.76) 2.89e-01 1.35e-02 33 2.62e-01 3.16e-01
# Omicron (BA.3)    1.34e-02 2.10e-03 33 9.10e-03 1.76e-02
# Omicron (BA.4)    1.67e-02 2.47e-03 33 1.17e-02 2.17e-02
# Omicron (BA.5)    2.03e-01 1.08e-02 33 1.81e-01 2.25e-01
# Other             4.23e-02 3.15e-03 33 3.59e-02 4.87e-02
# 
# Confidence level used: 0.95 

# 2. SAME MULTINOMIAL FIT USING mblogit ####
library(mclogit)
fit_mblogit = mblogit(variant ~ ns(collection_date_num, df=2),
weights=count,
                      data=dat,
                      from.table=TRUE, dispersion=FALSE,
                      control=mclogit.control(maxit=27))
summary(fit_mblogit)
# Equation for Beta vs Alpha:
#   Estimate Std. Error z value Pr(>|z|)
# (Intercept)                      -2.140e+01  1.505e+06       0        1
# ns(collection_date_num, df = 2)1 -9.047e+00  3.062e+06       0        1
# ns(collection_date_num, df = 2)2  2.608e+01  1.379e+06       0        1
# 
# Equation for Delta vs Alpha:
#   Estimate Std. Error z value Pr(>|z|)
# (Intercept)                      -2.140e+01  1.505e+06       0        1
# ns(collection_date_num, df = 2)1 -9.047e+00  3.062e+06       0        1
# ns(collection_date_num, df = 2)2  2.608e+01  1.379e+06       0        1
# 
# Equation for Omicron (BA.1) vs Alpha:
#   Estimate Std. Error z value
# Omicron (BA.1)~(Intercept)                      -2.140e+01  1.505e+06       0
# Omicron (BA.1)~ns(collection_date_num, df = 2)1 -9.047e+00  3.062e+06       0
# Omicron (BA.1)~ns(collection_date_num, df = 2)2  2.608e+01  1.379e+06       0
# Pr(>|z|)
# Omicron (BA.1)~(Intercept)                             1
# Omicron (BA.1)~ns(collection_date_num, df = 2)1        1
# Omicron (BA.1)~ns(collection_date_num, df = 2)2        1
# 
# Equation for Omicron (BA.2) vs Alpha:
#   Estimate Std. Error z value
# Omicron (BA.2)~(Intercept)                      -6.051e+00  5.785e+05       0
# Omicron (BA.2)~ns(collection_date_num, df = 2)1 -3.120e+01  1.257e+06       0
# Omicron (BA.2)~ns(collection_date_num, df = 2)2  5.841e+01  3.135e+05       0
# Pr(>|z|)
# Omicron (BA.2)~(Intercept)                             1
# Omicron (BA.2)~ns(collection_date_num, df = 2)1        1
# Omicron (BA.2)~ns(collection_date_num, df = 2)2        1
# 
# Equation for Omicron (BA.2.74) vs Alpha:
#   Estimate Std. Error
# Omicron (BA.2.74)~(Intercept)                      -7.065e+00  5.791e+05
# Omicron (BA.2.74)~ns(collection_date_num, df = 2)1 -2.952e+01  1.258e+06
# Omicron (BA.2.74)~ns(collection_date_num, df = 2)2  5.573e+01  3.116e+05
# z value Pr(>|z|)
# Omicron (BA.2.74)~(Intercept)                            0        1
# Omicron (BA.2.74)~ns(collection_date_num, df = 2)1       0        1
# Omicron (BA.2.74)~ns(collection_date_num, df = 2)2       0        1
# 
# Equation for Omicron (BA.2.75) vs Alpha:
#   Estimate Std. Error
# Omicron (BA.2.75)~(Intercept)                      -6.311e+00  5.785e+05
# Omicron (BA.2.75)~ns(collection_date_num, df = 2)1 -3.077e+01  1.257e+06
# Omicron (BA.2.75)~ns(collection_date_num, df = 2)2  5.771e+01  3.131e+05
# z value Pr(>|z|)
# Omicron (BA.2.75)~(Intercept)                            0        1
# Omicron (BA.2.75)~ns(collection_date_num, df = 2)1       0        1
# Omicron (BA.2.75)~ns(collection_date_num, df = 2)2       0        1
# 
# Equation for Omicron (BA.2.76) vs Alpha:
#   Estimate Std. Error
# Omicron (BA.2.76)~(Intercept)                      -6.072e+00  5.784e+05
# Omicron (BA.2.76)~ns(collection_date_num, df = 2)1 -3.116e+01  1.257e+06
# Omicron (BA.2.76)~ns(collection_date_num, df = 2)2  5.835e+01  3.137e+05
# z value Pr(>|z|)
# Omicron (BA.2.76)~(Intercept)                            0        1
# Omicron (BA.2.76)~ns(collection_date_num, df = 2)1       0        1
# Omicron (BA.2.76)~ns(collection_date_num, df = 2)2       0        1
# 
# Equation for Omicron (BA.3) vs Alpha:
#   Estimate Std. Error z value
# Omicron (BA.3)~(Intercept)                      -2.140e+01  1.505e+06       0
# Omicron (BA.3)~ns(collection_date_num, df = 2)1 -9.047e+00  3.062e+06       0
# Omicron (BA.3)~ns(collection_date_num, df = 2)2  2.608e+01  1.379e+06       0
# Pr(>|z|)
# Omicron (BA.3)~(Intercept)                             1
# Omicron (BA.3)~ns(collection_date_num, df = 2)1        1
# Omicron (BA.3)~ns(collection_date_num, df = 2)2        1
# 
# Equation for Omicron (BA.4) vs Alpha:
#   Estimate Std. Error z value
# Omicron (BA.4)~(Intercept)                      -7.442e+00  5.800e+05       0
# Omicron (BA.4)~ns(collection_date_num, df = 2)1 -2.890e+01  1.260e+06       0
# Omicron (BA.4)~ns(collection_date_num, df = 2)2  5.475e+01  3.110e+05       0
# Pr(>|z|)
# Omicron (BA.4)~(Intercept)                             1
# Omicron (BA.4)~ns(collection_date_num, df = 2)1        1
# Omicron (BA.4)~ns(collection_date_num, df = 2)2        1
# 
# Equation for Omicron (BA.5) vs Alpha:
#   Estimate Std. Error z value
# Omicron (BA.5)~(Intercept)                      -6.825e+00  5.785e+05       0
# Omicron (BA.5)~ns(collection_date_num, df = 2)1 -2.992e+01  1.257e+06       0
# Omicron (BA.5)~ns(collection_date_num, df = 2)2  5.635e+01  3.120e+05       0
# Pr(>|z|)
# Omicron (BA.5)~(Intercept)                             1
# Omicron (BA.5)~ns(collection_date_num, df = 2)1        1
# Omicron (BA.5)~ns(collection_date_num, df = 2)2        1
# 
# Equation for Other vs Alpha:
#   Estimate Std. Error z value Pr(>|z|)
# (Intercept)                      -2.140e+01  1.505e+06       0        1
# ns(collection_date_num, df = 2)1 -9.047e+00  3.062e+06       0        1
# ns(collection_date_num, df = 2)2  2.608e+01  1.379e+06       0        1
# 
# Null Deviance:     13540 
# Residual Deviance: 7.354e-10 
# Number of Fisher Scoring iterations:  27 
# Number of observations:  43160 
# 
# 
# Note: Algorithm did not converge.

As you can see, this fit returns unusually large standard errors, and increasing `maxit` to >30 results in the error

# Error in solve.default(Information) : 
#  system is computationally singular: reciprocal condition number = 1.66419e-16

It could be that this is due to complete separation / some Hauck-Donner effect, but given that this model runs OK in nnet::multinom I am tempted to think it could rather be a bug in mblogit.
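As a quick check on the separation hypothesis, one could look at how often each variant is actually observed; variants with non-zero counts in only a handful of cells are prime candidates for driving estimates to +/- infinity. A rough diagnostic sketch (my own; it assumes one row per variant x date cell with the variant and count columns from the data above):

```r
# Rough separation diagnostic: for each variant, count the number of
# rows (i.e. date cells) with a non-zero observation. Very small values
# suggest (quasi-)complete separation along the time axis.
dates_observed <- function(dat) {
  n_dates <- tapply(dat$count > 0, dat$variant, sum)
  sort(n_dates)  # variants listed from least to most often observed
}
```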

# predicted variant proportions on last day calculated using emmeans:
mblogit_emmeans = emmeans(fit_mblogit, ~ variant,  
                           mode = "prob",
                           at=list(collection_date_num = 
                                     max(dat$collection_date_num)))
mblogit_emmeans
# variant             prob       SE  df asymp.LCL asymp.UCL
# Alpha             0.0000 9.00e-08 Inf -1.80e-07  2.00e-07
# Beta              0.0000 1.00e-08 Inf -2.00e-08  0.00e+00
# Delta             0.0000 1.00e-08 Inf -2.00e-08  0.00e+00
# Omicron (BA.1)    0.0000 1.00e-08 Inf -2.00e-08  0.00e+00
# Omicron (BA.2)    0.3653 3.73e-02 Inf  2.92e-01  4.38e-01
# Omicron (BA.2.74) 0.0299 1.32e-02 Inf  4.09e-03  5.58e-02
# Omicron (BA.2.75) 0.1916 3.05e-02 Inf  1.32e-01  2.51e-01
# Omicron (BA.2.76) 0.3473 3.68e-02 Inf  2.75e-01  4.20e-01
# Omicron (BA.3)    0.0000 1.00e-08 Inf -2.00e-08  0.00e+00
# Omicron (BA.4)    0.0120 8.42e-03 Inf -4.52e-03  2.85e-02
# Omicron (BA.5)    0.0539 1.75e-02 Inf  1.96e-02  8.81e-02
# Other             0.0000 1.00e-08 Inf -2.00e-08  0.00e+00
# 
# Confidence level used: 0.95

# NOTE: does not quite match emmeans nnet::multinom output
# above, but maybe because I couldn't run mblogit until convergence
melff commented 2 years ago

Thanks - I will look into it as soon as I can.

fweber144 commented 1 year ago

I also experience this error quite frequently. Here is a reprex (data uploaded here, vector of observation weights uploaded here):

dat <- read.csv("dat.csv", stringsAsFactors = TRUE)
w_obs <- read.csv("w_obs.csv")[[1]]

# Default fit (does not work):
mb_fit <- mclogit::mblogit(
  formula = y ~ xco.1 + xco.2 + xco.3 + xca.1 + xca.2,
  data = dat,
  random = ~ xco.1 | z.1,
  weights = w_obs,
  model = FALSE,
  y = FALSE
)
## --> Gives:
#
# Iteration 1 - deviance = 80.89403 - criterion = 0.9530748
# Iteration 2 - deviance = 73.66568 - criterion = 0.08980789
# Iteration 3 - deviance = 71.81153 - criterion = 0.03756291
# Iteration 4 - deviance = 69.95513 - criterion = 0.02145767
# Iteration 5 - deviance = 68.71881 - criterion = 0.01894345
# Iteration 6 - deviance = 69.63039 - criterion = 0.0154936
# Iteration 7 - deviance = 68.70349 - criterion = 0.01088839
# Iteration 8 - deviance = 68.69328 - criterion = 0.006974858
# Iteration 9 - deviance = 67.80436 - criterion = 0.007406875
# Iteration 10 - deviance = 69.87305 - criterion = 0.007387896
# Iteration 11 - deviance = 68.42121 - criterion = 0.00535707
# Iteration 12 - deviance = 65.89278 - criterion = 0.01301447
# Iteration 13 - deviance = 55.60609 - criterion = 0.02348028
# Iteration 14 - deviance = 46.98855 - criterion = 0.03045705
# Iteration 15 - deviance = 43.4102 - criterion = 0.03987452
# Error in solve.default(X[[i]], ...) :
#   system is computationally singular: reciprocal condition number = 1.87698e-19
##

# Modified fit 1 (works, but increasing `epsilon` is probably cheating):
mb_fit <- mclogit::mblogit(
  formula = y ~ xco.1 + xco.2 + xco.3 + xca.1 + xca.2,
  data = dat,
  random = ~ xco.1 | z.1,
  weights = w_obs,
  model = FALSE,
  y = FALSE,
  epsilon = 1e-1
)
## --> Gives:
#
# Iteration 1 - deviance = 80.89403 - criterion = 0.9530748
# Iteration 2 - deviance = 73.66568 - criterion = 0.08980789
# converged
##

# Modified fit 2 (works, but "Algorithm stopped due to false convergence"):
mb_fit <- mclogit::mblogit(
  formula = y ~ xco.1 + xco.2 + xco.3 + xca.1 + xca.2,
  data = dat,
  random = ~ xco.1 | z.1,
  weights = w_obs,
  model = FALSE,
  y = FALSE,
  avoid.increase = TRUE
)
## --> Gives:
#     Stepsize halved - new deviance =  66.79285 
#   Stepsize halved - new deviance =  63.6818 
#   Stepsize halved - new deviance =  62.89986 
#   Stepsize halved - new deviance =  62.70058 
#   Stepsize halved - new deviance =  62.64825 
#   Stepsize halved - new deviance =  62.63382 
#   Stepsize halved - new deviance =  62.62953 
#   Stepsize halved - new deviance =  62.62811 
#   Stepsize halved - new deviance =  62.62758 
#   Stepsize halved - new deviance =  62.62736 
#   Stepsize halved - new deviance =  62.62727 
#   Stepsize halved - new deviance =  62.62722 
#   Stepsize halved - new deviance =  62.6272 
#   Stepsize halved - new deviance =  62.62719 
#   Stepsize halved - new deviance =  62.62718 
#   Stepsize halved - new deviance =  62.62718 
# 
# Iteration 1 - deviance = 62.62718 - criterion = 4.682507e-09
# converged
# Warning messages:
# 1: step size truncated due to possible divergence 
# 2: Algorithm stopped due to false convergence
##

As mentioned in the inline code comments, "Modified fit 1" works, but increasing epsilon is probably cheating, so I wouldn't favor that solution. "Modified fit 2" works as well, but the warning Algorithm stopped due to false convergence probably indicates that convergence is not actually achieved either (although it says converged directly above the warnings). So I wonder if there is some "tuning setup" that the user could choose to achieve convergence. If not, would it be possible to change the underlying code in the mclogit package to make the algorithm converge in such cases?
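A generic way to probe for such a "tuning setup" is to loop over candidate argument sets and keep the first one that fits without error. This helper is my own sketch, not part of mclogit, and which settings are actually sensible is exactly the open question here:

```r
# Hypothetical helper: try a fitting function under several argument sets
# and return the first set that does not raise an error, or NULL if all fail.
first_working_fit <- function(fit_fun, settings_list) {
  for (s in settings_list) {
    fit <- tryCatch(do.call(fit_fun, s), error = function(e) NULL)
    if (!is.null(fit)) return(list(settings = s, fit = fit))
  }
  NULL
}
```

One could, for example, pass argument sets varying epsilon, maxit, or avoid.increase (merged with the fixed model arguments) and then inspect which, if any, reach proper rather than false convergence.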

If I am wrong and "Modified fit 2" is actually converging (despite the warning Algorithm stopped due to false convergence), then perhaps that warning message Algorithm stopped due to false convergence could be formulated differently?

Thanks in advance!

melff commented 1 year ago

Thanks for the replication material. It allowed me to test whether the recent revisions work.
Your code

mb_fit <- mclogit::mblogit(
  formula = y ~ xco.1 + xco.2 + xco.3 + xca.1 + xca.2,
  data = dat,
  random = ~ xco.1 | z.1,
  weights = w_obs,
  model = FALSE,
  y = FALSE
)
mb_fit

results in

mclogit::mblogit(formula = y ~ xco.1 + xco.2 + xco.3 + xca.1 + 
    xca.2, data = dat, random = ~xco.1 | z.1, weights = w_obs, 
    model = FALSE, y = FALSE)

Coefficients:
                   Predictors
Response categories  (Intercept)  xco.1     xco.2     xco.3     xca.1lvl2  xca.1lvl3  xca.2lvl2
              y2/y1  -3.39830      1.25987  -0.27576   3.85874   1.68203   -9.16883   -6.88392 
              y3/y1  -0.71716      0.02455   0.31685   0.66200   0.67266   -0.51303   -0.55805 

(Co-)Variances:
Grouping level: z.1 
          y2~1      y3~1      y2~xco.1  y3~xco.1
y2~1      141.1776                              
y3~1        2.2635    5.9994                    
y2~xco.1  -15.6402  -11.7495  111.0851          
y3~xco.1   -0.2452   -0.2933    2.0815    3.4029

Null Deviance:     90.09 
Residual Deviance: 43.41

Note: Algorithm did not converge.

and

summary(mb_fit)

gives

Call:
mclogit::mblogit(formula = y ~ xco.1 + xco.2 + xco.3 + xca.1 + 
    xca.2, data = dat, random = ~xco.1 | z.1, weights = w_obs, 
    model = FALSE, y = FALSE)

Equation for y2 vs y1:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  -3.3983     7.7883  -0.436   0.6626  
xco.1         1.2599     5.3750   0.234   0.8147  
xco.2        -0.2758     1.3850  -0.199   0.8422  
xco.3         3.8587     2.2415   1.722   0.0852 .
xca.1lvl2     1.6820     9.7339   0.173   0.8628  
xca.1lvl3    -9.1688   586.6832  -0.016   0.9875  
xca.2lvl2    -6.8839     4.3594  -1.579   0.1143  

Equation for y3 vs y1:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.71716    1.44264  -0.497    0.619
xco.1        0.02455    0.93892   0.026    0.979
xco.2        0.31685    0.42296   0.749    0.454
xco.3        0.66200    0.56547   1.171    0.242
xca.1lvl2    0.67266    1.71939   0.391    0.696
xca.1lvl3   -0.51303    1.78021  -0.288    0.773
xca.2lvl2   -0.55805    1.59196  -0.351    0.726

(Co-)Variances:
Grouping level: z.1 
         Estimate                              Std.Err.                       
y2~1     141.1776                              151.080                        
y3~1       2.2635   5.9994                      43.480  19.578                
y2~xco.1 -15.6402 -11.7495 111.0851            117.552  52.455 175.070        
y3~xco.1  -0.2452  -0.2933   2.0815   3.4029    27.315   7.929  26.910   6.487

Null Deviance:     90.09 
Residual Deviance: 43.41 
Number of Fisher Scoring iterations:  16
Number of observations
  Groups by z.1: 6
  Individual observations:  41
Note: Algorithm did not converge.

The large estimates and even larger standard errors suggest that the data exhibit separation in the dependent variable. I should add that with only 41 observations overall and 6 groups, it would be very surprising to get stable results. With a model that has 22 parameters, you are asking quite a lot from data with just 41 observations.

fweber144 commented 1 year ago

Thanks for your reply.

I should have mentioned that this is not a real-world example, but rather simulated data from projpred's unit tests (I am planning to use the mclogit package within projpred). It may well be that the simulated data is too extreme, so that separation occurs.

What I was unsure about was rather the potential contradiction between the warning Algorithm stopped due to false convergence and the printed output converged directly above the warnings. But if I understand your comment here correctly, then the printed output converged may be due to the value of the objective function not changing much anymore and the warning Algorithm stopped due to false convergence then indicates that the parameter estimates are still changing nonetheless (going to +/- infinity). Did I get that correctly?