melff / mclogit

mclogit: Multinomial Logit Models, with or without Random Effects or Overdispersion
http://melff.github.io/mclogit/

Algorithm did not converge #10

Closed mjohnston118 closed 3 years ago

mjohnston118 commented 4 years ago

Hi Prof. Martin Elff,

I am trying to use your mblogit function to fit a multinomial logistic regression to examine the factors which predict participants in my experiment selecting one of three answers. Because my model is mixed (I am comparing different age groups of participants and also participants' performance on one of two trial types) I am including subject ID as a random factor.

For model selection, my approach is to start by fitting a full model (3 IVs with all their interactions) then removing components and comparing the BIC score of the model before and after removing each component. If removing a component dramatically reduces the BIC score (based on Raftery, 1995) then I will argue it does not add predictive value to the model and that component will not be included in the final model.
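A minimal sketch of the comparison strategy described above (the model formulas and `long_data2` come from the original post; the object names and the `update()` step are my own illustration, not code from the thread):

```r
library(mclogit)

# Full model with all interactions (as in the original post)
full <- mblogit(Answer ~ Age_group:Relief:Story,
                data = long_data2,
                random = ~ 1 | ID)

# Drop the three-way interaction and compare BIC
reduced <- update(full, . ~ . - Age_group:Relief:Story)

# Lower BIC indicates the preferred model; Raftery (1995) treats
# a BIC difference greater than about 10 as very strong evidence.
BIC(full) - BIC(reduced)
```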

My problem is that with the full model RStudio returns the error "Algorithm did not converge". I increased the maximum number of iterations to 200, but it then returns the following:

> study1_fullModel <- mblogit(Answer ~ Age_group:Relief:Story, 
                                data = long_data2,
                                maxit = 200,
                                method = "PQL",
                                trace = T,
                                random = ~1|ID)
Iteration 1 - deviance = 866.3455 - criterion = 0.6663194
Iteration 2 - deviance = 666.6792 - criterion = 0.07784625
Iteration 3 - deviance = 538.5546 - criterion = 0.04248887
Iteration 4 - deviance = 562.2433 - criterion = 0.02415636
Iteration 5 - deviance = 584.9369 - criterion = 0.02079293
Iteration 6 - deviance = 592.4505 - criterion = 0.0183108
Iteration 7 - deviance = 585.5645 - criterion = 0.01538235
Iteration 8 - deviance = 569.38 - criterion = 0.01273065
Iteration 9 - deviance = 548.7899 - criterion = 0.01054194
Iteration 10 - deviance = 527.194 - criterion = 0.008786208
Iteration 11 - deviance = 506.3881 - criterion = 0.007371137
Iteration 12 - deviance = 487.2235 - criterion = 0.006213371
Iteration 13 - deviance = 470.1385 - criterion = 0.005256752
Iteration 14 - deviance = 455.3552 - criterion = 0.004465433
Iteration 15 - deviance = 442.9199 - criterion = 0.003814404
Iteration 16 - deviance = 432.7273 - criterion = 0.003283286
Iteration 17 - deviance = 424.5586 - criterion = 0.002853174
Iteration 18 - deviance = 418.1291 - criterion = 0.002505815
Iteration 19 - deviance = 413.1344 - criterion = 0.0022243
Iteration 20 - deviance = 409.2851 - criterion = 0.00199405
Iteration 21 - deviance = 406.3287 - criterion = 0.001803303
Iteration 22 - deviance = 404.0569 - criterion = 0.001643018
Iteration 23 - deviance = 402.3053 - criterion = 0.001506427
Iteration 24 - deviance = 400.9477 - criterion = 0.001388527
Iteration 25 - deviance = 399.8887 - criterion = 0.001285607
Iteration 26 - deviance = 399.057 - criterion = 0.001194895
Iteration 27 - deviance = 398.3997 - criterion = 0.001114286
Iteration 28 - deviance = 397.8769 - criterion = 0.001042155
Iteration 29 - deviance = 397.459 - criterion = 0.0009772273
Iteration 30 - deviance = 397.1233 - criterion = 0.0009184833
Iteration 31 - deviance = 396.8526 - criterion = 0.0008650971
Iteration 32 - deviance = 396.6336 - criterion = 0.0008163892
Iteration 33 - deviance = 396.456 - criterion = 0.0007717943
Iteration 34 - deviance = 396.3116 - criterion = 0.000730837
Iteration 35 - deviance = 396.1941 - criterion = 0.0006931135
Iteration 36 - deviance = 396.0983 - criterion = 0.0006582784
Iteration 37 - deviance = 396.0201 - criterion = 0.0006260338
Iteration 38 - deviance = 395.9562 - criterion = 0.0005961216
Iteration 39 - deviance = 395.904 - criterion = 0.0005683163
Iteration 40 - deviance = 395.8613 - criterion = 0.0005424203
Iteration 41 - deviance = 395.8264 - criterion = 0.0005182595
Iteration 42 - deviance = 395.7978 - criterion = 0.0004956799
Iteration 43 - deviance = 395.7744 - criterion = 0.0004745448
Iteration 44 - deviance = 395.7553 - criterion = 0.0004547322
Iteration 45 - deviance = 395.7396 - criterion = 0.0004361331
Iteration 46 - deviance = 395.7267 - criterion = 0.0004186497
Iteration 47 - deviance = 395.7162 - criterion = 0.0004021942
Iteration 48 - deviance = 395.7076 - criterion = 0.0003866872
Iteration 49 - deviance = 395.7005 - criterion = 0.0003720569
Iteration 50 - deviance = 395.6948 - criterion = 0.0003582385
Iteration 51 - deviance = 395.69 - criterion = 0.0003451728
Iteration 52 - deviance = 395.6861 - criterion = 0.0003328063
Iteration 53 - deviance = 395.683 - criterion = 0.0003210898
Iteration 54 - deviance = 395.6804 - criterion = 0.0003099788
Iteration 55 - deviance = 395.6782 - criterion = 0.0002994322
Iteration 56 - deviance = 395.6765 - criterion = 0.0002894125
Iteration 57 - deviance = 395.675 - criterion = 0.0002798853
Iteration 58 - deviance = 395.6739 - criterion = 0.0002708189
Iteration 59 - deviance = 395.6729 - criterion = 0.0002621842
Iteration 60 - deviance = 395.6721 - criterion = 0.0002539542
Iteration 61 - deviance = 395.6715 - criterion = 0.0002461042
Iteration 62 - deviance = 395.671 - criterion = 0.0002386112
Iteration 63 - deviance = 395.6705 - criterion = 0.0002314622
Iteration 64 - deviance = 395.6701 - criterion = 0.0002246121
Iteration 65 - deviance = 395.6698 - criterion = 0.0002180685
Iteration 66 - deviance = 395.6696 - criterion = 0.0002118057
Iteration 67 - deviance = 395.6694 - criterion = 0.0002058078
Iteration 68 - deviance = 395.6693 - criterion = 0.0002000602
Iteration 69 - deviance = 395.6691 - criterion = 0.0001945493
Iteration 70 - deviance = 395.669 - criterion = 0.0001892621
Iteration 71 - deviance = 395.6689 - criterion = 0.0001841869
Iteration 72 - deviance = 395.6689 - criterion = 0.0001793124
Iteration 73 - deviance = 395.6688 - criterion = 0.0001746283
Iteration 74 - deviance = 395.6688 - criterion = 0.0001701248
Iteration 75 - deviance = 395.6687 - criterion = 0.0001657927
Iteration 76 - deviance = 395.6687 - criterion = 0.0001616236
Iteration 77 - deviance = 395.6687 - criterion = 0.0001576092
Iteration 78 - deviance = 395.6686 - criterion = 0.0001537422
Iteration 79 - deviance = 395.6686 - criterion = 0.0001500154
Iteration 80 - deviance = 395.6686 - criterion = 0.0001464221
Iteration 81 - deviance = 395.6686 - criterion = 0.000142956
Iteration 82 - deviance = 395.6686 - criterion = 0.0001396112
Iteration 83 - deviance = 395.6686 - criterion = 0.0001363822
Iteration 84 - deviance = 395.6686 - criterion = 0.0001332636
Iteration 85 - deviance = 477.5151 - criterion = 0.001599568
Iteration 86 - deviance = 545.524 - criterion = 0.0001433904
Iteration 87 - deviance = 541.6072 - criterion = 0.0001688429
Iteration 88 - deviance = 531.2352 - criterion = 0.0001711763
Iteration 89 - deviance = 407.89 - criterion = 0.0005527749
Iteration 90 - deviance = 409.2737 - criterion = 0.0001380598
  step size truncated due to possible divergence
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  11897.35 
Iteration 91 - deviance = 11897.35 - criterion = 0.7397263
Error in qr.default(x) : NA/NaN/Inf in foreign function call (arg 1)

Traceback() returns:

> traceback()
7: qr.default(x)
6: qr(x)
5: qr(x)
4: matrank(S.k)
3: PQLMQL_innerFit(y.star, X, Z, W, d, groups, offset, method, estimator, 
       control)
2: mmclogit.fitPQLMQL(y = Y, s = s, w = weights, X = XD, Z = ZD, 
       groups = groups, method = method, estimator = estimator, 
       control = control, offset = offset)
1: mblogit(Answer ~ Age_group:Relief:Story, data = long_data2, maxit = 200, 
       method = "PQL", trace = T, random = ~1 | ID)

I then tried changing the method to "MQL", but it gets to iteration 99 and returns this error: Error in qr.default(x) : NA/NaN/Inf in foreign function call (arg 1)

Taking a few of the interactions out does remove this convergence problem, but if possible I would like to get an estimate of the full model with all its interactions, so I can report the BIC of this full model.

Do you know what I can do here to get the full model to run?

Thanks in advance! Matthew

melff commented 4 years ago

The iteration history suggests that your data have a flat (approximate) log-likelihood surface. This can happen because you either have too few clusters or clusters that are too small (I suspect the latter).

Maybe try setting maxit=83. You may get a warning about non-convergence, but you will at least get "preliminary" estimates.
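Concretely, that just means stopping the fit at the last iteration before the divergence seen in the trace above (assuming the same call as in the original post):

```r
# Stop at iteration 83, the last one before the deviance jumps;
# expect a non-convergence warning, and treat the estimates as preliminary.
study1_fullModel <- mblogit(Answer ~ Age_group:Relief:Story,
                            data = long_data2,
                            maxit = 83,
                            method = "PQL",
                            trace = TRUE,
                            random = ~ 1 | ID)
```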

mjohnston118 commented 4 years ago

Hi Martin,

Thanks for the response. I have a few questions:

I'm quite new to this type of analysis so apologies if these questions are basic.

Regards, Matthew

melff commented 4 years ago

A log-likelihood function is flat if it does not have a maximum. This may occur, e.g., when there is separation in logistic regression. As a consequence, no maximum likelihood estimate exists. I write "(approximate)" because the (log-)likelihood function of logit models with random effects does not have a closed form.

"Algorithm did not converge Fitted probabilities numerically 0 occurred" is a warning, not an error. Nevertheless, I would not trust BICs here, because the log-likelihood function is only a rough approximation.

I suggested setting 'maxit=83' because this is the last iteration before the algorithm diverges. You may at least get finite (yet not very trustworthy) estimates then.

It seems that you indeed do not have enough data to estimate the random-effects variance reliably. A quick fix would be to try fitting a multinomial model without random effects but with overdispersion. There may also be a very slight chance that trying mblogit(...,estimator="REML") helps.
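The two fallbacks could look roughly like this; the argument names follow my reading of the mclogit documentation (in particular `dispersion`, which may depend on the installed package version), so please verify against `?mblogit`:

```r
# (a) Multinomial model without random effects, but with overdispersion:
m_disp <- mblogit(Answer ~ Age_group:Relief:Story,
                  data = long_data2,
                  dispersion = TRUE)

# (b) Mixed model refit with the REML-like estimator:
m_reml <- mblogit(Answer ~ Age_group:Relief:Story,
                  data = long_data2,
                  random = ~ 1 | ID,
                  method = "PQL",
                  estimator = "REML")
```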