melff / mclogit

mclogit: Multinomial Logit Models, with or without Random Effects or Overdispersion
http://melff.github.io/mclogit/
22 stars 4 forks source link

Problem in a model with two nested variables #6

Closed gustavobrp closed 3 years ago

gustavobrp commented 4 years ago

Hello Prof. Martin Elff,

I'm trying to fit a multinomial logit model with two random parameters, since my data set has individuals that are enrolled in specific institutions and programs within institutions. The mblogitfunction allow to do that? If so, how can I specify the formula? I tried to follow the logic from the lme4package, but I get the following results:

# A tibble: 10 x 7
         id_ institution sex   age_group      secondary_school dropout  program_id
       <dbl> <fct>       <fct> <fct>          <fct>            <fct>    <fct>     
 1 1.98e-312 3849        M     19 to 24 years Public school    Dropout  384918424 
 2 1.98e-312 17          M     19 to 24 years Public school    Enrolled 171439    
 3 1.98e-312 600         M     Above 30 years Public school    Enrolled 6001103914
 4 1.98e-312 694         F     Up to 18       Private school   Enrolled 69415839  
 5 1.98e-312 573         F     19 to 24 years Public school    Stopout 573116884 
 6 1.98e-312 694         M     Above 30 years Public school    Enrolled 69452141  
 7 1.98e-312 5           M     Up to 18       Private school   Enrolled 5300518   
 8 1.98e-312 601         M     19 to 24 years Public school    Dropout  6011327404
 9 1.98e-312 2564        F     19 to 24 years Private school   Enrolled 2564118910
10 1.98e-312 578         F     19 to 24 years Public school    Enrolled 57813288
mblogit(formula = dropout ~ sex + age_group + secondary_school,
random = ~ 1 | institution/program_id,
data = stud.db)

Iteration 1 - Deviance = 79497.12
Iteration 2 - Deviance = 75292.89
Iteration 3 - Deviance = 74943.24
Iteration 4 - Deviance = 74931.05
Iteration 5 - Deviance = 74931
Iteration 6 - Deviance = 74931
converged

Error in Z[[k]] : subscript out of bounds

The class_id variables is a index that combines the id from the institution and the id from the program within the institution (because some programs appears in more than one institution).

Thanks a lot.

Best,

melff commented 4 years ago

Yep, that's a bug, fixed in release 0.7.2. See https://github.com/melff/mclogit/releases/tag/0.7.2

gustavobrp commented 4 years ago

Hello Prof. Elff,

Just a follow up. I estimated again the model and now this problem doesn't occur anymore, but I receive the following messages while estimating a complete model. This is probably related to my data set and you can ignore, but I thought that maybe would be important to send a feedback.

>mblogit(formula = dropout ~ sex + age_group + secondary_school + fam_income + color_race +
+ receive_grant + extracurricular,
+random = ~ 1 | institution/program_id,
+data = stud.db)

Iteration 1 - Deviance = 157998.1
Iteration 2 - Deviance = 147780.8
Iteration 3 - Deviance = 146485.8
Iteration 4 - Deviance = 146400.7
Iteration 5 - Deviance = 146399.1
Iteration 6 - Deviance = 146399.1
Iteration 7 - Deviance = 146399.1
converged
Iteration 1 - deviance = 150283.3 - criterion = 0.1145848
    Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  Inf 
  Stepsize halved - new deviance =  193007 
  Stepsize halved - new deviance =  183789.1 
  Stepsize halved - new deviance =  179251.9 
  Stepsize halved - new deviance =  177098.2 
  Stepsize halved - new deviance =  176032 
  Stepsize halved - new deviance =  175514.9 
  Stepsize halved - new deviance =  175640.2 
  Stepsize halved - new deviance =  175837 
  Stepsize halved - new deviance =  175936.8 
  Stepsize halved - new deviance =  175986.8 
  Stepsize halved - new deviance =  176011.7 
  Stepsize halved - new deviance =  176024.2 
  Stepsize halved - new deviance =  176030.5 
  Stepsize halved - new deviance =  176033.6 
  Stepsize halved - new deviance =  176035.2 
  Stepsize halved - new deviance =  176035.9 
Iteration 2 - deviance = 176035.9 - criterion = 2.816581e-09
converged

Warning messages:
1: step size truncated due to possible divergence 
2: Algorithm stopped due to false convergence 

Again, thanks a lot for all the help.

melff commented 4 years ago

Hi,

It appears that the PQL approach leads to numerical difficulties if the groups or clusters are small. In release 0.8 there will be an MQL approach available, which may behave better in such situation - even though estimates will be shrunk towards zero.

Am Dienstag, den 19.05.2020, 17:21 -0700 schrieb gustavobrp:

Hello Prof. Elff, Just a follow up. I estimated again the model and now this problem doesn't occur anymore, but I receive the following messages while estimating a complete model. This is probably related to my data set and you can ignore, but I thought that maybe would be important to send a feedback.

mblogit(formula = dropout ~ sex + age_group + secondary_school + fam_income + color_race +

  • receive_grant + extracurricular, +random = ~ 1 | institution/program_id, +data = stud.db)

Iteration 1 - Deviance = 157998.1 Iteration 2 - Deviance = 147780.8 Iteration 3 - Deviance = 146485.8 Iteration 4 - Deviance = 146400.7 Iteration 5 - Deviance = 146399.1 Iteration 6 - Deviance = 146399.1 Iteration 7 - Deviance = 146399.1 converged Iteration 1 - deviance = 150283.3 - criterion = 0.1145848 Stepsize halved - new deviance = Inf Stepsize halved - new deviance = Inf Stepsize halved - new deviance = Inf Stepsize halved - new deviance = Inf Stepsize halved - new deviance = 193007 Stepsize halved - new deviance = 183789.1 Stepsize halved - new deviance = 179251.9 Stepsize halved - new deviance = 177098.2 Stepsize halved - new deviance = 176032 Stepsize halved - new deviance = 175514.9 Stepsize halved - new deviance = 175640.2 Stepsize halved - new deviance = 175837 Stepsize halved - new deviance = 175936.8 Stepsize halved - new deviance = 175986.8 Stepsize halved - new deviance = 176011.7 Stepsize halved - new deviance = 176024.2 Stepsize halved - new deviance = 176030.5 Stepsize halved - new deviance = 176033.6 Stepsize halved - new deviance = 176035.2 Stepsize halved - new deviance = 176035.9 Iteration 2 - deviance = 176035.9 - criterion = 2.816581e-09 converged

Warning messages: 1: step size truncated due to possible divergence 2: Algorithm stopped due to false convergence Again, thanks a lot for all the help. — You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub, or unsubscribe.

melff commented 4 years ago

Hi,

4 days ago I put up the 0.8 release. The algorithm should be more stable now. Let me know if you still encounter convergence problems. If all else fails, 0.8.2 supports estimation of an overdispersion parameter, which allows for a (rather crude) alternative of dealing with clustered data.

gustavobrp commented 4 years ago

Hello Martin,

Thanks for let me know about the release.

Tried again and now it converged. Just to report some results, when I run the model with limited variables, it produces NaN in the grouping level 2.

>mblogit(formula = dropout ~ sex + age_group,
+random = ~ 1 | institution/program_id,
+data = stud.db)

Iteration 1 - deviance = 166003 - criterion = 0.8051472
Iteration 2 - deviance = 145565 - criterion = 0.05297786
Iteration 3 - deviance = 139727.4 - criterion = 0.02102629
Iteration 4 - deviance = 137573.3 - criterion = 0.008212973
Iteration 5 - deviance = 136641.2 - criterion = 0.002999969
Iteration 6 - deviance = 136343.1 - criterion = 0.000300909
Iteration 7 - deviance = 136278.7 - criterion = 1.705491e-05
Iteration 8 - deviance = 136251.1 - criterion = 5.813921e-07
Iteration 9 - deviance = 136246 - criterion = 1.285311e-08
Iteration 10 - deviance = 136245.3 - criterion = 3.098197e-10
converged

(Co-)Variances:
Grouping level: 1 
                                 Estimate      Std.Err.     
Dropout~1 1.494          4.758       
Stopout~1      2.143 3.948    8.221 14.188
Grouping level: 2 
                                 Estimate      Std.Err.
Dropout~1 2.206         NaN     
Stopout~1      1.447 3.139   NaN NaN 

Null Deviance:     482500 
Residual Deviance: 136200 
Number of Fisher Scoring iterations:  10 
Number of observations:  219597 

Warning message:
In sqrt(diag(vcov.phi)) : NaNs produced

And with the complete model, it doesn't produce NaNs.

>mblogit(formula = dropout ~ sex + age_group + secondary_school + fam_income + color_race +
+ receive_grant + extracurricular,
+random = ~ 1 | institution/program_id,
+data = stud.db)

Iteration 1 - deviance = 165108.2 - criterion = 0.8049309
Iteration 2 - deviance = 143677.5 - criterion = 0.05636877
Iteration 3 - deviance = 137233.2 - criterion = 0.0243093
Iteration 4 - deviance = 134822 - criterion = 0.01024263
Iteration 5 - deviance = 133809.7 - criterion = 0.003024441
Iteration 6 - deviance = 133378.3 - criterion = 0.0007326398
Iteration 7 - deviance = 133226.5 - criterion = 7.890494e-05
Iteration 8 - deviance = 133181.8 - criterion = 4.898939e-06
Iteration 9 - deviance = 133171.1 - criterion = 3.000677e-07
Iteration 10 - deviance = 133168.8 - criterion = 2.003485e-08
Iteration 11 - deviance = 133168.3 - criterion = 1.462663e-09
converged

(Co-)Variances:
Grouping level: 1 
                                 Estimate      Std.Err.     
Stopout~1 1.638          3.661       
Dropout~1      2.356 4.344    6.334 10.948
Grouping level: 2 
                                 Estimate      Std.Err.     
Stopout~1 2.871         0.2844       
Dropout~1      2.288 5.060   0.2637 0.3159

Null Deviance:     482500 
Residual Deviance: 133200 
Number of Fisher Scoring iterations:  11 
Number of observations:  219597 

A very newby question: how the estimation is determined by the package and it is possible to change these parameters?

Again, thank you very much.

melff commented 4 years ago

Am Mittwoch, den 27.05.2020, 14:00 -0700 schrieb gustavobrp:

A very newby question: how the estimation is determined by the package and it is possible to change these parameters?

You can switch between the PQL and the MQL technique of approximate ML by using the 'method=' argument in a call of 'mclogit()' if that is what you mean ...

gustavobrp commented 4 years ago

Understood! Thanks!

gustavobrp commented 4 years ago

Hello Martin,

In the newest version of the package, I fitted the first model that I posted here, without any modifications, and I'm getting this message:

Error in res[[i, j]] <- t(x[[i, j]]) : [[ ]] subscript out of bounds

I modified the method and estimator arguments to other options and used more or less variables, but I receive the same message.

melff commented 4 years ago

This should be fixed by release 0.8.6.1

gustavobrp commented 4 years ago

Hi Martin, thanks for the return!

Tested again and now I get this message:

Error in x[[j, i]] : subscript out of bounds

Followed the same tests as before. This happens with a model with two random effects. When I exclude one of them, It works.

melff commented 4 years ago

Dang, I thought I got it. Can you please show what traceback() shows if immediately called after the error message has been shown? That would help me in bug-hunting.

gustavobrp commented 4 years ago

Sure! Like this?

Error in x[[j, i]] : subscript out of bounds
> traceback()
8: t(x[[j, i]])
7: bMatTrns(ZWX)
6: cbind(blockMatrix(XWX), bMatTrns(ZWX))
5: rbind(cbind(blockMatrix(XWX), bMatTrns(ZWX)), cbind(ZWX, ZWZiSigma))
4: structure(rbind(cbind(blockMatrix(XWX), bMatTrns(ZWX)), cbind(ZWX, 
       ZWZiSigma)), class = "blockMatrix")
3: PQLMQL_innerFit(y.star, X, Z, W, d, groups, offset, method, estimator, 
       control)
2: mmclogit.fitPQLMQL(y = Y, s = s, w = weights, X = XD, Z = ZD, 
       groups = groups, method = method, estimator = estimator, 
       control = control, offset = offset)
1: mblogit(formula = ind_EVASAO_TER ~ dep_SEXO + dep_IDADE_FAIXA + 
       dep_ORIGEM_ESCOLAR, random = ~1 | COD_IES_2016/COD_IES_CURSO_2016, 
       data = bd.subset)
melff commented 4 years ago

I think I identified the bug, which is now fixed in the current release: 0.8.6.3.

gustavobrp commented 4 years ago

Just tested again and it's working! Thanks a lot.