@Ales-G closed this issue 2 years ago.
Thanks @Ales-G! Yes, I think there's an error in the computation of the clustered errors in logitr. I'm looking into the source of the problem, but this is really helpful as it gives me an example to compare against. I'll follow up as I search, but for now I would use the Stata results.
Alright, I've narrowed the problem down. It has to do with scaling the data. By default the `scaleInputs` argument is set to `TRUE` (this helps make the optimization more stable), but somewhere in one of my updates I introduced an error where the standard errors are not correctly re-scaled post-estimation. So if you just set `scaleInputs = FALSE` you should get the correct standard errors:
mnl_pref_yog <- logitr(
  data = yogurt,
  outcome = "choice",
  obsID = "obsID",
  pars = c("price", "feat", "brand"),
  clusterID = "id",
  scaleInputs = FALSE
)
coef(summary(mnl_pref_yog))
Estimate Std. Error z-value Pr(>|z|)
price -0.3666739 0.05423654 -6.760642 1.373812e-11
feat 0.4916524 0.19390366 2.535550 1.122709e-02
brandhiland -3.7156066 0.35365213 -10.506388 0.000000e+00
brandweight -0.6411729 0.44725879 -1.433561 1.516975e-01
brandyoplait 0.7349638 0.27721575 2.651234 8.019827e-03
I'm working on a patch to fix this.
Okay, I think I fixed this. If you re-install the package from GitHub and try the same model, you should get the same results:
Install the dev version of the package (0.5.1):
# install.packages("remotes")
remotes::install_github("jhelvy/logitr")
Estimate the model
mnl_pref_yog <- logitr(
  data = yogurt,
  outcome = "choice",
  obsID = "obsID",
  pars = c("price", "feat", "brand"),
  clusterID = "id"
)
coef(summary(mnl_pref_yog))
Estimate Std. Error z-value Pr(>|z|)
price -0.3665546 0.05422209 -6.760244 1.377587e-11
feat 0.4914392 0.19388663 2.534673 1.125523e-02
brandhiland -3.7154773 0.35363731 -10.506463 0.000000e+00
brandweight -0.6411384 0.44722326 -1.433598 1.516870e-01
brandyoplait 0.7345195 0.27720045 2.649777 8.054482e-03
Dear @jhelvy, thanks a lot for looking into this; the clustering now works with the multinomial logit. However, I have tried running a mixed logit and the standard errors still appear to be implausibly large. Is it possible that there is still a problem with clustering in the mixed logit?
mxl_pref_yog <- logitr(
  data = yogurt,
  outcome = "choice",
  obsID = "obsID",
  pars = c("price", "feat", "brand"),
  randPars = c(price = "ln", feat = "n", brand = "n"),
  clusterID = "id"
)
summary(mxl_pref_yog)
Model Coefficients:
Estimate Std. Error z-value Pr(>|z|)
price_mu -7.2574e-01 1.6353e+04 0.0000 1.0000
feat_mu 1.1882e+00 3.4152e+01 0.0348 0.9722
brandhiland_mu -3.1898e+00 5.6172e+01 -0.0568 0.9547
brandweight_mu -3.2192e+00 8.7247e+01 -0.0369 0.9706
brandyoplait_mu -1.1540e-01 9.0434e+00 -0.0128 0.9898
price_sigma 6.9807e-03 1.8764e+03 0.0000 1.0000
feat_sigma -3.4040e+00 7.5582e+01 -0.0450 0.9641
brandhiland_sigma -7.0209e-01 7.0273e+01 -0.0100 0.9920
brandweight_sigma 5.2986e+00 1.3624e+02 0.0389 0.9690
brandyoplait_sigma 8.0490e-01 2.3802e+02 0.0034 0.9973
Oh no! I thought I had fixed that too... I'll go check some more.
Okay, I looked at things again, and I believe the code is all correct.
For this specific example, the errors are large because the algorithm has reached a local minimum. I can tell because the log-likelihood is about -2787. This often happens when using log-normally distributed parameters. If, for example, you change the price parameter to a normal distribution, the algorithm reaches a solution where the log-likelihood is about -2640. Here's what I get:
mxl_pref_yog <- logitr(
  data = yogurt,
  outcome = "choice",
  obsID = "obsID",
  pars = c("price", "feat", "brand"),
  randPars = c(price = "n", feat = "n", brand = "n"),
  clusterID = "id",
  numMultiStarts = 10
)
coef(summary(mxl_pref_yog))
Estimate Std. Error z-value Pr(>|z|)
price_mu -0.4674854 0.1357989 -3.4424816 0.0005764031
feat_mu 0.6522535 0.3156426 2.0664302 0.0387878780
brandhiland_mu -4.3428303 1.0188227 -4.2625967 0.0000202065
brandweight_mu -1.6754820 1.1016610 -1.5208690 0.1282927094
brandyoplait_mu 0.9453728 0.3261557 2.8985321 0.0037491389
price_sigma -0.1311470 0.2213794 -0.5924084 0.5535771241
feat_sigma 2.4859074 0.6583294 3.7760843 0.0001593131
brandhiland_sigma -0.1492186 0.4145413 -0.3599607 0.7188765041
brandweight_sigma -2.2336045 1.2324079 -1.8123906 0.0699258585
brandyoplait_sigma 0.1550451 0.1818241 0.8527203 0.3938144317
Notice too that I added a `numMultiStarts = 10` argument. This runs the algorithm 10 times from 10 sets of random starting points, which helps avoid local minima. When using log-normal parameters, I sometimes have to search over 100 or more multistarts to avoid local minima. If a better solution isn't reached by then, chances are the chosen distribution just isn't a good fit for the data.
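The multistart idea itself is generic: run a local optimizer from several different starting points and keep the solution with the best objective value. Here is a minimal sketch of the concept (in Python, with a toy one-dimensional objective; an illustration only, not logitr's actual optimizer):

```python
# Multistart sketch: gradient descent on a toy objective with two local
# minima, f(x) = x^4 - 3x^2 + x. A single start can settle in the
# shallower minimum near x ~ 1.13; multiple starts recover the global
# minimum near x ~ -1.30. (Illustration only, not logitr internals.)

def f(x):
    return x**4 - 3 * x**2 + x

def grad(x):
    return 4 * x**3 - 6 * x + 1

def local_search(x, lr=0.02, steps=1000):
    # crude gradient descent standing in for a local optimizer
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Deterministic grid of starting points (logitr uses random starts).
starts = [-2 + 0.4 * i for i in range(11)]
solutions = [local_search(x0) for x0 in starts]
best = min(solutions, key=f)  # keep the run with the best objective
```

Starts on the wrong side of the hump settle in the shallower minimum; scanning several starts and keeping the best recovers the deeper one, which is exactly what `numMultiStarts` automates.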
Dear @jhelvy, thanks a lot for clarifying this; it makes sense. I thought the problem was still related to clustering, but evidently that is not the case! Just as a suggestion: what log-likelihood values should ring a bell and indicate that I have probably reached a local minimum?
Thanks again for solving this so quickly!
No problem, and thanks for catching this!
If you don't mind, could you run some similar models in Stata to double-check the solutions? I don't have a license, so I can't make the comparison myself. I believe there are mixed logit commands for Stata. Note that for now the mixed logit models in logitr have uncorrelated heterogeneity: they assume a diagonal covariance matrix for the mixed terms (the off-diagonals are 0). I'm working on supporting correlation.
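For intuition, "uncorrelated heterogeneity" means the random coefficients are drawn from a distribution whose covariance matrix is diagonal. A small numpy sketch of the distinction (the numbers are made up for illustration; this is not logitr's internal code):

```python
import numpy as np

# Uncorrelated heterogeneity: each random coefficient has its own
# variance, but all covariances between coefficients are fixed at zero.
mu = np.array([-0.47, 0.65, 0.95])      # illustrative means
sigma = np.array([0.13, 2.49, 0.16])    # illustrative standard deviations

cov_uncorrelated = np.diag(sigma**2)    # diagonal covariance matrix

# Correlated heterogeneity (e.g. Stata's `corr` option) fills in the
# off-diagonals, typically parameterized via a Cholesky factor L:
L = np.array([[0.13,  0.00, 0.00],
              [0.40,  2.40, 0.00],
              [0.05, -0.10, 0.10]])
cov_correlated = L @ L.T                # full (symmetric) covariance

# Simulated draws of the random coefficients under the diagonal assumption:
rng = np.random.default_rng(0)
draws_uncorr = rng.multivariate_normal(mu, cov_uncorrelated, size=1000)
```

With the diagonal matrix, a respondent's taste for one attribute tells you nothing about their taste for another; the Cholesky parameterization relaxes that.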
So, here are the results from logitr using a log-normal distribution for the price parameter.
mxl_pref_yog_ln <- logitr(
  data = yogurt,
  outcome = "choice",
  obsID = "obsID",
  pars = c("price", "feat", "brand"),
  randPars = c(price = "ln", feat = "n", brand = "n"),
  modelSpace = "pref",
  clusterID = "id",
  numMultiStarts = 10
)
summary(mxl_pref_yog_ln)
Model Coefficients:
Estimate Std. Error z-value Pr(>|z|)
price_mu -0.523603 473.733223 -0.0011 0.9991
feat_mu 1.564994 29.462348 0.0531 0.9576
brandhiland_mu -2.888755 4.148376 -0.6964 0.4862
brandweight_mu -3.619802 46.385471 -0.0780 0.9378
brandyoplait_mu -0.614410 73.998085 -0.0083 0.9934
price_sigma 0.020436 183.291541 0.0001 0.9999
feat_sigma 3.239503 42.335816 0.0765 0.9390
brandhiland_sigma 0.236571 88.789261 0.0027 0.9979
brandweight_sigma -5.087253 58.426740 -0.0871 0.9306
brandyoplait_sigma -1.316474 194.592629 -0.0068 0.9946
The results of estimating the same model in Stata appear to be different.
use "yogurt.dta", clear
encode brand, gen (en_brand)
mixlogit choice, rand(feat en_brand price) ln(1) nrep(10) group(obsID) id(id) cluster(id)
Mixed logit model Number of obs = 9,648
Wald chi2(3) = 69.60
Log likelihood = -2023.0267 Prob > chi2 = 0.0000
(Std. err. adjusted for 100 clusters in id)
------------------------------------------------------------------------------
| Robust
choice | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Mean |
feat | 1.165762 .2269738 5.14 0.000 .7209012 1.610622
en_brand | -.0425926 .0856924 -0.50 0.619 -.2105465 .1253613
price | -2.807109 .5081851 -5.52 0.000 -3.803133 -1.811084
-------------+----------------------------------------------------------------
SD |
feat | .0935602 .2438872 0.38 0.701 -.3844501 .5715704
en_brand | 1.6778 .2386911 7.03 0.000 1.209974 2.145626
price | -1.864972 .3684935 -5.06 0.000 -2.587206 -1.142738
------------------------------------------------------------------------------
The sign of the estimated standard deviations is irrelevant: interpret them as being positive.
Note that I am not using the `corr` option in Stata, which allows correlated heterogeneity.
It appears that there are two issues: first, the estimated coefficients do not coincide; second, the clustered standard errors are much narrower in Stata.
I have also tried estimating the two models with normally distributed parameters and without clustered standard errors. While the SE problem appears to go away, the estimated coefficients still appear to be very different.
*STATA
mixlogit choice, rand(feat en_brand price) nrep(10) group(obsID) id(id)
Mixed logit model Number of obs = 9,648
LR chi2(3) = 2338.09
Log likelihood = -2054.4019 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
choice | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Mean |
feat | 1.049358 .1505567 6.97 0.000 .7542725 1.344444
en_brand | .1127672 .0417572 2.70 0.007 .0309246 .1946098
price | .0691454 .0236047 2.93 0.003 .022881 .1154097
-------------+----------------------------------------------------------------
SD |
feat | -.1293079 .3007661 -0.43 0.667 -.7187987 .4601829
en_brand | 1.296878 .0485107 26.73 0.000 1.201799 1.391957
price | .4603082 .0293819 15.67 0.000 .4027208 .5178956
------------------------------------------------------------------------------
The sign of the estimated standard deviations is irrelevant: interpret them as being positive.
# R
mxl_pref_yog <- logitr(
  data = yogurt,
  outcome = "choice",
  obsID = "obsID",
  pars = c("price", "feat", "brand"),
  randPars = c(price = "n", feat = "n", brand = "n"),
  modelSpace = "pref",
  numMultiStarts = 10
)
summary(mxl_pref_yog)
Model Coefficients:
Estimate Std. Error z-value Pr(>|z|)
price_mu -0.468283 0.048805 -9.5949 < 2.2e-16 ***
feat_mu 0.650771 0.232704 2.7966 0.0051650 **
brandhiland_mu -4.356656 0.403941 -10.7854 < 2.2e-16 ***
brandweight_mu -1.678958 0.456975 -3.6741 0.0002387 ***
brandyoplait_mu 0.947794 0.143627 6.5990 4.140e-11 ***
price_sigma -0.131491 0.091383 -1.4389 0.1501782
feat_sigma 2.482720 0.536940 4.6238 3.767e-06 ***
brandhiland_sigma -0.186607 0.859620 -0.2171 0.8281458
brandweight_sigma -2.239907 0.692509 -3.2345 0.0012186 **
brandyoplait_sigma 0.159572 0.431990 0.3694 0.7118383
Am I doing something wrong?
In any case, your package is much quicker than Stata which is great.
Okay, thanks for these comparisons - this is super helpful! But I'm not sure these are apples-to-apples comparisons.
First, what is the `en_brand` parameter in the Stata models? There are 4 brands, so there should be 3 mean and 3 SD parameters - one for each brand, with one brand set as the reference level. In the logitr models, you see these coefficients for each brand, with dannon set as the reference level.
Second, I forgot to mention that since this data set has a panel structure, we should have been including `panelID = 'id'` in every model. (I actually need to update my documentation, as the examples I show should include this.) I'm not sure whether the panel structure is being included in the Stata models you ran?
Finally, when you run these models, can you include the log-likelihood values in the results too? That's a good quick check of whether the differences are just due to the models reaching different solutions, which is certainly possible. In general I think Stata reaches a consistent solution more frequently than logitr, but that's why I built in the `numMultiStarts` feature. It's also why I focused on optimizing the package for speed, so that you can estimate 100 or so models pretty quickly. This is especially important for WTP space models, which tend to diverge more often.
First of all, thanks for pointing out the panelID argument; that is indeed very helpful for using logitr. In Stata I was specifying that the dataset is a panel using the `id(id)` option.
Second, apologies for the earlier Stata estimation: as you correctly spotted, I was not including the brand as a dummy. I ran everything quickly and failed to spot the mistake.
This is the correct estimation in Stata. Note that mixlogit requires multiplying the price variable by -1 because "log-normally distributed coefficients should have positive coefficients in the conditional logit model". In the yogurt data, price is a cost, not an income, hence I should multiply it by -1; Stata produces an error otherwise.
use "yogurt.dta", clear
quietly tab brand, gen(brandum) //generate dummies
gen mprice=-price // generate the -price because coefficients are assumed to be positively distributed
mixlogit choice, rand(feat brandum2-brandum4 mprice) ln(1) nrep(100) group(obsID) id(id) cluster(id)
Mixed logit model Number of obs = 9,648
Wald chi2(5) = 268.31
Log likelihood = -1251.3074 Prob > chi2 = 0.0000
(Std. err. adjusted for 100 clusters in id)
------------------------------------------------------------------------------
| Robust
choice | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Mean |
feat | 1.239227 .3501429 3.54 0.000 .5529592 1.925494
brandum2 | -5.395541 .5076889 -10.63 0.000 -6.390593 -4.400489
brandum3 | -1.407874 .4732163 -2.98 0.003 -2.335361 -.4803867
brandum4 | .0078784 .2534039 0.03 0.975 -.4887841 .5045408
mprice | -.9232874 .2344312 -3.94 0.000 -1.382764 -.4638106
-------------+----------------------------------------------------------------
SD |
feat | 1.418924 .4383523 3.24 0.001 .5597698 2.278079
brandum2 | 2.234836 .2612979 8.55 0.000 1.722702 2.746971
brandum3 | 6.909907 1.104808 6.25 0.000 4.744522 9.075291
brandum4 | 3.314474 .303403 10.92 0.000 2.719815 3.909132
mprice | .4541621 .1691712 2.68 0.007 .1225927 .7857315
------------------------------------------------------------------------------
For reference, here is the estimation with logitr using 100 multistarts and with panelID specified. It still appears that the coefficients are different and that the standard errors on price are unreasonably large.
mxl_pref_yog_ln <- logitr(
  data = yogurt,
  outcome = "choice",
  obsID = "obsID",
  panelID = "id",
  clusterID = "id",
  pars = c("price", "feat", "brand"),
  randPars = c(price = "ln", feat = "n", brand = "n"),
  modelSpace = "pref",
  numMultiStarts = 100
)
summary(mxl_pref_yog_ln)
Model Coefficients:
Estimate Std. Error z-value Pr(>|z|)
price_mu -0.577539 1288.475860 -0.0004 0.9996424
feat_mu 1.398752 0.314016 4.4544 8.413e-06 ***
brandhiland_mu -4.417341 0.699108 -6.3185 2.640e-10 ***
brandweight_mu -3.426075 0.595650 -5.7518 8.828e-09 ***
brandyoplait_mu 1.375238 0.401803 3.4227 0.0006201 ***
price_sigma -0.011086 1777.104471 0.0000 0.9999950
feat_sigma 0.213427 0.201465 1.0594 0.2894295
brandhiland_sigma -2.667753 0.462004 -5.7743 7.727e-09 ***
brandweight_sigma 4.190470 0.740094 5.6621 1.496e-08 ***
brandyoplait_sigma 4.153300 1.058661 3.9232 8.739e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log-Likelihood: -1336.2405915
Null Log-Likelihood: -3301.3547078
AIC: 2692.4811829
BIC: 2750.3633000
McFadden R2: 0.5952448
Adj McFadden R2: 0.5922157
Number of Observations: 2412.0000000
Number of Clusters 100.0000000
I have also estimated a model using logitr after multiplying the price variable by -1.
yogurt$mprice <- -1 * yogurt$price
mxl_pref_yog_ln <- logitr(
  data = yogurt,
  outcome = "choice",
  obsID = "obsID",
  panelID = "id",
  clusterID = "id",
  pars = c("mprice", "feat", "brand"),
  randPars = c(mprice = "ln", feat = "n", brand = "n"),
  modelSpace = "pref",
  numMultiStarts = 100
)
summary(mxl_pref_yog_ln)
Model Coefficients:
Estimate Std. Error z-value Pr(>|z|)
mprice_mu 0.111951 0.011265 9.9378 < 2.2e-16 ***
feat_mu 0.852527 0.297664 2.8641 0.004183 **
brandhiland_mu -5.306878 0.698951 -7.5926 3.131e-14 ***
brandweight_mu -4.016736 0.630501 -6.3707 1.882e-10 ***
brandyoplait_mu 2.482373 4.589153 0.5409 0.588561
mprice_sigma -0.037894 0.022926 -1.6529 0.098354 .
feat_sigma 0.120210 0.531940 0.2260 0.821213
brandhiland_sigma -1.948122 0.443001 -4.3976 1.095e-05 ***
brandweight_sigma 4.855393 0.671178 7.2341 4.685e-13 ***
brandyoplait_sigma 4.571326 10.144816 0.4506 0.652273
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log-Likelihood: -1217.7184021
Null Log-Likelihood: -3416.9436545
AIC: 2455.4368041
BIC: 2513.3189000
McFadden R2: 0.6436235
Adj McFadden R2: 0.6406969
Number of Observations: 2412.0000000
Number of Clusters 100.0000000
Now the standard errors appear more reasonable, but the point estimates are still different. I'm not sure what is going on.
I hope this time I estimated the two models correctly!
Okay, this is all looking a lot more reasonable. I completely forgot about the -1*price issue. This makes total sense of why you were getting rather nonsensical results when modeling price as log-normal: the price parameter should be negative, so if you're forcing positivity with a log-normal distribution you have to use the negative of price. That's also why your first run with a log-normal price had trouble converging.
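The positivity point is easy to demonstrate: a log-normal coefficient exp(b + s*z) is positive for every draw of z, whatever b and s are. A tiny sketch (in Python; b and s are arbitrary illustrative values, not estimates from this thread):

```python
import math
import random

# A log-normal coefficient exp(b + s*z) is positive for every draw of z,
# so a price coefficient that should be negative can never be recovered
# unless price is first multiplied by -1. (b and s are made-up values.)
random.seed(1)
b, s = -0.5, 0.4
draws = [math.exp(b + s * random.gauss(0, 1)) for _ in range(1000)]

print(min(draws) > 0)  # True: every simulated coefficient is positive
```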
The last logitr model you ran looks correct to me, and I believe that is a correct comparison to Stata. I'm a bit confused why the price parameter in Stata is a negative number though. If you've set price to the negative of price, then the mean parameter should be positive (as it is in the logitr solution). This makes me wonder if Stata found a local minimum.
Otherwise, the results are much more similar. The brand coefficients are close, and the order of magnitude of the other parameters is about the same. It's still not a perfect match, though, as the algorithms reached different solutions: the logitr LL is -1217 while the Stata LL is -1251. So logitr found a "better" solution in terms of fit.
You could also just try to compare the results with only one run in logitr (no multistart). The first run starts from parameters set to 0, which I believe is what Stata does, so that might get you closer to a similar solution.
At this point, I'm pretty sure the differences are just due to reaching different local minima.
Dear @jhelvy, thanks a lot for all of your support on this issue, and thanks again for working so hard on such a splendid package. I think I know why the price parameter is negative here. According to Hole (2007): "The estimated price parameters in the above model are the mean (b_p) and standard deviation (s_p) of the natural logarithm of the price coefficient. The median, mean, and standard deviation of the coefficient itself are given by exp(b_p), exp(b_p + s_p^2/2), and exp(b_p + s_p^2/2) × sqrt(exp(s_p^2) − 1), respectively (Train 2009)". Hence the price coefficient is
------------------------------------------------------------------------------
choice | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
mean_price | -.440363 .0774683 -5.68 0.000 -.5921981 -.2885279
med_price | -.3972111 .0931187 -4.27 0.000 -.5797204 -.2147018
sd_price | .2107663 .0678802 3.10 0.002 .0777235 .3438092
------------------------------------------------------------------------------
hence the coefficient on price is positive. I guess logitr already applies the transformation!
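Hole's formulas can be checked directly against the Stata output above. Plugging in the mprice estimates from the clustered log-normal model (mean -0.9232874, SD 0.4541621), a quick sketch (in Python, purely for the arithmetic) reproduces the transformed table:

```python
import math

# b_p and s_p: mean and sd of log(-beta_price), taken from the mprice
# row of the Stata mixlogit output earlier in the thread.
b_p = -0.9232874
s_p = 0.4541621

median = math.exp(b_p)                        # median of -beta_price
mean = math.exp(b_p + s_p**2 / 2)             # mean of -beta_price
sd = mean * math.sqrt(math.exp(s_p**2) - 1)   # sd of -beta_price

# Flipping the sign back reproduces Stata's transformed table:
# med_price ~ -0.3972, mean_price ~ -0.4404, sd_price ~ 0.2108
print(round(-median, 4), round(-mean, 4), round(sd, 4))
```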
The thing that remains unclear to me is why the results remain so different between the two programs. Regardless of the number of multistarts I set, the results still appear to be significantly different, especially for the price variable. As you can imagine, this is essential for WTP estimation. For comparison, I am posting the logitr estimates and the Stata estimates, both calculated with 1000 multistarts. I do not know, but it could be a Stata issue...
# R
Model Coefficients:
Estimate Std. Error z-value Pr(>|z|)
mprice_mu 0.111954 0.011272 9.9324 < 2.2e-16 ***
feat_mu 0.852398 0.297662 2.8636 0.004188 **
brandhiland_mu -5.307112 0.699001 -7.5924 3.131e-14 ***
brandweight_mu -4.016775 0.630861 -6.3671 1.926e-10 ***
brandyoplait_mu 2.482547 4.588990 0.5410 0.588522
mprice_sigma -0.037894 0.022916 -1.6536 0.098205 .
feat_sigma 0.120193 0.531863 0.2260 0.821213
brandhiland_sigma -1.948202 0.442903 -4.3987 1.089e-05 ***
brandweight_sigma 4.855480 0.671721 7.2284 4.887e-13 ***
brandyoplait_sigma 4.571520 10.144256 0.4507 0.652241
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log-Likelihood: -1217.7184020
Null Log-Likelihood: -3416.9436545
AIC: 2455.4368040
BIC: 2513.3189000
McFadden R2: 0.6436235
Adj McFadden R2: 0.6406969
Number of Observations: 2412.0000000
Number of Clusters 100.0000000
* STATA
Mixed logit model Number of obs = 9,648
Wald chi2(5) = 371.13
Log likelihood = -1252.5251 Prob > chi2 = 0.0000
(Std. err. adjusted for 100 clusters in id)
------------------------------------------------------------------------------
| Robust
choice | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Mean |
feat | 1.060867 .2265291 4.68 0.000 .6168781 1.504856
brandum2 | -4.997659 .5039539 -9.92 0.000 -5.985391 -4.009928
brandum3 | -1.169501 .233826 -5.00 0.000 -1.627791 -.7112102
brandum4 | -.0163061 .4169114 -0.04 0.969 -.8334375 .8008253
mprice | -.8676371 .1859296 -4.67 0.000 -1.232053 -.5032217
-------------+----------------------------------------------------------------
SD |
feat | 1.149806 .2542867 4.52 0.000 .6514134 1.648199
brandum2 | -2.65721 .5290221 -5.02 0.000 -3.694075 -1.620346
brandum3 | 8.547343 1.065777 8.02 0.000 6.458458 10.63623
brandum4 | 3.25122 .5163507 6.30 0.000 2.239191 4.263248
mprice | .5854347 .1059269 5.53 0.000 .3778218 .7930476
------------------------------------------------------------------------------
I think the differences we're seeing here are due to reaching different solutions, which can result from a few different factors, the starting points being one. Another is the number of draws used in the simulation. By default logitr uses just 50 Halton draws (partly for speed). But even if I increase the draws to 1000, I still get a similar solution:
Model Coefficients:
Estimate Std. Error z-value Pr(>|z|)
mprice_mu 0.1073197 0.0044918 23.8926 < 2.2e-16 ***
feat_mu 1.0189873 0.2656793 3.8354 0.0001254 ***
brandhiland_mu -5.9172651 0.6970628 -8.4889 < 2.2e-16 ***
brandweight_mu -2.5357505 0.1733728 -14.6260 < 2.2e-16 ***
brandyoplait_mu 0.5006838 0.1952373 2.5645 0.0103328 *
mprice_sigma -0.0445569 0.0038197 -11.6650 < 2.2e-16 ***
feat_sigma -1.1259761 0.3518291 -3.2003 0.0013726 **
brandhiland_sigma -3.1527737 0.4953801 -6.3644 1.961e-10 ***
brandweight_sigma 6.0014793 0.4825431 12.4372 < 2.2e-16 ***
brandyoplait_sigma -4.2985297 0.9001629 -4.7753 1.795e-06 ***
Log-Likelihood: -1225.2775566
At this point, I can't explain the differences, but it doesn't look like the result of an error somewhere (e.g., unreasonably huge standard errors). I think it's just the algorithms reaching different local minima.
Perhaps it's worth trying to estimate the same models using the mlogit and/or gmnl packages for comparison? There's also apollo. I primarily wrote logitr to be fast and to support WTP space models, but there are certainly other packages that can be used for validation.
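As an aside, the Halton draws mentioned above are a deterministic low-discrepancy sequence rather than pseudo-random numbers, which is why results are reproducible for a fixed number of draws. A minimal generator (a generic sketch in Python, not logitr's implementation):

```python
def halton(i, base=2):
    """Return the i-th element (i >= 1) of the Halton sequence in the given base."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

# The first few base-2 values fill (0, 1) evenly: 1/2, 1/4, 3/4, 1/8, ...
seq = [halton(i) for i in range(1, 5)]
print(seq)  # [0.5, 0.25, 0.75, 0.125]
```

In mixed logit estimation these points are typically pushed through the inverse normal CDF to simulate draws from the mixing distribution.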
Dear @jhelvy, thanks a lot for all your clarifications and helpfulness throughout this process!
Your package is great, fast, and accessible! I hope it will soon allow for correlation across parameters in the mixed logit!
Correlation is in the works! I appreciate the feedback. Hearing from users is the best way to improve / fix issues with the package.
@Ales-G just wanted to let you know that I indeed did have a bug in the previous examples here where the negative of price follows a log-normal distribution. It once again had to do with re-scaling the parameters post-estimation (the data are internally scaled prior to estimation to make it easier to converge). I'm still working to finish up final edits to a new branch, which will eventually become v0.6.0, but you can install it here:
devtools::install_github("jhelvy/logitr@correlation")
If you don't mind, it'd be helpful to see if you're getting similar results with Stata now using log-normal parameters. Note as I mentioned before that there is a good chance that logitr may reach a "better" solution in terms of the log-likelihood compared to Stata as it depends on the starting parameters. Using a multistart with logitr will usually find a better log-likelihood. So checking the log-likelihood value at convergence is helpful for making sure they indeed reached the same solutions. But even with slightly different solutions, the parameters themselves should be relatively similar in terms of the sign and magnitude.
This should now be resolved as of v0.7.0.
Hello, first of all, thank you very much for developing this package; it is really amazing and of great help.
I have a question regarding the clustering of standard errors. I am comparing the results of a multinomial logit model in preference space with those I get in Stata, and something appears to go wrong once I cluster standard errors. If I run a basic MNL model on the yogurt data and cluster standard errors by respondent, the standard errors appear to explode (all p-values > .9).
However, if I run the same model in Stata using the `clogit` command, the effect of clustering standard errors is less extreme. Note that the coefficients are the same, and that when I do not cluster standard errors, they are almost identical regardless of whether I use logitr or clogit.
Is there something going on with the clustering, or am I misspecifying something?
Thanks a lot in advance for clarifying this!
Your work is truly helpful!