mattwigway / DiscreteChoiceModels.jl

Discrete choice/random utility models in Julia

Possible race condition? #22

Closed mattwigway closed 2 years ago

mattwigway commented 2 years ago

I once got these results from the Apollo-derived mixed logit example:

[ Info: Optimization converged successfully after 33 iterations
┌ Info: Using method BFGS,
└ 125 function evaluations, 125 gradient evaluations
[ Info: Calculating and inverting Hessian
Mixed logit model
Initial simulated log-likelhood (at starting values): -2253.9092260819143
Final simulated log-likelihood: -1463.6263432910246
500 DiscreteChoiceModels.HaltonDraws draws
McFadden's pseudo-R2 (relative to starting values): 0.3506276444702606
┌─────────────┬──────────┬───────────┬───────────┐
│             │     Coef │ Std. Err. │    Z-stat │
├─────────────┼──────────┼───────────┼───────────┤
│       βtt_μ │ -2.09179 │   0.08640 │ -24.20929 │
│ βtt_sqrt(σ) │ -0.70679 │   0.05111 │ -13.82776 │
│       βtc_μ │ -1.12657 │   0.13656 │  -8.24944 │
│ βtc_sqrt(σ) │  0.98525 │   0.04521 │  21.79170 │
│       βhw_μ │ -2.95187 │   0.05533 │ -53.35381 │
│ βhw_sqrt(σ) │  0.00000 │   0.85810 │   0.00000 │
│       βch_μ │  0.52521 │   0.07447 │   7.05216 │
│ βch_sqrt(σ) │  0.95250 │   0.06135 │  15.52537 │
└─────────────┴──────────┴───────────┴───────────┘

Usually I get this result:

[ Info: Optimization converged successfully after 40 iterations
┌ Info: Using method BFGS,
└ 143 function evaluations, 143 gradient evaluations
[ Info: Calculating and inverting Hessian
Mixed logit model
Initial simulated log-likelhood (at starting values): -2253.9092260819143
Final simulated log-likelihood: -1442.5013509001801
500 DiscreteChoiceModels.HaltonDraws draws
McFadden's pseudo-R2 (relative to starting values): 0.36000024570299405
┌─────────────┬──────────┬───────────┬───────────┐
│             │     Coef │ Std. Err. │    Z-stat │
├─────────────┼──────────┼───────────┼───────────┤
│       βtt_μ │ -1.98503 │   0.08774 │ -22.62411 │
│ βtt_sqrt(σ) │  0.68849 │   0.04613 │  14.92608 │
│       βtc_μ │ -1.02282 │   0.14106 │  -7.25100 │
│ βtc_sqrt(σ) │  1.00160 │   0.05149 │  19.45410 │
│       βhw_μ │ -2.93255 │   0.08433 │ -34.77303 │
│ βhw_sqrt(σ) │ -0.89699 │   0.04834 │ -18.55654 │
│       βch_μ │  0.62777 │   0.07328 │   8.56647 │
│ βch_sqrt(σ) │  0.91959 │   0.05125 │  17.94467 │
└─────────────┴──────────┴───────────┴───────────┘

This should be deterministic. Possibly a race condition somewhere.

mattwigway commented 2 years ago

Looking at this further, I get slightly different results each time I run it.

mattwigway commented 2 years ago

If I run with a single thread, I consistently get something close to the top result, so this does appear to be a threading issue.
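
(For anyone reproducing this: starting Julia with a single thread, e.g. `julia --threads=1`, is enough to check whether the run-to-run variation disappears.)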

mattwigway commented 2 years ago

Not an issue in HaltonSequences: if I pre-materialize the sequences, I still get the same problem.

Could be a numerical issue caused by the order in which values are added across threads.
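
For context, floating-point addition is not associative, so a threaded reduction that combines contributions in a scheduling-dependent order can return slightly different totals from run to run. A minimal illustration, not related to the package's code:

```julia
# Floating-point addition is not associative: the same values summed in a
# different order can round differently.
(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)  # false
```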

mattwigway commented 2 years ago

It does appear to be due to the order of addition. If I modify the code to make the addition order deterministic (by accumulating each group's likelihood in an array and then summing), I always get a consistent result (close to the top one).
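
A minimal sketch of that pattern, assuming the simulated log-likelihood is a sum of per-group contributions computed in a threaded loop; `group_loglik` and the function names below are placeholders, not the package's internals:

```julia
using Base.Threads

# `group_loglik` is a stand-in for the per-group simulated log-likelihood;
# the real computation lives inside the package.
group_loglik(g, θ) = θ * log(g)

# Non-deterministic: threads add their contributions to a shared accumulator,
# so the order of the floating-point additions depends on scheduling.
function loglik_shared(groups, θ)
    total = Atomic{Float64}(0.0)
    @threads for i in eachindex(groups)
        atomic_add!(total, group_loglik(groups[i], θ))
    end
    return total[]
end

# Deterministic: each group's contribution is stored at a fixed index, and the
# final sum runs sequentially, so the addition order never changes.
function loglik_indexed(groups, θ)
    ll = Vector{Float64}(undef, length(groups))
    @threads for i in eachindex(groups)
        ll[i] = group_loglik(groups[i], θ)
    end
    return sum(ll)
end
```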

I think the squaring of the coefficients may be causing this by creating a saddle point that the optimization may or may not get stuck in, depending on tiny differences due to floating-point error. Note that in the less-desirable solution, βhw_sqrt(σ) is always exactly 0. The squaring is there to keep them positive; I'm going to try exponentiating instead, since that is a monotonic transformation.
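
To spell out why squaring can create that stationary point: if the model uses σ = θ² internally, the chain rule gives a gradient of 2θ times the derivative of the objective in σ, which is exactly zero at θ = 0 regardless of the objective, so the optimizer can stall there; with σ = exp(θ) the chain-rule factor exp(θ) is strictly positive, so no such artificial flat point exists. An illustrative sketch, not the package's code:

```julia
# Illustrative only: gradient with respect to the unconstrained parameter θ
# under the two transformations. `f′` stands in for the derivative of the
# (negative) simulated log-likelihood with respect to σ.
f′(σ) = 2 * (σ - 1.5)

grad_square(θ) = f′(θ^2) * 2θ         # chain rule for σ = θ²
grad_exp(θ)    = f′(exp(θ)) * exp(θ)  # chain rule for σ = exp(θ)

grad_square(0.0)  # 0.0  -> flat direction at θ = 0, optimizer can stall here
grad_exp(0.0)     # -1.0 -> still pushes toward the optimum
```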

mattwigway commented 2 years ago

Confirmed that exponentiating rather than squaring solves the problem; the remaining run-to-run variation is only very slight and purely numerical.

mattwigway commented 2 years ago

Fixed in 879b0df