melff / mclogit

mclogit: Multinomial Logit Models, with or without Random Effects or Overdispersion
http://melff.github.io/mclogit/
22 stars 4 forks source link

Incorrect predictions from mblogit when predictors are scaled #16

Closed rvlenth closed 3 years ago

rvlenth commented 3 years ago

I have found that when an mblogit model contains scaled predictor(s), then subsequent predictions are incorrect. Here is an example comparing predictions from equivalent models using nnet::multinom and mclogit::mblogit:

library(MASS)
library(nnet)
library(mclogit)
#> Loading required package: Matrix

# silly modification to housing data
housing = transform(MASS::housing, x = 13 + as.numeric(Infl))

# x values for testing with prediction
newx = data.frame(x = 10:12)

# Model with multinom:
h.mult = multinom(Sat ~ scale(x), weights = Freq, data = housing)
#> # weights:  9 (4 variable)
#> initial  value 1846.767257 
#> final  value 1772.112751 
#> converged
predict(h.mult, newdata = newx, type = "probs")
#>         Low    Medium       High
#> 1 0.8326790 0.1448212 0.02249979
#> 2 0.7703353 0.1843759 0.04528885
#> 3 0.6862040 0.2260201 0.08777590

# Model with mblogit
h.mbl = mblogit(Sat ~ scale(x), weights = Freq, data = housing)
#> 
#> Iteration 1 - Deviance = 3557.833
#> Iteration 2 - Deviance = 3544.227
#> Iteration 3 - Deviance = 3544.226
#> Iteration 4 - Deviance = 3544.226
#> converged
predict(h.mbl, newdata = newx, type = "response")
#>            Low    Medium      High
#> [1,] 0.4245383 0.2802862 0.2951755
#> [2,] 0.3148686 0.2702901 0.4148413
#> [3,] 0.2167930 0.2419704 0.5412366

Note that the predictions differ substantially. This is in part because the scaling information is not available. Normally, this information is in the predvars attribute of the terms component:

attr(h.mult$terms, "predvars")
#> list(Sat, scale(x, center = 15, scale = 0.822226451793038))
attr(h.mbl$terms, "predvars")
#> NULL

I tried tricking it into doing the right thing by copying this attribute from the other model. However, this does not change the predictions:

attr(h.mbl$terms, "predvars") = attr(h.mult$terms, "predvars")
predict(h.mbl, newdata = newx, type = "response")
#>            Low    Medium      High
#> [1,] 0.4245383 0.2802862 0.2951755
#> [2,] 0.3148686 0.2702901 0.4148413
#> [3,] 0.2167930 0.2419704 0.5412366

Created on 2021-02-16 by the reprex package (v0.3.0)

A few notes:

  1. Presumably, similar errors will occur with other predictor functions, e.g., poly(), where we need the transformation parameters based on the original data.
  2. Two changes are needed to set this right: a) Make sure the $terms component of the returned model is complete. Normally, it can be obtained as an attribute of the model matrix b) The predict.mblogit method seems to build its model matrix X from the model formula (In see rhs <- object$formula[-2] in the code), rather than the $terms component of the object.
melff commented 3 years ago

This seems to be fixed in commit 46dc21c.

rvlenth commented 3 years ago

Thanks -- looks promising. I'll check it out soon.

rvlenth commented 3 years ago

Thanks. I have done some testing, in particular that it works correctly with emmeans; and all seems correct. Thanks again!