jhelvy / logitr

Fast estimation of multinomial (MNL) and mixed logit (MXL) models in R with "Preference" space or "Willingness-to-pay" (WTP) space utility parameterizations
https://jhelvy.github.io/logitr/

Improve obsID error messaging #50

Closed: jhelvy closed this issue 1 year ago

jhelvy commented 1 year ago

The `obsID` variable causes an error if it is not a perfectly sequentially increasing numeric vector. This seems overly restrictive, since `obsID` only needs to identify unique observations. For example, this works:

```r
library(logitr)

# The yogurt data are in long format: one row per alternative
head(yogurt)

model <- logitr(
    data    = yogurt,
    outcome = "choice",
    obsID   = "obsID",
    pars    = c("price", "feat", "brand")
)
```

However, if I change a single observation ID to a different number that doesn't conflict with any other ID, it errors:

```r
# Give every row of observation 2000 the unused ID 5000
yogurt[which(yogurt$obsID == 2000),]$obsID <- 5000

model <- logitr(
    data    = yogurt,
    outcome = "choice",
    obsID   = "obsID",
    pars    = c("price", "feat", "brand")
)
```

```
Error in checkRepeatedIDs("obsID", obsID, reps) : 
  The 'obsID' variable provided has repeated ID values.
```

This error is misleading: there are no repeated ID values, and it's unclear what "repeated" means. The data are in long format, so each ID is necessarily repeated across the rows of its own observation, and that is exactly what is expected.
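To make the distinction concrete, here is a minimal base-R sketch on a small hypothetical data frame (not the yogurt data). It treats each block of consecutive identical `obsID` values as one choice observation, which assumes the rows of each observation are stored together. Under that assumption, changing one ID to an unused value leaves every observation uniquely identified; an ID is only genuinely "repeated" when it labels more than one separate block of rows.

```r
# Toy long-format data: 3 choice observations with 2 alternatives each.
# The second observation's ID was changed from 2 to 5000, mirroring the
# yogurt example above.
df <- data.frame(
  obsID  = c(1, 1, 5000, 5000, 3, 3),
  alt    = c(1, 2, 1, 2, 1, 2),
  choice = c(1, 0, 0, 1, 1, 0)
)

# rle() collapses each block of consecutive identical IDs into one run,
# so each run corresponds to one choice observation.
runs <- rle(df$obsID)
runs$values   # 1 5000 3
runs$lengths  # 2 2 2

# No ID labels more than one block, so nothing is actually repeated.
any(duplicated(runs$values))  # FALSE

# By contrast, here ID 1 labels two separate blocks of rows.
bad <- rle(c(1, 1, 2, 2, 1, 1))
any(duplicated(bad$values))   # TRUE
```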

This fixes the problem:

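```r
# Overwrite with sequential IDs 1, 2, ...; assumes every observation has
# exactly max(yogurt$alt) rows
yogurt$obsID <- rep(seq(length(unique(yogurt$obsID))), each = max(yogurt$alt))
```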

Automating that kind of overwriting is not trivial, though, because some data sets have a different number of alternatives per choice observation. A better approach would be to use the `reps` vector to create new, sequential observation IDs internally and then map them back to the original IDs after estimation. That way, this problem would never occur.
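A minimal base-R sketch of that approach is below. The helper name `renumberObsID` is hypothetical, and treating `reps` as the run lengths of consecutive identical IDs is an assumption about what the internal `reps` vector holds. Because it builds the new IDs from run lengths rather than from `max(alt)`, it handles observations with different numbers of alternatives.

```r
# Hypothetical helper (not part of logitr): replace arbitrary obsID values
# with sequential integers and keep the originals for mapping back later.
renumberObsID <- function(obsID) {
  runs <- rle(obsID)
  # Assumed meaning of `reps`: number of rows in each observation
  reps <- runs$lengths
  list(
    obsID    = rep(seq_along(reps), reps),  # 1, 1, ..., 2, 2, ..., n
    original = runs$values                  # original[k] = ID of observation k
  )
}

# Unbalanced toy data: observations have 3, 2, and 4 alternatives.
obsID <- c(10, 10, 10, 5000, 5000, 7, 7, 7, 7)
ids <- renumberObsID(obsID)
ids$obsID                 # 1 1 1 2 2 3 3 3 3
ids$original[ids$obsID]   # 10 10 10 5000 5000 7 7 7 7
```

After estimation, `ids$original[k]` gives the user's original ID for internal observation `k`. One design caveat: this treats two separated blocks that share an ID as two different observations, so it is best paired with a check for IDs that are genuinely repeated.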

Even so, the error message should be updated to clarify what "repeated" means.
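One possible wording is sketched below as a hypothetical base-R function; `checkObsIDContiguous` is not part of logitr, and the message text is only a suggestion. It raises an error only when an ID labels more than one separate block of rows, names the offending IDs, and explains the expected long-format layout. It assumes the rows of each observation are meant to be stored consecutively.

```r
# Hypothetical check (not logitr's checkRepeatedIDs): fail only when an ID
# labels more than one separate block of rows, and say so explicitly.
checkObsIDContiguous <- function(id_name, id) {
  runs <- rle(id)
  # IDs that start a second (or later) block of rows
  repeated <- unique(runs$values[duplicated(runs$values)])
  if (length(repeated) > 0) {
    stop(
      "The '", id_name, "' variable has ID values that appear in more than ",
      "one separate block of rows: ", paste(head(repeated, 5), collapse = ", "),
      ". In long-format data, all rows of the same choice observation must ",
      "share one ID and be stored consecutively; IDs do not need to be ",
      "sequential.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

checkObsIDContiguous("obsID", c(1, 1, 5000, 5000, 3, 3))  # passes silently
try(checkObsIDContiguous("obsID", c(1, 1, 2, 2, 1, 1)))   # errors: ID 1 is split
```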