Error in X_chosen[data$obsID, ] : subscript out of bounds

jhelvy / logitr

Fast estimation of multinomial (MNL) and mixed logit (MXL) models in R with "Preference" space or "Willingness-to-pay" (WTP) space utility parameterizations

https://jhelvy.github.io/logitr/

Error in X_chosen[data$obsID, ] : subscript out of bounds #30

Closed Ales-G closed 2 years ago

Ales-G commented 2 years ago

Hello, it's me again! Apologies for all these comments, but I really like your package and I am using it on a number of datasets, so I keep running into errors.

I am estimating a basic model without clustered standard errors on a dataset of more than 3,500 observations.

mnl_pref <- logitr(
  data           = dat_transformed,
  outcome        = "choice",
  obsID          = "tskID",
  pars           = c("wg", myvars),
  modelSpace     = "pref",
  panelID        = "wrkid",
  numMultiStarts = 100
)
# and I get the following error:
# Error in X_chosen[data$obsID, ] : subscript out of bounds

Again, I have looked at my tskID variable, and it looks correct.

Any idea what may be causing this error? Thanks a lot for all of your help and support! You are really making a great contribution to me and to the community in general!

jhelvy commented 2 years ago

Yeah, this too looks similar to the other error. Anytime you see an error with obsID involved, it usually has to do with how the IDs are set up. It's on my list to call a data check function right before estimating the model to make sure all the inputs are correct. It's hard to know what's causing this without seeing the data.

jhelvy commented 2 years ago

Btw, you may also want to try using xlogit. It's a Python package with a very similar UI to logitr (and it's actually even faster!). It doesn't have WTP space models yet, though that's in the works. It requires the same data structure, so if you use it and don't get any errors, that means there's a bug in logitr. If you get errors, then it's probably an error in the data somewhere.

Ales-G commented 2 years ago

I think I found out what was wrong. It was my mistake.

I had a few data entry issues. In particular, I had some tasks where both profiles had choice == 0, so no alternative was marked as chosen. It is something silly. Maybe it would be worth adding a preliminary check to the function that raises an error and helps identify similar problems.

Thanks for your help!

jhelvy commented 2 years ago

Ah okay, well, glad we figured it out. Yes, I have it on my todo list to add more checks to validate the data so that these sorts of errors can be caught more easily. The current error messages you end up getting due to a data error are not helpful for debugging.
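
The kind of check discussed above could look something like the following. This is only a minimal sketch in base R, not the validation logitr itself performs: `find_bad_choice_sets()` is a hypothetical helper, and it assumes the outcome is coded 1 for the chosen alternative and 0 otherwise. It lists every obsID that does not have exactly one chosen alternative.

```r
# Hypothetical helper, not part of logitr: list the choice sets that do not
# contain exactly one chosen alternative (assumes outcome is coded 0/1).
find_bad_choice_sets <- function(data, outcome, obsID) {
  # Count the chosen alternatives (outcome == 1) within each choice set
  n_chosen <- tapply(data[[outcome]] == 1, data[[obsID]], sum)
  bad <- n_chosen != 1
  data.frame(
    obsID     = names(n_chosen)[bad],
    n_chosen  = as.integer(n_chosen[bad]),
    row.names = NULL
  )
}

# Toy data: task 2 has no chosen alternative, task 3 has two
toy <- data.frame(
  tskID  = c(1, 1, 2, 2, 3, 3),
  choice = c(1, 0, 0, 0, 1, 1)
)
bad_sets <- find_bad_choice_sets(toy, outcome = "choice", obsID = "tskID")
print(bad_sets)
```

Running something like this on the real data before calling `logitr()` should surface tasks where every profile has choice == 0, as well as tasks with more than one chosen profile.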

HeniCha commented 2 years ago

Dear Professor Helveston, I am also a big fan of your great logitr code - especially because of the WTP space estimation! Thanks a lot for your work!

I worked with the code at the beginning of May and successfully ran it in preference and WTP space, with and without clustered standard errors. Today, I came back to my analyses and wanted to re-run the exact same code with the exact same dataset as I did at the beginning of May. I just updated the package to your new logitr 0.7.0 version. However, now I receive the same error Ales-G mentioned in this comment: Error in X_chosen[data$obsID, ] : subscript out of bounds

As I said, I haven't changed the dataset or the code since the beginning of May (except for changing the argument name price to scalePar and taking the modelSpace argument out). Even with the simplest model in preference space without clusters, I receive the error. I also checked my dataset again, and I always have four observations per obsID, as there should be.

Could it be that anything changed in the code that causes the error since the 0.7.0 update?

I would very much appreciate your help and any ideas! Thanks a lot in advance!

jhelvy commented 2 years ago

Hi @HeniCha, thanks for your message. Yes, there is a chance I introduced a new bug here with some of the changes I made in the latest version since May. I attempted to make the package more robust by running some checks on the obsID variable prior to estimating the model to avoid issues like this, but it seems the issue still persists. It is difficult to identify the source of the issue without the data to test against. I have not been able to replicate this issue using the data that comes with the package.

Could you post a portion of the data somewhere and some code here so we can have a reproducible example of the issue? You can keep it as simple as possible, just one or two attributes and only a sample of the data. Just enough so that you can get the error when you run it.
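
One way to cut a dataset down for a reproducible example is sketched below. It is only an illustration: `make_repro_sample()` is a hypothetical helper, and the column names and the output file name are made up for the toy data. It keeps whole respondents, so that no choice set is split, and only the columns the model needs.

```r
# Hypothetical helper: keep the first n respondents and only selected columns
make_repro_sample <- function(data, panelID, keep_cols, n = 20) {
  ids <- unique(data[[panelID]])
  keep_ids <- ids[seq_len(min(n, length(ids)))]
  # Keep every row of the selected respondents so choice sets stay complete
  data[data[[panelID]] %in% keep_ids, keep_cols, drop = FALSE]
}

# Toy data: 3 respondents, 2 choice sets each, 2 alternatives per set
toy <- data.frame(
  respid   = rep(1:3, each = 4),
  gid      = rep(1:6, each = 2),
  response = rep(c(1, 0), times = 6),
  price    = seq(1, 12),
  other    = letters[1:12]
)
small <- make_repro_sample(
  toy,
  panelID   = "respid",
  keep_cols = c("respid", "gid", "response", "price"),
  n         = 2
)
# write.csv(small, "sample.csv", row.names = FALSE)  # file name is illustrative
```

It is worth confirming that the reduced data still triggers the error before sharing it.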

HeniCha commented 2 years ago

Thanks so much for your quick reply! Please find attached the code:

sample <- read.csv("Merged_long_adj.csv")

attributes <- c("Dlabel2", "Dorigin", "asc")

set.seed(111)
pref_base <- logitr(
  data      = sample,
  outcome   = "response",
  obsID     = "gid",
  panelID   = "respid",
  clusterID = "respid",
  numDraws  = 1000,
  pars      = c("price", attributes),
  randPars  = c(Dlabel2 = "n", Dorigin = "n", asc = "n")
)

The example uses clustered standard errors, but even if I take panelID and clusterID out, I receive the error.

Thanks so much!

jhelvy commented 2 years ago

Okay, I think I found the error in this. Previously, I overwrote whatever was provided in the data for the obsID variable with a sequentially increasing series of numbers. So the user could provide really any identifier for the obsID (even characters) and it would get overwritten. Somehow, in adding a few tests for the obsID variable, I lost this line of code. I just added it back in this commit, which fixes this issue.
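
To illustrate why the missing re-coding step would produce this particular error, here is a small sketch in base R. It is not the code from the commit: the shape of `X_chosen` (one row per choice set) and the use of `match()` for the re-coding are assumptions made for illustration only.

```r
# Assumed layout for illustration: one row per choice set (3 choice sets)
X_chosen <- matrix(1:6, nrow = 3)

# Raw IDs as a user might supply them: valid identifiers, but not 1, 2, 3
raw_obsID <- c(101, 101, 205, 205, 309, 309)

# Indexing with the raw IDs asks for rows 101, 205 and 309 of a 3-row matrix
result <- tryCatch(
  X_chosen[raw_obsID, ],
  error = function(e) conditionMessage(e)
)
print(result)

# Re-code the IDs to 1, 2, 3, ... in order of first appearance
seq_obsID <- match(raw_obsID, unique(raw_obsID))
print(seq_obsID)

# With sequential IDs the same indexing succeeds
expanded <- X_chosen[seq_obsID, ]
dim(expanded)
```

The same `match()` call also works on character identifiers, which fits the point above that users could supply really any identifier for the obsID.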

However, in debugging this I realized that I have yet another bug when computing the standard errors with clustering. This one is a super small bug that I just fixed with this commit.

If you install from github using remotes::install_github("jhelvy/logitr"), everything should hopefully run smoothly.

HeniCha commented 2 years ago

Thanks a lot! It now works perfectly and I can successfully run the code again.

Thanks so much for quickly fixing the errors and for your prompt reply!

jhelvy commented 2 years ago

That's great! @Ales-G any chance this also fixes the issue you were having?

jhelvy commented 2 years ago

I think this issue is now addressed with recent bug fixes in v0.7.2.