Closed Ales-G closed 2 years ago
This type of message usually means there is something wrong with the obsID
variable. That variable must be a continuously increasing numeric variable identifying the rows of each choice observation, e.g., 1,1,1,2,2,2,3,3,3, etc. Anything else will cause errors, e.g., 1,1,1,2,2,2,1,1,1,3,3,3 will cause an error because the 1 is repeated later. Can you double check your tskID
variable?
So this is what the data looks like.
I checked whether there is some error (more or less than two observations per choice task), but it appears to be correct.
> dat_transformed %>%
+ group_by(tskID) %>%
+ filter(n()>2) %>%summarise(n=n())
# A tibble: 0 × 2
# … with 2 variables: tskID <dbl>, n <int>
Another hint that may help understand if I am making an error, or if there is a problem with the command:
if I estimate the model with vcov=FALSE
the model is correctly computed (though not displayed)
mnl <- logitr(
+ data = dat_transformed,
+ outcome = "choice",
+ obsID = "tskID",
+ pars = c("wg",myvars),
+ modelSpace = "pref",
+ clusterID = "wrkid",
+ panelID = "wrkid",
+ numMultiStarts = 100)
Setting robust to TRUE since clusters are being used
Running multistart...
Iterations: 100
Cores: 7
Done!
> summary(mnl)
Error in rowsum.default(expV, group = obsID, reorder = FALSE) :
incorrect length for 'group'
However, if I try to include the vcov matrix in the model it gives an error while calculating the model.
> mnl_vcov <- logitr(
+ data = dat_transformed,
+ outcome = "choice",
+ obsID = "tskID",
+ pars = c("wg",myvars),
+ modelSpace = "pref",
+ clusterID = "wrkid",
+ panelID = "wrkid",
+ vcov = T,
+ numMultiStarts = 100)
Setting robust to TRUE since clusters are being used
Running multistart...
Iterations: 100
Cores: 7
Error in rowsum.default(expV, group = obsID, reorder = FALSE) :
incorrect length for 'group'
Do you still think is something that is related to the variable identifying the rows of each choice observation?
Okay so the vcov
part makes sense in terms of identifying the source of the error. It must be in computing the variance-covariance matrix, which also makes sense that the error happens when you call summary()
because that calls vcov()
internally to compute the standard errors.
Can you do one more check - comment out the clusterID
part and see if it still errors. This will help me understand if the error is coming from the computation of clustered standard errors or if it's something else.
Indeed you are correct. If I remove the clustered standard errors the command works fine! It is clearly there the issue!
Okay so that means this has to do with how the clustered errors are being computed. You should be able to estimate the model (with clustered errors) without any errors if you have vcov = FALSE
(the default). Then if you call vcov(model)
you'll get the error. Can you do that and post the message?
Apologies for the delaied reply. Here is the results and the error
> mnl_vcov <- logitr(
+ data = dat_transformed,
+ outcome = "choice",
+ obsID = "tskID",
+ pars = c("wg",myvars),
+ modelSpace = "pref",
+ clusterID = "wrkid",
+ panelID = "wrkid",
+ numMultiStarts = 100)
Setting robust to TRUE since clusters are being used
Running multistart...
Iterations: 100
Cores: 7
Done!
> vcov(mnl_vcov)
Error in rowsum.default(expV, group = obsID, reorder = FALSE) :
incorrect length for 'group'
To not however that I have run the model using the robust standard errors and I get the same error.
Is this of any help?
Any chance this is also related to the data errors in #30? It looks like it might be based on the errors you're getting.
Unfortunately, I don't think so, I believe it has something to do with the SEs, because on this data the model works fine until I do not include robust or clustered standard errors. So I am not sure what to make of it. Any suggestion of things I could try out?
The issue is from computing robust standard errors. When doing this, the gradient is computed multiple times according to each cluster defined by the clusterID
variable (in your example, wrkid
). Perhaps there is something strange about that variable?
I believe this issue may also be related to the same bug in #30. This (I think) is resolved with the latest fixes in v0.7.1.
Dear jhelvy,
thank you very much for all your work and for the numerous improvements to the package you made.
I just wanted to let you know that even with the v0.7.1 I have this problem.
If I estimate a model thus and I call for the summary I get the following error: Error in rowsum.default(expV, group = obsID, reorder = FALSE) : incorrect length for 'group'
mnl_pref <- logitr::logitr(
data = df,
outcome = "choice",
obsID = "tskID",
pars = c("cost",myvars),
panelID = "wrkid",
robust = T,
numMultiStarts = 100)
Setting clusterID to 'wrkid' since robust == TRUE and a panelID is provided
Running multistart...
Iterations: 100
Cores: 7
Done!
summary(mnl_pref)
Error in rowsum.default(expV, group = obsID, reorder = FALSE) : incorrect length for 'group'
Conversely, if I comment out the robust option errors the summary function works well and I obtain the results. I am not sure what is happening but I still believe there is an issue with the clustered standard errors.
In any case thanks a lot for the great progress.
Thanks for noting this - it's probably still a small bug somewhere in how I wrote the summary function when using clustered errors.
Could you try to make a similar example that creates this error using one of the package data sets, like the yogurt
data? That would help a lot in identifying the source of the issue.
I tried playing around with it but I was not able to produce the error I have with the real data Compared to yogurt, my data is different because the obsID has gaps, and because not all of the ids have more than a task assigned.
data("yogurt")
random_binary <- function(n, p){
# p is the proportion of 1s
x <- c(rep(1, times=n * p), rep(0, times=n * (1 - p)))
x[sample(length(x))] # or sample(x)
}
yogurt<- yogurt %>% mutate(ran1=random_binary(nrow(yogurt),0.5),
ran2=random_binary(nrow(yogurt),0.5),
ran3=random_binary(nrow(yogurt),0.5),
ran4=random_binary(nrow(yogurt),0.5),
ran5=random_binary(nrow(yogurt),0.5))
y1 <- yogurt %>% group_by(id) %>% slice(1:4)
y2 <- yogurt %>% group_by(id) %>% slice(5:8) %>% ungroup() %>% slice(1:200)
yog <- full_join(y1,y2) %>% arrange(obsID)
head(yog,15) # this is what my data looks like
# A tibble: 15 × 12
# Groups: id [2]
id obsID alt choice price feat brand ran1 ran2 ran3 ran4 ran5
1 1 1 1 0 8.1 0 dannon 1 0 1 1 1
2 1 1 2 0 6.10 0 hiland 1 0 0 1 1
3 1 1 3 1 7.90 0 weight 1 1 0 0 0
4 1 1 4 0 10.8 0 yoplait 1 1 0 1 1
5 1 2 1 1 9.80 0 dannon 1 0 0 0 0
6 1 2 2 0 6.40 0 hiland 1 0 1 1 1
7 1 2 3 0 7.50 0 weight 0 1 0 0 1
8 1 2 4 0 10.8 0 yoplait 1 0 1 1 1
9 2 9 1 0 9.80 0 dannon 1 0 0 1 1
10 2 9 2 0 5.00 0 hiland 0 0 1 1 0
11 2 9 3 0 7.90 0 weight 0 1 0 0 1
12 2 9 4 1 10.8 0 yoplait 0 0 1 0 1
13 2 10 1 0 9.80 0 dannon 0 1 1 0 0
14 2 10 2 0 5.00 0 hiland 0 1 0 0 0
15 2 10 3 0 7.90 0 weight 1 0 0 0 1
However estimating a model with robust standard errors on this data does not give errors.
mnl_pref <- logitr::logitr(
data = yog,
outcome = "choice",
obsID = "obsID",
pars = c("price", "feat", "brand","ran1","ran2","ran3","ran4","ran5"),
panelID = "id",
robust = T,
clusterID = "id",
numMultiStarts = 100)
summary(mnl_pref)
Note that I tried to re-create the obsID variable for my real data to make it increase incrementally without gaps and I still experience the same error.
Note also that I obtain the same error in the estimation phase if I include vcov=T
option
So I figured out how to create a small dataset that reproduces the same error
First create the following data
data <- structure(list(id = structure(c(1, 1, 8, 8, 12, 12, 13, 13,
13, 13, 14, 14, 16, 16, 42, 42, 47, 47, 51, 51, 53, 53, 55, 55,
57, 57, 62, 62, 63, 63, 65, 65, 69, 69, 71, 71, 77, 77, 78, 78,
79, 79, 83, 83, 84, 84, 90, 90, 95, 95, 95, 95, 96, 96, 100,
100, 102, 102, 104, 104, 107, 107, 108, 108, 108, 108, 112, 112,
113, 113, 114, 114, 118, 118, 130, 130, 132, 132, 139, 139, 139,
139, 141, 141, 146, 146, 147, 147, 147, 147, 153, 153, 167, 167,
169, 169, 170, 170, 176, 176), format.stata = "%12.0g"), tskID = structure(c(2,
2, 23, 23, 35, 35, 37, 37, 39, 39, 40, 40, 48, 48, 126, 126,
139, 139, 153, 153, 157, 157, 163, 163, 171, 171, 186, 186, 188,
188, 193, 193, 205, 205, 213, 213, 230, 230, 233, 233, 235, 235,
248, 248, 251, 251, 268, 268, 284, 284, 285, 285, 287, 287, 298,
298, 306, 306, 311, 311, 320, 320, 322, 322, 323, 323, 335, 335,
339, 339, 341, 341, 352, 352, 388, 388, 395, 395, 415, 415, 416,
416, 421, 421, 438, 438, 439, 439, 440, 440, 459, 459, 501, 501,
506, 506, 508, 508, 528, 528), format.stata = "%10.0g"), choice = structure(c(1,
0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0,
1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0,
1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1,
1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1,
0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0), format.stata = "%10.0g"),
cost = structure(c(16.3, 15.6, 15.4, 15.1, 16.5, 16.4, 16.7,
15.6, 15.8, 16.5, 15.5, 15.9, 15.7, 16.4, 15.1, 15.9, 16.6,
15, 16.1, 16.3, 15.7, 17.9, 16.2, 15.3, 16.4, 16.5, 17.1,
17.3, 15.2, 16.1, 16.5, 16.9, 15, 16.8, 16.5, 17.4, 15.2,
15, 16.4, 15.4, 16.8, 16.1, 16.1, 15.4, 16.8, 17.8, 16.6,
17.6, 15.6, 15.7, 16.3, 16.3, 16, 17.3, 17.4, 17.9, 15, 17.6,
15.4, 16.2, 15.6, 17.4, 17.8, 15.7, 17.2, 17.1, 15.9, 17.5,
17, 16.4, 15.2, 16.3, 15.3, 15.2, 16.3, 15.6, 15.2, 15.7,
15.5, 15.7, 16.2, 15.6, 17.1, 16.5, 15.1, 16.8, 16.6, 16.3,
15.3, 15.9, 16.1, 16.3, 17.9, 15.9, 15.4, 16.1, 16.4, 15,
17.7, 15.8)),
var1 = structure(c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1,
1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0,
0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0,
0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0,
1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1)),
var2 = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1,
0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0,
0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0,
1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1)),
var3 = structure(c(0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1,
1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0,
0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0,
0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0))), row.names = c(NA,
-100L), class = c("tbl_df", "tbl", "data.frame"))
Second try to estimate the a MNL with robust SEs.
mnl_pref <- logitr::logitr(
data = data,
outcome = "choice",
obsID = "tskID",
pars = c("cost","var1","var2","var3"),
panelID = "id",
robust = T,
numMultiStarts = 100)
Setting clusterID to 'id' since robust == TRUE and a panelID is provided
Running multistart...
Iterations: 100
Cores: 7
Done!
summary(mnl_pref)
Error in rowsum.default(expV, group = obsID, reorder = FALSE) :
incorrect length for 'group'
The command correctly estimates it, but it encounters problems with displaying the summary. Note that if you comment out the robust option the summary function works correctly!
Is there something I can do to overcome the problem?
I am almost certain that the issue is in how the ID variables are set. As it is currently written, the package needs IDs that are in sequentially increasing order, like 1,1,2,2,3,3, etc. The gaps in the IDs are probably the issue. You could test this by taking your demo dataset and over-writing the IDs with a sequentially ID, then see if it works. If that works, then we at least know the source of the problem.
This shouldn't be something that happens though in terms of how I would want the package to work. As long as the IDs are unique, being in a sequential order shouldn't matter, e.g. 1,1,8,8,13,13, should work just the same. So if that is indeed the problem, then I'll try to add a fix to make the package work even if the IDs have gaps.
I tried to follow your suggestion, but unfortunately it still gives the same error
data<- data %>%
dplyr::group_by(id) %>%
dplyr::mutate(group_id = cur_group_id())
mnl_pref <- logitr::logitr(
data = data,
outcome = "choice",
obsID = "tskID",
pars = c("cost","var1","var2","var3"),
panelID = "group_id",
robust = T,
numMultiStarts = 100)
summary(mnl_pref)
Error in rowsum.default(expV, group = obsID, reorder = FALSE) :
incorrect length for 'group'
Is it possible that it also relates to the fact that some individuals perform a different number of tasks?
I believe I have found the source of the issue. I am working to add better checks to all of the variable IDs while also addressing this bug.
Okay, I think this is now corrected. Can you install the latest version (0.7.2) from Github and test it for me?
# install.packages("remotes")
remotes::install_github("jhelvy/logitr")
The issue has persisted this whole time and it simply had to do with the particular case where a cluster had only one observation. In these instances, when computing the clustered standard errors the data had only one row and was no longer being stored as a matrix, so I had to coerce it back to a matrix for the math operations to correctly execute.
Thanks you very much. This is great! It works perfectly now!
Hello, i am running a standard multinomial model of this king.
the model is calculated correctly, but when I ask to summarize results I get the following error function
I am not sure what may be causing it. Any suggestions to work around the issue?