Open jonah-allen opened 2 weeks ago
Could you send a small portion of the data so I can replicate this error? Would only need the relevant variables used in the example: "choice", "obsID", "loss_value", "risk", "clusterID"
Just realizing that only has cluster groups 1 and 2 included -- there are 25 cluster groups in my data....I think you can manually change those for replication but let me know if you need a different sample!
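Before sharing or replicating with a subset, it can help to confirm how many cluster groups the subset actually contains and that the ID columns have no missing values. The following is a minimal sketch in base R; `sample_df` and its values are made up for illustration and are not the real `wtp_risk` data.

```r
# Made-up stand-in for the shared sample; values are illustrative only.
sample_df <- data.frame(
    obsID     = rep(1:4, each = 3),
    choice    = c(1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0),
    clusterID = rep(c(1, 1, 2, 2), each = 3)
)

# Number of distinct cluster groups present in the sample
n_clusters <- length(unique(sample_df$clusterID))

# Number of rows belonging to each cluster group
rows_per_cluster <- table(sample_df$clusterID)

# Count of missing values in each ID column
id_cols <- c("obsID", "clusterID")
na_counts <- colSums(is.na(sample_df[id_cols]))

# Each choice observation should have exactly one chosen alternative
one_choice_per_obs <- all(tapply(sample_df$choice, sample_df$obsID, sum) == 1)

print(n_clusters)
print(rows_per_cluster)
print(na_counts)
print(one_choice_per_obs)
```

Swapping `sample_df` for the real data frame would show whether all 25 cluster groups made it into the sample.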
Okay I just ran this and I can replicate the error. It is perhaps a bug in the code, but I'm not sure if it should occur because I'm questioning the use of clusters here. Usually clustering is suggested when you have panel data. In your case, you have different versions. Is that just different versions of a choice experiment? If so then I'm not sure why you would want to cluster your errors around the version. Basically, I don't think clustering is needed.
If you do want to use clusters, then as a workaround you can also set panelID = "clusterID" and it will work. With an MNL model there is no difference in the calculation of the log-likelihood with or without a panelID specified, so this will get you what you want without error. You just need to specify both clusterID and panelID like this:
library(logitr)

# Workaround: pass the cluster column as both clusterID and panelID
m2 <- logitr(
    data = wtp_risk,
    outcome = "choice",
    obsID = "obsID",
    pars = c("loss_value", "risk"),
    clusterID = "clusterID",
    panelID = "clusterID"
)
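The claim that a panelID does not change the MNL log-likelihood follows from the coefficients being fixed: the log of a product of choice probabilities within a panel equals the sum of their logs. Here is a rough base R sketch of that identity on simulated toy data. It is not logitr's internal code, and the names `X`, `beta`, and the panel grouping are assumptions made up for illustration.

```r
# Toy data (simulated, for illustration only): 6 choice observations,
# 3 alternatives each, 2 attributes.
set.seed(1)
n_obs <- 6
n_alt <- 3
obsID <- rep(seq_len(n_obs), each = n_alt)
panelID <- rep(c(1, 1, 2, 2, 3, 3), each = n_alt)  # hypothetical grouping
X <- matrix(rnorm(n_obs * n_alt * 2), ncol = 2)
beta <- c(-0.5, 1.0)  # fixed (non-random) coefficients, as in an MNL model

# Mark exactly one chosen alternative per observation
choice <- integer(n_obs * n_alt)
chosen_rows <- (seq_len(n_obs) - 1) * n_alt +
    sample(n_alt, n_obs, replace = TRUE)
choice[chosen_rows] <- 1L

# Logit probability of each alternative within its choice observation
expv <- exp(as.vector(X %*% beta))
p <- expv / ave(expv, obsID, FUN = sum)

# Probability of the chosen alternative in each observation
p_chosen <- p[choice == 1]
panel_of_obs <- panelID[choice == 1]

# (a) No panel structure: sum log-probabilities over observations
ll_no_panel <- sum(log(p_chosen))

# (b) Panel structure: multiply probabilities within each panel,
#     then sum the logs of those products
ll_panel <- sum(log(tapply(p_chosen, panel_of_obs, prod)))

print(all.equal(ll_no_panel, ll_panel))
```

The two quantities differ only once coefficients are random across individuals, because the panel product then sits inside an integral over the coefficient distribution.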
Thanks very much, that fixed the issue!
Interesting -- my understanding was that clustering by the survey version is best practice because I have significant variation across survey versions; parameters vary (percent risk & percent profit loss) across three options, and the "source of risk" varies across half the surveys (half are viewed as "climate" and the other is "policy", without going into too much detail). I know that description is very general...but any resources you might be able to share on clustering in this case would be very appreciated!
I suppose that's a reasonable assumption. This is in general though an issue that I'll have to deal with because you should be able to use clusters without defining a panelID. This is a workaround for now, but I'll patch this.
First of all, thank you for developing a fantastic package! I am having an issue implementing clusters. Could you please provide insight into why this error occurs or suggest any adjustments to handle clustering properly? Thank you for your support!
Description
I encountered an issue when running the logitr() function with the clusterID parameter specified. While the model runs successfully without clusterID, adding it results in an error. The error message is:
I created a clusterID column specifically to ensure the data type in the version column was not the issue. (The scenario columns are not currently in use but may be in the future).
Reproducible Example
Here is a sample structure of my dataset (wtp_risk):
Code to Reproduce
Observations
wtp_risk data frame does not contain NA values in obsID, clusterID, or version.
clusterID was created to ensure correct data type handling.
panelID is internally processed in the logitr function, even when panelID is not specified. (This is not panel data).
Environment