The order of priors in cbc_design and its connection to cbc_profiles

mmardehali commented 12 months ago

Background: I conducted a pilot study designed with cbcTools using zero priors, and analyzed the results with logitr. When estimating the model in logitr, I wanted a different reference category for one of my categorical variables. I assumed logitr was picking the first category alphabetically and using it as the reference. Following section 5.4 ("Continuous and discrete variable coding") of the logitr paper, I used the factor() function to set another category as the reference. Let's call the attribute Information, with levels c("High", "Medium", "Low", "Unavailable"). Without the following code, logitr would use "High" as the reference category, but I wanted "Unavailable" to be the reference:

ChoiceData$Information <- factor(
  x = ChoiceData$Information,
  levels = c("Unavailable", "Low", "Medium", "High")
)

Now, the resulting MNL model coefficient estimates are:

Model Coefficients: 
                     Estimate Std. Error z-value  Pr(>|z|)    
Cost              -0.130720   0.048061 -2.7199  0.006530 ** 
UserRating         3.241950   0.357835  9.0599 < 2.2e-16 ***
InformationLow    -1.794934   0.629253 -2.8525  0.004338 ** 
InformationMedium  1.893437   0.403223  4.6958 2.656e-06 ***
InformationHigh    5.237012   0.574275  9.1194 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

My understanding, based on the examples provided for cbc_design, is that the order in which the levels of a categorical attribute are listed in cbc_profiles determines the order of the priors in cbc_design. The first level is treated as the reference and gets no entry in the priors vector; priors are supplied for the remaining levels, in order. Since I will now be including priors, and I used "Unavailable" as the reference category when obtaining them, I think I need to make sure the order of levels for Information is consistent:

Profiles <- cbc_profiles(
  Cost = seq(20, 30, 5),
  UserRating = seq(3.2, 4.8, 0.8),
  Information = c("Unavailable", "Low", "Medium", "High") #<=== The order
)

design_dbeff <- cbc_design(
  profiles = Profiles,
  n_resp = Nresp,
  n_alts = Nalt,
  n_q = Nchoice,
  n_start = 10,
  priors = list(
    Cost = -0.13,
    UserRating = 3.2,
    Information = c(-1.8, 1.9, 5.2) #<=== Same order: "Low", "Medium", "High"; no prior for the reference level "Unavailable"
  ),
  method = "Modfed",
  keep_db_error = TRUE,
  parallel = TRUE
)
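One way to reduce transcription mistakes when copying estimates into `priors` is to derive the values from a named coefficient vector instead of retyping them. The sketch below hard-codes the estimates from the output above into a named vector `est` (with a fitted model, `coef()` would presumably supply this vector; that is an assumption, not something shown in this thread). It then strips the `Information` prefix and rounds to one decimal, which reproduces the priors used above.

```r
# Estimates typed in from the model output above (illustration only).
est <- c(
  Cost              = -0.130720,
  UserRating        =  3.241950,
  InformationLow    = -1.794934,
  InformationMedium =  1.893437,
  InformationHigh   =  5.237012
)

# Keep only the coefficients that belong to the Information attribute.
is_info <- startsWith(names(est), "Information")
info_priors <- round(est[is_info], 1)

# Drop the attribute prefix so the names are the bare level names.
names(info_priors) <- sub("^Information", "", names(info_priors))

# Unnamed version, in the order the coefficients were reported.
info_priors_unnamed <- unname(info_priors)

print(info_priors)
```

The reference level ("Unavailable") has no coefficient, so it never appears in `info_priors`; the remaining names make it easy to check the order against the levels passed to cbc_profiles.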

My Questions:

1. Are my assumptions correct that (a) logitr selects the reference category alphabetically, (b) the order of levels for a categorical attribute determines the order of the priors, and (c) the prior for the first level must be left out of the priors vector if that level was used as the reference?
2. Since using priors is the main point of DB-efficient designs, is there a way to make specifying priors for categorical attributes more reliable, perhaps by using a named list (i.e., a dictionary)? I think this would help users confirm that they are assigning the correct prior to each level, without relying on a specific order that may differ from their original design. For example:
```
.
.
.
priors = list(
  Cost = -0.13,
  UserRating = 3.2,
  Information = list(Unavailable = "Reference", Low = -1.8, Medium = 1.9, High = 5.2)
),
.
.
.
```
If a nested list like the one above would wreak havoc on the internal operations, maybe the named list could be assigned to a variable before calling cbc_design, and that variable then passed as the prior for the categorical attribute? For example:
```
InformationPriors <- list(Unavailable = "Reference", Low = -1.8, Medium = 1.9, High = 5.2)
.
.
.
priors = list(
  Cost = -0.13,
  UserRating = 3.2,
  Information = InformationPriors
),
.
.
.
```

jhelvy commented 12 months ago

Yes, your understanding of how the reference levels are set in both {logitr} and {cbcTools} is correct:

- {logitr} uses the factor order, which is alphabetical by default; you can always change the factor ordering manually, as you show in your example above. The first level is set as the reference level.
- {cbcTools} sets the order based on how you define the attributes in cbc_profiles(). Again, the first level is set as the reference level.
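The factor-order behaviour described above can be checked in base R alone. The sketch below is only an illustration: the `info` vector is made-up data (only the level names come from this thread), and `model.matrix()` stands in for the dummy coding a model-fitting function would do, so it shows base R's default treatment contrasts rather than {logitr}'s internals.

```r
# Made-up observations; only the level names come from this thread.
info <- c("High", "Medium", "Low", "Unavailable", "Low")

# Default: factor() sorts the levels, so "High" ends up first.
default_factor <- factor(info)
print(levels(default_factor))

# Explicit ordering: "Unavailable" becomes the first level.
releveled <- factor(info, levels = c("Unavailable", "Low", "Medium", "High"))
print(levels(releveled))

# Treatment (dummy) coding drops the first level, which acts as the reference.
dummy_cols <- colnames(model.matrix(~ releveled))
print(dummy_cols)
```

After releveling, the dummy columns cover "Low", "Medium", and "High" only, which is the same pattern as the coefficient names in the model output earlier in the thread.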

I like the suggestion of providing explicit names for the priors for clarity. That should be relatively easy to implement, though I would probably use a vector instead of a list since the priors are already defined as a vector, e.g.:

priors <- c(Low = -1.8, Medium = 1.9, High = 5.2)

Note that I left out the reference level in the priors, which I think should be intuitive enough. In fact, I could modify this so that, if you use a named vector, whichever level you leave out is used as the reference level, e.g.:

priors <- c(Unavailable = -1.8, Low = 1.9, Medium = 5.2)

In the above case the "High" level is left out and would then be modeled as the reference level.

This shouldn't be too difficult to implement.
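A minimal sketch of how such a name-based lookup could work, in base R. `resolve_named_priors()` is a hypothetical helper written for this illustration; it is not part of {cbcTools} and says nothing about how the package does or will implement this. It assumes exactly one level is left out of the named vector and treats that level as the reference.

```r
# Hypothetical helper (NOT a cbcTools function): match named priors to the
# attribute levels and infer the reference level as the one left out.
resolve_named_priors <- function(levels, priors) {
  stopifnot(!is.null(names(priors)), !anyDuplicated(names(priors)))

  unknown <- setdiff(names(priors), levels)
  if (length(unknown) > 0) {
    stop("Unknown level(s): ", paste(unknown, collapse = ", "))
  }

  reference <- setdiff(levels, names(priors))
  if (length(reference) != 1) {
    stop("Exactly one level must be left out to serve as the reference.")
  }

  # Keep the non-reference levels in the order they were defined.
  non_ref <- levels[levels != reference]
  list(
    reference = reference,
    levels    = c(reference, non_ref),  # reference level moved to the front
    priors    = priors[non_ref]         # priors reordered to match the levels
  )
}

lv <- c("Unavailable", "Low", "Medium", "High")

# The order in which the priors are written no longer matters.
res1 <- resolve_named_priors(lv, c(Medium = 1.9, High = 5.2, Low = -1.8))

# Leaving out "High" instead makes it the reference level.
res2 <- resolve_named_priors(lv, c(Unavailable = -1.8, Low = 1.9, Medium = 5.2))

print(res1$reference)
print(res2$reference)
```

Raising an error when zero or several levels are missing is a design choice in this sketch; it catches misspelled or forgotten level names early instead of silently picking a reference.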

mmardehali commented 12 months ago

I think that would be a great implementation, and it's reasonable to assume that the left-out category is the reference. Since the output of logitr already omits the reference categories, I think it would be easy to connect those dots. Thank you!

jhelvy / cbcTools

The order of priors in cbc_design and its connection to cbc_profiles #24