yhoriuchi / projoint

A package for a more general, more straightforward, and more creative conjoint analysis
https://yhoriuchi.github.io/projoint/

The `agree` variable is currently NA for non-repeated tasks. #17

Closed yhoriuchi closed 1 year ago

yhoriuchi commented 1 year ago

@thegaryking and I discussed whether we should define the agree variable (= 1 if the choice in the repeated task is the same as in the first task, = 0 otherwise) for non-repeated tasks. Currently, agree is NA for all non-repeated tasks. This can cause problems when users want to estimate specific choice-level QOIs with few observations, particularly for specific subgroups. For example, users may want to estimate the choice-level marginal means of choosing a Korean candidate when the races of the two candidates are {Korean, White}, among Korean respondents. (This is a real issue I encountered in my working paper.) The number of "relevant" repeated tasks, i.e., those that include {Korean, White} pairs for Korean respondents, is quite small. So, my suggestion is to "impute" the agree variable not only for the (single) repeated task but also for all the other non-repeated tasks. This "assumption" is based on our empirical finding that whether a respondent chooses the same profile in the repeated task is independent of the information contained in the conjoint table.
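
To make the definition of agree concrete, here is a minimal sketch on toy data. It is not the package's internal code: the data frame `toy` and the respondent ids are hypothetical, and the sketch assumes that `selected_repeated` has already been aligned with `selected` (for example, after un-flipping a flipped repeated task), so that agreement is simply equality of the two columns.

```r
library(dplyr)

# Hypothetical toy data: one row per respondent-profile for the task that was
# later repeated. `selected` is the original choice, `selected_repeated` the
# choice in the repeated task (assumed to be already aligned with `selected`).
toy <- tibble(
  id                = rep(c("r1", "r2"), each = 2),
  profile           = rep(1:2, times = 2),
  selected          = c(0, 1, 1, 0),
  selected_repeated = c(1, 0, 1, 0)
)

# agree = 1 if the respondent made the same choice both times, 0 otherwise
toy <- toy %>%
  mutate(agree = as.numeric(selected == selected_repeated))

print(toy)
```

In this toy example, respondent r1 switched profiles between the two presentations of the task, and respondent r2 chose the same profile both times.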

thegaryking commented 1 year ago

Maybe it doesn't matter, as long as we carefully document what is coming out of our program. Any user could change the coding as they see fit, but would still benefit from our default choices. Gary

yhoriuchi commented 1 year ago

Ok, @thegaryking. I revised the reshape_projoint() function and added another argument named ".fill", so named because I use the fill() function in the tidyr package.

See the following examples:

library(projoint)

# Reshape the example data; with .fill = FALSE, agree stays NA for non-repeated tasks
df <- reshape_projoint(
  .dataframe = exampleData1,
  .idvar = "ResponseId",
  .outcomes = c(paste0("choice", 1:8), "choice1_repeated_flipped"),
  .outcomes_ids = c("A", "B"),
  .alphabet = "K",
  .repeated = TRUE,
  .flipped = TRUE,
  .fill = FALSE)

returns the following data frame (tibble):

# A tibble: 6,400 × 13
   id                 task profile att4        att7        att3        att1        att2    att5  att6  selected selected_repeated agree
   <chr>             <dbl>   <dbl> <fct>       <fct>       <fct>       <fct>       <fct>   <fct> <fct>    <dbl>             <dbl> <dbl>
 1 R_1M3TDihZzq9zDgX     1       1 att4:level1 att7:level1 att3:level3 att1:level3 att2:l… att5… att6…        0                 1     0
 2 R_1M3TDihZzq9zDgX     1       2 att4:level1 att7:level2 att3:level2 att1:level3 att2:l… att5… att6…        1                 0     0
 3 R_1M3TDihZzq9zDgX     2       1 att4:level1 att7:level2 att3:level2 att1:level2 att2:l… att5… att6…        0                NA    NA
 4 R_1M3TDihZzq9zDgX     2       2 att4:level2 att7:level1 att3:level4 att1:level3 att2:l… att5… att6…        1                NA    NA
 5 R_1M3TDihZzq9zDgX     3       1 att4:level2 att7:level2 att3:level2 att1:level2 att2:l… att5… att6…        0                NA    NA
 6 R_1M3TDihZzq9zDgX     3       2 att4:level2 att7:level2 att3:level2 att1:level1 att2:l… att5… att6…        1                NA    NA
 7 R_1M3TDihZzq9zDgX     4       1 att4:level1 att7:level1 att3:level3 att1:level1 att2:l… att5… att6…        1                NA    NA
 8 R_1M3TDihZzq9zDgX     4       2 att4:level2 att7:level2 att3:level2 att1:level1 att2:l… att5… att6…        0                NA    NA
 9 R_1M3TDihZzq9zDgX     5       1 att4:level1 att7:level1 att3:level3 att1:level2 att2:l… att5… att6…        0                NA    NA
10 R_1M3TDihZzq9zDgX     5       2 att4:level2 att7:level1 att3:level1 att1:level3 att2:l… att5… att6…        1                NA    NA
# ℹ 6,390 more rows
# ℹ Use `print(n = ...)` to see more rows

But once we change .fill = FALSE to .fill = TRUE, the output becomes the following. Note the last column (agree), which is no longer NA for the non-repeated tasks:

# A tibble: 6,400 × 13
   id                 task profile att4        att7        att3        att1        att2    att5  att6  selected selected_repeated agree
   <chr>             <dbl>   <dbl> <fct>       <fct>       <fct>       <fct>       <fct>   <fct> <fct>    <dbl>             <dbl> <dbl>
 1 R_00zYHdY1te1Qlrz     1       1 att4:level2 att7:level1 att3:level1 att1:level2 att2:l… att5… att6…        1                 1     1
 2 R_00zYHdY1te1Qlrz     1       2 att4:level1 att7:level2 att3:level1 att1:level2 att2:l… att5… att6…        0                 0     1
 3 R_00zYHdY1te1Qlrz     2       1 att4:level2 att7:level2 att3:level3 att1:level2 att2:l… att5… att6…        1                NA     1
 4 R_00zYHdY1te1Qlrz     2       2 att4:level2 att7:level2 att3:level4 att1:level3 att2:l… att5… att6…        0                NA     1
 5 R_00zYHdY1te1Qlrz     3       1 att4:level1 att7:level2 att3:level4 att1:level3 att2:l… att5… att6…        1                NA     1
 6 R_00zYHdY1te1Qlrz     3       2 att4:level2 att7:level2 att3:level3 att1:level2 att2:l… att5… att6…        0                NA     1
 7 R_00zYHdY1te1Qlrz     4       1 att4:level1 att7:level1 att3:level1 att1:level1 att2:l… att5… att6…        0                NA     1
 8 R_00zYHdY1te1Qlrz     4       2 att4:level2 att7:level1 att3:level1 att1:level1 att2:l… att5… att6…        1                NA     1
 9 R_00zYHdY1te1Qlrz     5       1 att4:level1 att7:level1 att3:level2 att1:level1 att2:l… att5… att6…        1                NA     1
10 R_00zYHdY1te1Qlrz     5       2 att4:level1 att7:level1 att3:level4 att1:level2 att2:l… att5… att6…        0                NA     1
# ℹ 6,390 more rows
# ℹ Use `print(n = ...)` to see more rows
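
The change in the agree column can be illustrated with a grouped fill on toy data. This is only a sketch of the general technique, assuming the value observed in the repeated task is copied to every other row of the same respondent; it is not necessarily how reshape_projoint() implements `.fill` internally, and the data frame `toy` is hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two respondents, three tasks, two profiles per task.
# Only task 1 was repeated, so agree is observed there and NA elsewhere.
toy <- tibble(
  id      = rep(c("r1", "r2"), each = 6),
  task    = rep(rep(1:3, each = 2), times = 2),
  profile = rep(1:2, times = 6),
  agree   = c(1, 1, NA, NA, NA, NA,
              0, 0, NA, NA, NA, NA)
)

# Fill missing agree values within each respondent. "downup" fills downward
# first and then upward, so the position of the repeated task does not matter.
filled <- toy %>%
  group_by(id) %>%
  fill(agree, .direction = "downup") %>%
  ungroup()

print(filled)
```

Grouping by respondent before filling matters: without `group_by(id)`, values from one respondent could leak into the missing rows of the next.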

The explanation I added to the roxygen documentation is the following:

A logical vector: TRUE if you want to use information about whether a respondent chose the same profile in the repeated task and "fill" (using the 'tidyr' package) the missing values for the non-repeated tasks; FALSE otherwise. If the number of respondents is small, if the number of specific profile pairs of interest is small, and/or if the number of respondents in the specific subgroups you want to study is small, it is worth changing this option to TRUE. Please note, however, that '.fill = TRUE' rests on the assumption that IRR is independent of the information contained in conjoint tables. Although our empirical tests suggest that this assumption is valid, if you are unsure about it, it is better to use the default value (FALSE).

If this looks good, please close this issue with your comment (e.g., "OK").

yhoriuchi commented 1 year ago

Also, see the new section (2.3) on the following page: https://yhoriuchi.github.io/projoint/articles/02-wrangle.html

thegaryking commented 1 year ago

Ok

yhoriuchi commented 1 year ago

Great. Closing it now!