markhwhiteii / bwsTools

Tools for Case 1 Best-Worst Scaling (MaxDiff) Designs
https://osf.io/wb4c3/

e_bayescoring not working-- Consistently getting "-1 and 1 must appear exactly once in every id-block combination." and "Each pairwise comparison between items must occur for every ID" #47

JacobElder closed 5 months ago

JacobElder commented 6 months ago

Hi,

I am trying to use the bwsTools package, but every time I call e_bayescoring() I encounter two errors:

"-1 and 1 must appear exactly once in every id-block combination." and "Each pairwise comparison between items must occur for every ID"

I have explored the underlying code to find out why these errors keep occurring, but I cannot identify the reason. The data were collected in Qualtrics and the merge worked without problems, so I cannot determine why e_bayescoring() will not work.

Reproducible example below (the data frame produced by structure() is what gets passed as repro):

structure(list(letters = c("E", "F", "I", "G", "F", "D", "H", "A", "G", "J", "D", "E", "C", "H", "F", "J", "J", "I", "A", "H", "I", "E", "C", "B", "D", "C", "B", "F", "B", "G", "E", "D", "I", "B", "J", "D", "B", "C", "G", "F", "H", "D", "E", "C", "A", "F", "D", "I", "G", "E", "C", "H", "F", "I", "H", "G", "C", "H", "A", "J", "J", "A", "I", "B", "A", "G", "B", "J", "B", "H", "I", "D", "J", "D", "H", "G", "E", "F", "A", "C", "G", "J", "D", "E", "F", "I", "J", "A", "H", "C", "E", "B", "C", "E", "F", "I", "G", "B", "A", "J", "B", "A", "C", "G", "D", "F", "G", "E", "I", "J", "F", "H", "J", "H", "B", "D", "C", "E", "D", "I", "H", "D", "I", "B", "A", "C", "J", "F", "G", "D", "C", "F", "D", "F", "J", "B", "B", "C", "A", "G", "C", "G", "H", "E", "I", "A", "G", "H", "J", "E", "D", "I", "F", "I", "E", "C", "E", "J", "B", "A"), ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), resp = c(1, -1, 0, 0, 0, -1, 0, 1, 0, 1, -1, 0, 1, 0, -1, 0, 1, 0, 0, -1, -1, 0, 0, 1, 0, 0, 1, -1, 1, 0, 0, -1, 1, -1, 0, 0, 0, 1, -1, 0, 1, 0, 0, -1, 0, 1, -1, 0, 0, 1, -1, 0, 0, 1, 0, -1, -1, 0, 1, 0, 1, 0, 0, -1, 0, 1, -1, 0, -1, 0, 1, 0, 0, -1, 1, 0, 0, 0, 1, -1, 0, -1, 0, 1, 0, 1, -1, 0, 1, 0, 0, -1, -1, 0, 1, 0, 0, -1, 0, 1, 1, 0, -1, 0, 0, 1, -1, 0, 0, -1, 0, 1, 0, 1, 0, -1, -1, 0, 0, 1, 0, 0, 1, -1, 0, -1, 1, 0, 1, 0, -1, 0, 1, 0, 0, -1, -1, 0, 1, 0, -1, 1, 0, 0, 0, 1, -1, 0, 1, 0, 0, -1, 0, 1, -1, 0, 1, -1, 
0, 0), block = c("C1", "C1", "C1", "C1", "C2", "C2", "C2", "C2", "C3", "C3", "C3", "C3", "C4", "C4", "C4", "C4", "C5", "C5", "C5", "C5", "C6", "C6", "C6", "C6", "C7", "C7", "C7", "C7", "C8", "C8", "C8", "C8", "C1", "C1", "C1", "C1", "C2", "C2", "C2", "C2", "C3", "C3", "C3", "C3", "C4", "C4", "C4", "C4", "C5", "C5", "C5", "C5", "C6", "C6", "C6", "C6", "C7", "C7", "C7", "C7", "C8", "C8", "C8", "C8", "C1", "C1", "C1", "C1", "C2", "C2", "C2", "C2", "C3", "C3", "C3", "C3", "C4", "C4", "C4", "C4", "C5", "C5", "C5", "C5", "C6", "C6", "C6", "C6", "C7", "C7", "C7", "C7", "C8", "C8", "C8", "C8", "C1", "C1", "C1", "C1", "C2", "C2", "C2", "C2", "C3", "C3", "C3", "C3", "C4", "C4", "C4", "C4", "C5", "C5", "C5", "C5", "C6", "C6", "C6", "C6", "C7", "C7", "C7", "C7", "C8", "C8", "C8", "C8", "C1", "C1", "C1", "C1", "C2", "C2", "C2", "C2", "C3", "C3", "C3", "C3", "C4", "C4", "C4", "C4", "C5", "C5", "C5", "C5", "C6", "C6", "C6", "C6", "C7", "C7", "C7", "C7", "C8", "C8", "C8", "C8")), row.names = c(NA, -160L), class = c("tbl_df", "tbl", "data.frame"))

e_bayescoring(repro, id="ID", block="block", item="letters", choice="resp")

JacobElder commented 6 months ago

I have tested the data by summing the responses within each respondent ID and block, and checking whether any block within a participant fails to sum to 0. All sum to 0. I am pretty stumped about the issue here.

library(dplyr)

# Sum the responses within each ID-block combination
temp <- df %>%
    group_by(ID, block) %>%
    summarise(s = sum(resp), .groups = "drop")

# TRUE only if every ID-block combination sums to 0
all(temp$s == 0)

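
A sum of 0 is a weaker condition than the one in the error message: a block with two bests and two worsts also sums to 0. A stricter check counts the 1s and -1s separately in each ID-block combination. The sketch below uses base R and a small made-up data frame (`toy`, with hypothetical values); the column names `ID`, `block`, and `resp` are assumed to match the data above.

```r
# Toy data in the same long format (hypothetical values, not the real data).
toy <- data.frame(
  ID    = rep(1L, 8),
  block = rep(c("C1", "C2"), each = 4),
  resp  = c(1, -1, 0, 0,  1, 1, -1, -1)  # C2 sums to 0 but has two bests
)

# Flag best (1) and worst (-1) choices so they can be counted separately.
toy$is_best  <- as.integer(toy$resp == 1)
toy$is_worst <- as.integer(toy$resp == -1)

# Number of bests and worsts in each ID-block combination.
counts <- aggregate(cbind(is_best, is_worst) ~ ID + block, data = toy, FUN = sum)

# Combinations that do not have exactly one best and exactly one worst.
bad <- counts[counts$is_best != 1 | counts$is_worst != 1, ]
print(bad)
```

An empty `bad` means every ID-block combination has exactly one best and one worst choice.
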
JacobElder commented 6 months ago

Reviewing it now and diving into the get_eloresults function that is part of eloscoring, I am wondering if it is breaking because my MaxDiff has four options/items per task, so there are more unchosen options than best or worst options. For example, this section of get_eloresults breaks on my data:

results <- dplyr::bind_rows(
  results,
  dplyr::tibble(winner = best, loser = ci[ci != best])
)

results <- dplyr::bind_rows(
  results,
  dplyr::tibble(winner = neithers, loser = worst)
)

It is saying they don't have compatible sizes, but that is because my MaxDiff has four items per task: two unchosen, one best, and one worst.

markhwhiteii commented 6 months ago

The function underneath is bwsTools:::get_checks(). If I run debugonce(bwsTools:::get_checks) and then pass your data through it, I get this matrix for ID 1:

       letters
letters A B C D E F G H I J
      A 2 0 0 1 0 1 0 2 1 1
      B 0 3 2 2 2 1 1 0 1 0
      C 0 2 3 1 1 2 0 1 1 1
      D 1 2 1 4 2 2 2 1 0 1
      E 0 2 1 2 4 1 3 0 2 1
      F 1 1 2 2 1 4 1 2 1 1
      G 0 1 0 2 3 1 3 0 1 1
      H 2 0 1 1 0 2 0 3 1 2
      I 1 1 1 0 2 1 1 1 3 1
      J 1 0 1 1 1 1 1 2 1 3

Not every pairwise comparison shows up: A and G never appear together, nor do E and H, etc. So the design is not a balanced incomplete block design (BIBD).

Same thing for ID 2, although they saw a different design:

       letters
letters A B C D E F G H I J
      A 3 1 1 1 0 1 0 1 2 2
      B 1 3 1 1 0 1 1 0 2 2
      C 1 1 4 1 2 1 2 3 0 1
      D 1 1 1 3 1 1 0 1 2 1
      E 0 0 2 1 2 0 1 2 0 0
      F 1 1 1 1 0 3 2 1 2 0
      G 0 1 2 0 1 2 3 2 1 0
      H 1 0 3 1 2 1 2 4 1 1
      I 2 2 0 2 0 2 1 1 4 2
      J 2 2 1 1 0 0 0 1 2 3

B and E, C and I, etc., don't appear together.
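
A co-occurrence matrix like the ones above can be rebuilt outside the package. The following is a minimal base R sketch, not the actual internals of get_checks(); the helper name `cooccurrence` and the toy data are hypothetical. The diagonal counts how often each item appears for a respondent, and the off-diagonal cells count how often two items share a block.

```r
# Hypothetical helper, not part of bwsTools.
cooccurrence <- function(data, id_value, id = "ID", block = "block", item = "letters") {
  d <- data[data[[id]] == id_value, ]
  # Rows are blocks, columns are items; cells count appearances of an item in a block.
  incidence <- unclass(table(d[[block]], d[[item]]))
  # t(incidence) %*% incidence gives the item-by-item co-occurrence counts.
  crossprod(incidence)
}

# Toy design (hypothetical): two blocks of three items, so C and D never meet.
toy <- data.frame(
  ID      = 1L,
  block   = rep(c("C1", "C2"), each = 3),
  letters = c("A", "B", "C", "A", "B", "D")
)

m <- cooccurrence(toy, id_value = 1)

# Item pairs that never appear in the same block.
pairs <- which(m == 0 & upper.tri(m), arr.ind = TRUE)
missing_pairs <- data.frame(
  item1 = rownames(m)[pairs[, "row"]],
  item2 = colnames(m)[pairs[, "col"]]
)
print(missing_pairs)
```

In a BIBD every off-diagonal cell holds the same positive count and every diagonal cell holds the same count, so any row in `missing_pairs` rules the design out.
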

I would recommend using:

> set.seed(1839)
> eloscoring(
+   repro,
+   id = "ID",
+   block = "block",
+   item = "letters",
+   choice = "resp"
+ )
# A tibble: 50 × 3
      ID letters   elo
   <int> <chr>   <dbl>
 1     1 A       1036.
 2     1 B       1108.
 3     1 C       1039.
 4     1 D        895.
 5     1 E       1035.
 6     1 F        893.
 7     1 G        998.
 8     1 H        960.
 9     1 I        965.
10     1 J       1072.
# ℹ 40 more rows
# ℹ Use `print(n = ...)` to see more rows
Warning messages:
1: In get_checks(data, id, block, item, choice, nonbibd = TRUE) :
  Analyzing non-BIBD data. Each pairwise comparison between
  items does not occur for every id.
2: In get_checks(data, id, block, item, choice, nonbibd = TRUE) :
  Analyzing non-BIBD data. Each pairwise comparison between items 
  does not occur the same amount of times for each id.
> 

It throws a warning, but it doesn't assume a BIBD the way that the empirical estimation of the Bayesian MNL does.

JacobElder commented 5 months ago

Hi @markhwhiteii, thanks so much for the response! Earlier, when I referred to the get_eloresults() function, I had actually copied the functions and commented out the bwsTools:::get_checks() call that was resulting in an error, because I was trying to isolate the issue.

Ultimately, it seemed that the even bigger issue was that some people did not complete their MaxDiff tasks, so the data were incomplete. Once I dropped all incomplete MaxDiffs, I was at least able to run it using the eloscoring() method.
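
For anyone hitting the same problem, here is a rough base R sketch of that filtering step. It assumes an "incomplete" task is an ID-block combination without exactly one best (1) and one worst (-1), and that a respondent is dropped if any of their tasks is incomplete; the helper names and the toy data are hypothetical.

```r
# Toy data (hypothetical): ID 1 answered both tasks, ID 2 skipped the second one.
toy <- data.frame(
  ID    = rep(c(1L, 2L), each = 8),
  block = rep(rep(c("C1", "C2"), each = 4), times = 2),
  resp  = c(1, -1, 0, 0,  0, 1, 0, -1,
            1, 0, -1, 0,  NA, NA, NA, NA)
)

# A task is complete if it has exactly one best and exactly one worst.
# %in% treats NA as "no match", so unanswered tasks count as incomplete.
block_complete <- function(r) sum(r %in% 1) == 1 && sum(r %in% -1) == 1

# A respondent is complete if all of their tasks are complete.
id_complete <- function(d) {
  all(vapply(split(d$resp, d$block), block_complete, logical(1)))
}

keep <- vapply(split(toy, toy$ID), id_complete, logical(1))

# Keep only the rows that belong to complete respondents.
complete <- toy[as.character(toy$ID) %in% names(keep)[keep], ]
print(unique(complete$ID))
```

This only catches tasks that are present but unanswered. If skipped tasks are missing from the data entirely, the number of blocks per ID would need to be checked as well.
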

Thanks again so much!