markhwhiteii / bwsTools

Tools for Case 1 Best-Worst Scaling (MaxDiff) Designs
https://osf.io/wb4c3/

error in e_bayescoring #41

Closed: arainboldt closed this issue 3 years ago

arainboldt commented 3 years ago

Hi,

Thanks for the package. I'm excited about using it, but I'm running into some errors.

I'm new to R, coming from python, so I'm not sure how to debug.

I'm getting the below error when calling the function:

RRuntimeError: Error in get_checks(data, id, block, item, choice) : 
  -1 and 1 must appear exactly once in every id-block combination. Currently, this is not the case for the ids:
    R_1, R_2, etc..

the first respondent-block unit from the frame I'm passing is:

respondent_id | block_id | statement | value
-- | -- | -- | --
R_1 | 1 | sports | 1
R_1 | 1 | crafts | 0
R_1 | 1 | reading | -1
R_1 | 1 | videogames | 0
R_1 | 1 | work | 0

The above is a complete block from respondent R_1. I've got the added complication that I'm accessing the function via rpy2.

Any ideas about what's going wrong, or how to debug it?
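For reference, one way to find the offending id-block pairs from the Python side is a quick pandas check (just a debugging sketch following the error message's rule that 1 and -1 must each appear exactly once per respondent-block; column names as in the frame above):

```python
import pandas as pd

# Toy frame mirroring the structure above (hypothetical data):
# R_1's block is valid, R_2's block has no -1 at all.
df = pd.DataFrame({
    "respondent_id": ["R_1"] * 5 + ["R_2"] * 5,
    "block_id":      [1] * 5 + [1] * 5,
    "statement":     ["sports", "crafts", "reading", "videogames", "work"] * 2,
    "value":         [1, 0, -1, 0, 0,
                      1, 0, 0, 0, 0],
})

def bad_id_blocks(df):
    """Return (respondent_id, block_id) pairs where 1 and -1 do not
    each appear exactly once, mirroring the get_checks() error."""
    counts = (df.groupby(["respondent_id", "block_id"])["value"]
                .agg(best=lambda v: (v == 1).sum(),
                     worst=lambda v: (v == -1).sum()))
    bad = counts[(counts["best"] != 1) | (counts["worst"] != 1)]
    return [(rid, int(block)) for rid, block in bad.index]

print(bad_id_blocks(df))  # -> [('R_2', 1)]
```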

arainboldt commented 3 years ago

Hey, so I figured out that I don't have a balanced dataset, so it looks like this method isn't viable. Where can I read more about the balanced-dataset requirement, and why is it necessary? I see that it's not required for other methods like walkscoring and prscoring. Why is that?

Also, I'm getting an error with the adjacency matrix on walkscoring and prscoring.

R[write to console]: Error in if (sum(M[r, ]) == 0) M[r, ] <- 1/nrow(M) else M[r, ] <- M[r,  : 
  missing value where TRUE/FALSE needed

Does this indicate that there's another formatting error with my data?

Thanks
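As a debugging note: in R, that "missing value where TRUE/FALSE needed" error means the `if()` condition evaluated to NA, which here suggests an NA got into the adjacency matrix. A quick pandas hygiene check (a sketch using the column names from above) can rule out the two usual culprits, missing values and duplicated rows:

```python
import pandas as pd

# Hypothetical frame with the two problems that commonly produce NA
# values downstream: a missing choice value and a duplicated row.
df = pd.DataFrame({
    "respondent_id": ["R_1", "R_1", "R_1", "R_1"],
    "block_id":      [1, 1, 1, 1],
    "statement":     ["sports", "crafts", "crafts", "reading"],
    "value":         [1, 0, 0, None],
})

# Any missing values in the columns the scoring functions use?
print(df[["respondent_id", "block_id", "statement", "value"]]
      .isna().any().to_dict())

# Any duplicated item within an id-block? Each statement should
# appear at most once per respondent-block.
dupes = df.duplicated(["respondent_id", "block_id", "statement"])
print(df[dupes])
```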

markhwhiteii commented 3 years ago

Hi! Would it be possible to share with me your data and code so that I could reproduce the error and see what's going on? As for balanced datasets and when to use what functions, see: https://osf.io/xftvq/

arainboldt commented 3 years ago

Sure, here's the data file (test_bws_data.csv) and the code:

import rpy2.robjects.packages as rpackages
from rpy2 import robjects
try:
    bwstools = rpackages.importr("bwsTools")
except Exception:  # package not installed yet; install it from CRAN
    utils = rpackages.importr('utils')
    utils.chooseCRANmirror(ind=1)
    utils.install_packages('bwsTools')
    bwstools = rpackages.importr("bwsTools")

base = rpackages.importr('base')
from rpy2.robjects import pandas2ri
pandas2ri.activate()
robjects.r("options(error=recover)")

def e_bayescoring(data):
    robjects.globalenv['dataframe'] = data
    return bwstools.e_bayescoring(data=base.as_symbol('dataframe'),
                                  **{'id':'respondent_id',
                                     'block':'block_id',
                                     'choice':'value',
                                     'item':'statement'})

def walkscoring(data):
    robjects.globalenv['dataframe'] = data
    return bwstools.walkscoring(data=base.as_symbol('dataframe'),
                                  **{'id':'respondent_id',
                                     'block':'block_id',
                                     'choice':'value',
                                     'item':'statement'})

def eloscoring(data):
    robjects.globalenv['dataframe'] = data
    return bwstools.eloscoring(data=base.as_symbol('dataframe'),
                                  **{'id':'respondent_id',
                                     'block':'block_id',
                                     'choice':'value',
                                     'item':'statement'})

def prscoring(data):
    robjects.globalenv['dataframe'] = data
    return bwstools.prscoring(data=base.as_symbol('dataframe'),
                                  **{'id':'respondent_id',
                                     'block':'block_id',
                                     'choice':'value',
                                     'item':'statement'})
markhwhiteii commented 3 years ago

Yup, looks like everything is OK; it is just that you do not have a balanced design. In brief, a balanced (incomplete block) design is one where, for each respondent, every block shows the same number of items, every item appears the same number of times across blocks, and every pair of items appears together in a block the same number of times.

Why is this important? Most crucially, the theoretical and empirical work shows that difference scores and the empirical Bayes method assume data are coming from balanced designs, because these designs have helpful properties, like orthogonal design matrices, that help us simplify the problem. This is why I always recommend a balanced design when possible.

That being said, there are many times where we cannot get balanced designs. In those cases, I suggest using something like eloscoring() to get individual ratings. This paper https://osf.io/xftvq/ explains which functions require and do not require balanced data.

There is a function hidden inside of the package called get_checks(). When I run it on your data, I find that it is imbalanced:

> bwsTools:::get_checks(dat, "respondent_id", "block_id", "statement", "value")
Error in bwsTools:::get_checks(dat, "respondent_id", "block_id", "statement",  : 
  Each pairwise comparison between items must occur for every id

But there are some models that relax this, by allowing an argument called nonbibd = TRUE:

> bwsTools:::get_checks(dat,
+                       "respondent_id",
+                       "block_id",
+                       "statement",
+                       "value",
+                       nonbibd = TRUE)
Warning messages:
1: In bwsTools:::get_checks(dat, "respondent_id", "block_id", "statement",  :
  Analyzing non-BIBD data. Each pairwise comparison between
  items does not occur for every id.
2: In bwsTools:::get_checks(dat, "respondent_id", "block_id", "statement",  :
  Analyzing non-BIBD data. Each pairwise comparison between items 
  does not occur the same amount of times for each id.

These warnings give you detailed reasons why you don't have a balanced design.
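A rough pandas analogue of that pairwise check (a sketch of what a BIBD check has to verify, not the package's actual implementation) counts, per respondent, how often each unordered pair of items co-occurs in a block:

```python
from itertools import combinations

import pandas as pd

# Hypothetical data: for R_1, the pair (a, b) co-occurs twice but
# (a, c) only once, so the design is not a BIBD.
df = pd.DataFrame({
    "respondent_id": ["R_1"] * 6,
    "block_id":      [1, 1, 2, 2, 3, 3],
    "statement":     ["a", "b", "a", "b", "a", "c"],
})

def pair_counts(df):
    """Count co-occurrences of each unordered item pair within a
    block, keyed by respondent."""
    counts = {}
    for (rid, _), grp in df.groupby(["respondent_id", "block_id"]):
        for pair in combinations(sorted(grp["statement"]), 2):
            counts[(rid, pair)] = counts.get((rid, pair), 0) + 1
    return counts

counts = pair_counts(df)
print(counts)
# Balanced (for this check) iff every pair count is the same:
print(len(set(counts.values())) == 1)  # -> False here
```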

In that case, methods that assume a balanced design will throw an error. Those that do not will only throw you those warnings. So I recommend eloscoring() in this context:

set.seed(1839)
res <- eloscoring(dat, "respondent_id", "block_id", "statement", "value")

I get an error, and it is because I am assuming implicitly that items ("statements" in your data) will be characters, not numeric. And I make the "dummy" statements in eloscoring() character (see the Hollis references in the docs for why I do this). So, it tries to join character and numeric columns and errors out:

> res <- eloscoring(dat, "respondent_id", "block_id", "statement", "value")
Error: Can't combine `..1$winner` <character> and `..2$winner` <double>.
Run `rlang::last_error()` to see where the error occurred.

I need to fix this, but in the meantime, it will run if you coerce statement to character first:

library(bwsTools)
library(tidyverse)
dat <- read_csv("test_bws_data.csv") %>% 
  select(respondent_id, block_id, statement, value) %>% 
  mutate(statement = as.character(statement)) # NOTE: COERCES HERE

set.seed(1839)
res <- eloscoring(dat, "respondent_id", "block_id", "statement", "value")

That returns what you need. The downside of not having a balanced design is increased computational cost in calculating individual scores; a lot of these methods were designed to get aggregate scores. I am still working on ways to increase computational efficiency. Right now, eloscoring() on your data takes about 1 second per respondent ID, and it is linear, or O(n), in time complexity: data with 10 cases takes about 10 seconds, data with 100 cases about 100 seconds, data with 500 cases about 500 seconds, and so on.

Balanced designs run remarkably faster with e_bayescoring() because it uses a closed-form, analytical solution. I may allow users to manually specify nonbibd = TRUE for e_bayescoring() in the future, but I am still running simulation studies to see how much an imbalanced design hurts the estimation of valid individual ratings.

arainboldt commented 3 years ago

Hi Mark,

Thanks a lot for your detailed response and the resources that you've shared. This is my first project with MaxDiff and it's been a bit difficult to understand all of the details of the process. Your resources have been very helpful for me to learn about the method.

I knew that I didn't have a balanced dataset; I didn't realize, however, that it was so crucial to the empirical Bayes method. I expected that normalization would have sufficiently mitigated the effect of imbalance.

I have explored using the elo and difference scoring methods, and they both work fine. They're less desirable as they don't incorporate the global scores. Unfortunately, this study has already been run so it's not possible to balance it now.
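For anyone following along, the plain individual difference score mentioned above is easy to compute by hand: for each respondent and item, count the times it was picked best minus the times it was picked worst. A sketch with the column names from this thread (this is the simple difference score only, not the empirical Bayes adjustment that e_bayescoring() applies):

```python
import pandas as pd

# Hypothetical long-format data like the frame earlier in the thread:
# value is 1 (best), -1 (worst), or 0 (not chosen).
df = pd.DataFrame({
    "respondent_id": ["R_1"] * 5 + ["R_1"] * 5,
    "block_id":      [1] * 5 + [2] * 5,
    "statement":     ["sports", "crafts", "reading", "videogames", "work"] * 2,
    "value":         [1, 0, -1, 0, 0,
                      1, -1, 0, 0, 0],
})

# Because value is coded 1 / -1 / 0, summing it per respondent-item
# gives (times best - times worst) directly.
diff = (df.groupby(["respondent_id", "statement"])["value"]
          .sum()
          .rename("b_minus_w")
          .reset_index())
print(diff)
```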

I was going to request that you expose the nonbibd arg in e_bayescoring, but I understand your reluctance to do so.

I'll make do with the data and the methods as they are for the time being. Perhaps I'll implement the empirical Bayes scoring in Python so as to deal with the imbalanced data. We'll see.

Thanks again! I really appreciate your prompt responses and help

Andrew

arainboldt commented 3 years ago

Also, given how slow the eloscoring method is, and the fact that it processes each respondent separately, it sounds like a great candidate for parallelization.
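Since the per-respondent scoring runs are independent, the idea could be sketched from the Python side like this. Note the `score_one` function here is a hypothetical stand-in (it just sums best-minus-worst), not the actual eloscoring() call; a real version via rpy2 would need its own embedded R session per worker, since rpy2 is not safe to share across processes:

```python
from multiprocessing import Pool

import pandas as pd

def score_one(group):
    """Hypothetical per-respondent scorer standing in for an
    eloscoring() call; here it just sums the 1/-1/0 choice codes."""
    rid, sub = group
    return rid, sub.groupby("statement")["value"].sum().to_dict()

def score_parallel(df, processes=4):
    """Split the frame by respondent and score the groups in parallel."""
    groups = list(df.groupby("respondent_id"))
    with Pool(processes) as pool:
        return dict(pool.map(score_one, groups))

# Toy data in the thread's long format (hypothetical values).
df = pd.DataFrame({
    "respondent_id": ["R_1"] * 3 + ["R_2"] * 3,
    "block_id":      [1, 1, 1, 1, 1, 1],
    "statement":     ["sports", "crafts", "reading"] * 2,
    "value":         [1, -1, 0, 0, 1, -1],
})

if __name__ == "__main__":
    print(score_parallel(df, processes=2))
```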

markhwhiteii commented 3 years ago

Sounds good. And yeah, the change would be at this line: https://github.com/markhwhiteii/bwsTools/blob/master/R/e_bayescoring.R#L61, adding an argument of nonbibd = TRUE. I can probably update the package to let you do that; I just don't really know the statistical properties of using it on imbalanced data, so I've held off.