ryentes / careless

Psychometric synonyms arbitrary assignment can be problematic. #16

Closed: ryentes closed this issue 3 years ago

ryentes commented 5 years ago

From Amanda Young Via email:

So I'm checking for careless responding in my T1 dissertation data, and when I ran the psychsyn function from Richard's package, I got several NAs because the values in one of the columns happened to be all the same (no variance, so the correlation would involve dividing by 0). When I was trying to find a solution, I thought about just switching one set of values from the x column to the y column. Psychsyn is calculated by finding item pairs that correlate highly (e.g., > .60) and then computing a within-person correlation based on those items. The way it seems to work is by putting one item from each pair into either an x or a y column at random, then correlating the two columns. If that is the case, I should be able to switch, for example, item M46 to the y vector and P36 to the x vector... (see attached example of 1 person with my pairs correlated > .6), but if I just switch random values across columns, it changes the resulting correlation...
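
To make the two symptoms concrete, here is a minimal sketch in Python (not the package's R code). The response values are made up, the helper names `pearson` and `split` are hypothetical, and `None` stands in for R's `NA`. It shows that swapping one pair across columns changes the within-person correlation, and that a column with no variance leaves the correlation undefined.

```python
import math

def pearson(x, y):
    """Pearson r, or None (standing in for R's NA) if either column has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # zero variance: the correlation would divide by 0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def split(pairs):
    """Turn a list of (x_item, y_item) responses into the x and y columns."""
    return [p[0] for p in pairs], [p[1] for p in pairs]

# Made-up responses of ONE person to four synonym pairs (1-5 scale).
pairs = [(4, 5), (2, 2), (5, 4), (1, 2)]
r_original = pearson(*split(pairs))

# Same responses, but the first pair's items trade columns.
swapped = [(5, 4)] + pairs[1:]
r_swapped = pearson(*split(swapped))

# A person whose x column happens to be constant: the correlation is undefined.
flat = [(3, 4), (3, 2), (3, 5), (3, 1)]
r_flat = pearson(*split(flat))

# Swapping one pair restores variance in x, so a value comes back,
# but it is only one of several values the same responses could produce.
r_rescued = pearson(*split([(4, 3)] + flat[1:]))

print(r_original, r_swapped, r_flat, r_rescued)
```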

ryentes commented 5 years ago

Response by @awmeade via email

All correct. The choice of which one goes in which column is arbitrary and you could get different values by changing things around.
Probably a better version would try all possible combinations and then average the values, but that's a PITA. It's a lot like how a split-half correlation works. You'll get different values with different splits, and people like coefficient alpha because it's computationally like the mean of all possible split halves.
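
The "all possible combinations" idea could look roughly like the sketch below. It is written in Python rather than R and is not what the package does. The function name `mean_over_orientations` is hypothetical, and skipping undefined correlations before averaging is an assumption. With n pairs there are 2^n column assignments (2^(n-1) distinct correlations, since mirroring every pair gives the same r), which is why the cost grows so quickly.

```python
import itertools
import math

def pearson(x, y):
    """Pearson r, or None (standing in for R's NA) if either column has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def mean_over_orientations(pairs):
    """Average r over every assignment of each pair's two items to x or y.

    Assumption: orientations with an undefined correlation are skipped.
    Returns None if no orientation yields a defined correlation.
    """
    values = []
    for flips in itertools.product([False, True], repeat=len(pairs)):
        x = [b if flip else a for (a, b), flip in zip(pairs, flips)]
        y = [a if flip else b for (a, b), flip in zip(pairs, flips)]
        r = pearson(x, y)
        if r is not None:
            values.append(r)
    return sum(values) / len(values) if values else None

# Made-up responses of one person to four synonym pairs.
pairs = [(4, 5), (2, 2), (5, 4), (1, 2)]
swapped = [(5, 4), (2, 2), (5, 4), (1, 2)]  # first pair trades columns

# The averaged value no longer depends on how the pairs were oriented.
print(mean_over_orientations(pairs), mean_over_orientations(swapped))
```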

Richard, one idea for the R code would be, in cases where an NA is returned, to have the program attempt to run the loop again. I'd suggest a do-while-NA type of loop, but you'd probably also want a counter in there to exit after x attempts to prevent infinite loops.
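
A rough sketch of that retry-with-a-counter loop, in Python rather than R and not the package's actual fix. The name `psychsyn_with_retry`, the `max_attempts` parameter, and the coin-flip reassignment are all assumptions made for illustration.

```python
import math
import random

def pearson(x, y):
    """Pearson r, or None (standing in for R's NA) if either column has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def psychsyn_with_retry(pairs, max_attempts=10, rng=None):
    """Randomly assign each pair's items to x/y; retry while the result is undefined.

    The counter (max_attempts) guarantees termination for respondents whose
    answers are constant under every assignment.
    """
    rng = rng or random.Random()
    for _ in range(max_attempts):
        x, y = [], []
        for a, b in pairs:
            if rng.random() < 0.5:  # coin flip: which item lands in x
                a, b = b, a
            x.append(a)
            y.append(b)
        r = pearson(x, y)
        if r is not None:
            return r
    return None  # still undefined after max_attempts tries

# Made-up respondent whose first-listed items are all identical.
flat = [(3, 4), (3, 2), (3, 5), (3, 1)]
result = psychsyn_with_retry(flat, max_attempts=10, rng=random.Random(1))

# A respondent who answered 3 to everything can never get a defined value.
straightliner = psychsyn_with_retry([(3, 3)] * 4, max_attempts=10, rng=random.Random(1))
print(result, straightliner)
```

Note that the retry only removes the NA; the value returned still depends on which assignment happened to succeed.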

ryentes commented 5 years ago

My reply by email:

I think this could work. With the size of the data we work with, the end user probably wouldn't even notice the extra computation. I think there's a lot to mess with in the psychsyn/psychant space. It currently doesn't limit each item to one item pair, which is recommended somewhere (maybe Curran, 2016).

franciscowilhelm commented 5 years ago

Regarding limiting each item to be used only once, just looked that up: "The maximum N for any given set of items is half the total number of items, assuming all items are used to create pairs, without replacement, in this process. That said, it is not recommended that all items be used, or for items to be used more than once" (Curran, 2016, p. 11, emphasis added).

I made a quick version that resamples the X and Y positions k times. But that is indeed a PITA computation-wise: the function becomes very slow even for k = 10. I suppose it's best to implement the idea by @awmeade to loop while the function returns NA, with a maximum of 10 attempts. Future work could be to implement the advice to use each item only once. The question is, for which pair would one use it? The one with the highest (lowest) correlation in psychsyn (psychant)?
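
One possible answer to that last question, sketched in Python and not implemented in the package as far as this thread shows: greedily keep the strongest qualifying pair first and skip any later pair that reuses an item. The function name `select_unique_pairs`, its parameters, the correlation values, and all item names other than M46 and P36 are hypothetical.

```python
def select_unique_pairs(correlations, threshold=0.60, antonyms=False):
    """Greedy selection of item pairs so that each item is used at most once.

    correlations: dict mapping (item_a, item_b) -> inter-item correlation.
    Synonyms: keep pairs with r > threshold, strongest (highest) first.
    Antonyms: keep pairs with r < -threshold, strongest (most negative) first.
    """
    if antonyms:
        eligible = [(r, pair) for pair, r in correlations.items() if r < -threshold]
        eligible.sort(key=lambda t: t[0])                # most negative first
    else:
        eligible = [(r, pair) for pair, r in correlations.items() if r > threshold]
        eligible.sort(key=lambda t: t[0], reverse=True)  # highest first

    used, chosen = set(), []
    for _, (a, b) in eligible:
        if a not in used and b not in used:
            chosen.append((a, b))
            used.update((a, b))
    return chosen

# Made-up inter-item correlations; M46 qualifies for two different pairs.
correlations = {
    ("M46", "P36"): 0.72,
    ("M46", "Q12"): 0.81,
    ("P36", "Q12"): 0.65,
    ("A1", "A2"): 0.66,
    ("A1", "B3"): 0.40,
}
synonym_pairs = select_unique_pairs(correlations)
print(synonym_pairs)
```

A greedy pass like this is simple but not guaranteed to maximize the number of pairs kept. Here M46 goes to its strongest partner, which also blocks the weaker P36-Q12 pair.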

ryentes commented 4 years ago

Fix is in place. Let's clean up the comments and push.