neuropsychology / psycho.R

An R package for experimental psychologists
https://neuropsychology.github.io/psycho.R/

I think the adjusted dprime formula is wrong. #113

Closed ricardoV94 closed 4 years ago

ricardoV94 commented 4 years ago

https://github.com/neuropsychology/psycho.R/blob/611e984742890e698c4e94b6965d917432e98348/R/dprime.R#L77-L79

This potential issue has been raised before here: https://github.com/neuropsychology/neuropsychology.R/issues/10

I think the correct formula is:

hit_rate_adjusted <- (n_hit + 0.5)/(n_hit + n_miss + 1)
fa_rate_adjusted <- (n_fa + 0.5)/(n_fa + n_cr + 1)
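To make the proposal concrete, here is a minimal R sketch of the log-linear adjustment with made-up counts (the variable names and counts are mine, not the package's code; d' is computed directly with qnorm rather than through psycho.R):

```r
# Hypothetical counts of a two-by-two signal detection table
# (made up purely for illustration)
n_hit  <- 9
n_miss <- 1
n_fa   <- 2
n_cr   <- 8

# Log-linear adjustment: add 0.5 to each cell, so each row total
# grows by 1 (0.5 + 0.5)
hit_rate_adjusted <- (n_hit + 0.5) / (n_hit + n_miss + 1)
fa_rate_adjusted  <- (n_fa + 0.5) / (n_fa + n_cr + 1)

# d' from the adjusted rates
dprime <- qnorm(hit_rate_adjusted) - qnorm(fa_rate_adjusted)
```

Note that with these counts the adjusted rates stay strictly inside (0, 1), so qnorm never receives 0 or 1, which is the point of the adjustment.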

The confusion, I think, comes from the wording of the referenced paper (Hautus, 1995, link):

A method that has found much use in log-linear analysis requires the addition of 0.5 to each cell in the two-by-two contingency table that defines the performance of the observer (e.g., Fienberg, 1980; Goodman, 1970; Knoke & Burke, 1980). Row and column totals are increased by one.

What I think the author meant is that if your formula uses the row or column totals, then 1 should be added to those totals as a consequence of the per-cell 0.5 (not in addition to it):

hit_rate_adjusted <- (n_hit + 0.5)/(n_signal_trials + 1)

This I think would be the formula implied by (Stanislaw & Todorov, 1999, link):

In the discussion below, H is used to indicate the hit rate. This rate is found by dividing the number of hits by the total number of signal trials. Similarly, the false-alarm rate, F , is found by dividing the number of false alarms by the total number of noise trials.

A third approach, dubbed loglinear , involves adding 0.5 to both the number of hits and the number of false alarms and adding 1 to both the number of signal trials and the number of noise trials, before calculating the hit and false-alarm rates. This seems to work reasonably well (Hautus, 1995). Advocates of the loglinear approach recommend using it regardless of whether or not extreme rates are obtained.

I tracked down the first reference by Hautus, (Fienberg, 1980, p.64):

Still, ''too many" sampling zeros in the body of a table may create a problem where a marginal table to be fitted in the model contains zero cells. Two basic alternatives are possible: (1) add a small value to every cell in the body of the table, including those with nonzero frequencies. A value of .5 is often suggested (Goodman, 1970: 229). (This is a conservative procedure which will tend to underestimate effect parameters and their significance.) Or (2) arbitrarily define zero divided by zero to be zero (Fienberg, 1977: 109).

Note that they don't mention that an extra 1 should be added; the +1 to the totals is simply a consequence of adding 0.5 to each of the two cells in a row or column (0.5 + 0.5 = 1).
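The equivalence of the two ways of writing the adjusted rate is easy to check numerically. A small R sketch (counts made up for illustration), where n_signal_trials is defined as n_hit + n_miss:

```r
# Hypothetical counts (for illustration only)
n_hit  <- 9
n_miss <- 1
n_signal_trials <- n_hit + n_miss

# Per-cell form: 0.5 added to each cell, denominator grows by 1
per_cell  <- (n_hit + 0.5) / (n_hit + n_miss + 1)

# Totals form: same 0.5 in the numerator, 1 added to the row total
per_total <- (n_hit + 0.5) / (n_signal_trials + 1)

per_cell == per_total  # the two forms are the same quantity
```

Since both expressions reduce to (n_hit + 0.5) / (n_hit + n_miss + 1), adding a further 1 on top of the per-cell 0.5s would double-count the correction.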

I am not totally confident in my interpretation, but I could not find a convincing source for the formula currently used in the library either...

DominiqueMakowski commented 4 years ago

Hey @ricardoV94,

Yes, this is an issue I've been investigating for some time, as I've seen and received conflicting feedback about that formula. Unfortunately, Hautus' paper isn't crystal clear about it. But I agree with your conclusion (thanks for the thorough investigation!), and I'll update the formula soon :)

DominiqueMakowski commented 4 years ago

https://github.com/neuropsychology/psycho.R/blob/f3b614ca8bfa0b3498f7eed79d64d6e13d57ec8f/R/dprime.R#L79-L81

This has been fixed in the latest version ☺️