Inferential tests: Simonsohn's psi and F

lhdjung commented 8 months ago

Just to put it on our radar – I think these two are out of scope for recalc, but still relevant to the errorverse. As of now, they are only implemented in SAS.

In 2013, Uri Simonsohn published an article about the Sanna and Smeesters cases ("Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone"). He developed two simulation-based measures:

- The first, called $\Psi$, assesses whether results for the same statistic reported across multiple studies are too similar to each other (p. 1878).
- The second, called $F$, is the sum of the modal frequencies, which he used to test the null hypothesis of random sampling (p. 1882).
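
To make the "too similar" idea concrete, here is a rough sketch of a simulation-based similarity check. It is not Simonsohn's $\Psi$ as defined in the paper: the test statistic (the standard deviation of the reported group SDs), the normal populations with a shared pooled SD, the equal group sizes, and the name `similarity_p_value` are all assumptions made for illustration. It is written in Python with the standard library only, just to show the logic; a real implementation for the errorverse would presumably live in R.

```python
import random
import statistics


def similarity_p_value(sds, n_per_group, n_sims=2000, seed=1):
    """Share of simulated studies whose group SDs are at least as similar
    as the reported ones.

    Hypothetical helper, NOT Simonsohn's Psi: the statistic is simply the
    standard deviation of the group SDs.
    """
    rng = random.Random(seed)
    observed_spread = statistics.stdev(sds)

    # Assumption: all groups come from normal populations that share one SD,
    # estimated as the root mean square of the reported SDs (equal group sizes).
    pooled_sd = (sum(sd ** 2 for sd in sds) / len(sds)) ** 0.5

    as_similar = 0
    for _ in range(n_sims):
        # Draw one sample per group and record its sample SD.
        simulated_sds = [
            statistics.stdev([rng.gauss(0.0, pooled_sd) for _ in range(n_per_group)])
            for _ in sds
        ]
        # Count simulations where the SDs are at least as close together
        # as the reported ones.
        if statistics.stdev(simulated_sds) <= observed_spread:
            as_similar += 1
    return as_similar / n_sims


# Made-up SDs for illustration only; these are not taken from any paper.
reported_sds = [2.01, 1.98, 2.03, 1.99]
p = similarity_p_value(reported_sds, n_per_group=15)
print(f"Share of simulations with SDs at least this similar: {p:.4f}")
```

A small share means that random sampling rarely produces SDs this close together. As with any such test, that flags the reported values as implausible without saying why they look that way.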

These shouldn't be too hard to implement, but we'd need to make some design decisions first: which package should host inferential tests for error detection, and what should their output format be? I'd prefer tibbles because they invite tidy data.

ianhussey commented 8 months ago

Hard to say for the moment, but certainly some of the correlation table checks that are currently in the RECOVAR repo would be classified as inferential tests on reported results.

LukasWallrich commented 8 months ago

These functions tend to push me over the line towards a separate package for statistical plausibility tests - though I don't have a name yet. recalc and scrutiny should probably both be restricted to tests that flag errors, rather than doubts?

Not sure about a name, though - something around fishy smell?

lhdjung commented 8 months ago

I think inferential tests are enough of a distinguishing feature to justify a new package. They would still look for "errors" in the broad sense of things not being right, which includes any reasons for doubt.

People who conduct error checking – all the big names – tend to stress that intent should not be assumed, for various reasons. Inferential plausibility tests do not necessarily allow analysts to conclude why something is not as it should be. (Take it from the man himself.) In this way, they are similar to more basic techniques such as GRIM.

Regarding a name, I thought about "inferror": inferential tests for error detection.
