ohbm / osr2020

Website for the Open Science Room at the OHBM 2020 meeting
https://ohbm.github.io/osr2020

Past, Present and Future of Open Science (Emergent session): Designing a register of replicability estimates for published research findings #81

Open jsheunis opened 4 years ago

jsheunis commented 4 years ago

Designing a register of replicability estimates for published research findings

By Cooper Smout, Institute for Globally Distributed Open Research and Education

Abstract

The research community possesses a great deal of information about the replicability of published findings that is currently underutilised. For example, prediction markets can identify findings that are likely to replicate (Dreber et al., 2015), authors can lose confidence in their own findings (Loss-of-Confidence Project), and many replication attempts go unpublished (file drawer problem). This information could increase research efficiency by providing an early signal about study reliability. In this unconference, we will design a database of replicability estimates for published research findings. For example, a researcher might log on (using ORCID) and file a replicability estimate (0-100% with confidence intervals) about a published finding, along with a short justification. Multiple estimates could be aggregated using Bayesian updating. Researchers could then search the database (using DOIs) to find out whether a published finding is likely to replicate, potentially saving months of work. Long-term, such a register might also help to steer article evaluation away from impact and toward reliability instead.
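
To make the aggregation step concrete, here is a minimal sketch of how multiple ratings for one finding might be pooled with Bayesian updating. The Beta pseudo-count weighting and the data layout are illustrative assumptions, not part of the proposal itself:

```python
# Hypothetical sketch: pooling individual replicability estimates (0-1) for one
# DOI into a single posterior via a Beta-Binomial style update. The weighting
# scheme (wider confidence interval -> smaller weight) is an assumption.
from scipy.stats import beta

def aggregate_estimates(estimates, prior=(1.0, 1.0)):
    """estimates: list of (probability, weight) pairs from individual raters."""
    a, b = prior  # uninformative Beta(1, 1) prior
    for p, w in estimates:
        a += p * w        # pseudo-successes contributed by this rating
        b += (1 - p) * w  # pseudo-failures contributed by this rating
    mean = a / (a + b)
    lo, hi = beta.ppf([0.025, 0.975], a, b)  # 95% credible interval
    return mean, (lo, hi)

# Three hypothetical ratings for one finding: (estimate, weight)
ratings = [(0.70, 4), (0.55, 2), (0.80, 3)]
mean, ci = aggregate_estimates(ratings)
print(f"Pooled replicability: {mean:.2f} (95% CrI {ci[0]:.2f}-{ci[1]:.2f})")
```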

Useful Links

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3473231/
https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3000117

Tagging @CooperSmout

CooperSmout commented 4 years ago

Note: I'll be updating this over the weekend with a slightly different pitch, focused on numerical open evaluation of neuroscience papers (Kriegeskorte, 2012).

mnarayan commented 4 years ago

Really cool topic @CooperSmout. I am curious to know whether we have any agreement on what an individual considers a replication. For instance, since p-values are themselves random variables, there will be some randomness in whether one study has a p-value slightly less than .05 and another slightly larger than .05. One might argue that there ought to be some acceptable amount of sampling variability here, without concluding that the study did not replicate.

Would you consider creating some sort of estimate of the replication/repeatability probability associated with a p-value? (E.g. see the figure below on the probability of a p < .05 result being repeated in a second study.) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3370685/figure/F3/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3370685/
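
Roughly, the calculation I have in mind looks like the sketch below. This is an illustrative version under the usual simplifying assumption that the observed effect equals the true effect, not the paper's exact method:

```python
# Rough sketch of a "replication probability": the chance that an exact repeat
# of the study (same n, same design) would again reach p < .05, assuming the
# observed effect is the true effect. Illustrative only.
from scipy.stats import norm

def replication_probability(p_obs, alpha=0.05):
    z_obs = norm.ppf(1 - p_obs / 2)   # observed two-sided z-value
    z_crit = norm.ppf(1 - alpha / 2)  # significance threshold (~1.96)
    return norm.cdf(z_obs - z_crit)   # P(second study's z exceeds threshold)

for p in (0.05, 0.01, 0.001):
    print(f"p = {p}: estimated replication probability = {replication_probability(p):.2f}")
```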

CooperSmout commented 4 years ago

Glad to hear you're interested @mnarayan! I've participated in a few RepliCATS workshops, which ask 'experts' (mostly psychology researchers) to rate the probability of replication success as a percentage. They define a replication as finding a significant effect using a conceptually similar design (which is itself a can of worms!) that is powered at 90% to detect the original effect. Under that definition, p-values are only relevant insofar as they define the cut-off for significance. But great point -- I completely agree it might not be fair to say that a p = .06 result did not 'replicate' the original finding (e.g. p = .04) simply because of sampling error. This problem is not specific to this project, of course, but a problem with frequentist statistics in general. I much prefer Bayesian statistics, where we can quantify the amount of evidence for/against a hypothesis.
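
To make the 90% power criterion concrete, here's a rough sketch of what it implies for a replication's sample size. The effect size and test type below are placeholder assumptions, not RepliCATS specifics:

```python
# Sketch of the replication design rule: pick the sample size that gives 90%
# power to detect the *original* effect. Effect size and test are hypothetical.
from statsmodels.stats.power import TTestIndPower

original_d = 0.5  # hypothetical original effect size (Cohen's d)
n_per_group = TTestIndPower().solve_power(effect_size=original_d,
                                          power=0.90, alpha=0.05)
print(f"Replication needs ~{n_per_group:.0f} participants per group")
```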

I'm now wondering if replicability estimates could be expressed in a Bayesian manner. This could make it easier to integrate replication information as it comes to light down the track: e.g. if someone conducts a replication study that produces the same effect, another researcher could rate the 'similarity' between the two studies, and that information could feed into the replicability register via Bayesian updating. My only hesitation is that people might find it confusing, as we tend to be more familiar with frequentist statistics than Bayesian. I'll have to think on this some more, but would love to hear your thoughts!
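
Something like the sketch below is what I have in mind -- a minimal version that assumes a Beta-distributed estimate per finding and an ad hoc similarity weighting (both are just assumptions for illustration, not a worked-out method):

```python
# Minimal sketch: updating a finding's replicability estimate when a new
# replication result arrives, weighted by a rated 'similarity' (0-1) between
# the original and replication studies. The weighting rule is an assumption.
from scipy.stats import beta

def update_with_replication(a, b, replicated, similarity):
    """a, b: current Beta parameters for this finding's replicability.
    replicated: True if the new study found the effect, False otherwise.
    similarity: 0-1 rating of how closely the new study matches the original."""
    if replicated:
        a += similarity      # a close, successful replication adds strong support
    else:
        b += similarity      # a close, failed replication counts strongly against
    return a, b

# Hypothetical example: current state Beta(5, 3), then a fairly similar
# replication (similarity 0.8) fails to find the effect.
a, b = update_with_replication(5.0, 3.0, replicated=False, similarity=0.8)
print(f"Updated estimate: {a / (a + b):.2f}")
print(f"95% CrI: {beta.ppf(0.025, a, b):.2f}-{beta.ppf(0.975, a, b):.2f}")
```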

mnarayan commented 4 years ago

@CooperSmout Thanks for explaining the replication success concept. I can't recall off the top of my head, but the discussion section of the paper I linked does refer to one or two methods for defining a reproducibility or repeatability probability under a Bayesian framework, so that might be of interest to you.

CooperSmout commented 4 years ago

Oh great, thanks. Sorry, I haven't had a chance to look through the reference yet. Are you familiar with the 'open evaluation' framework? https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3473231/

It's basically a PPPR (post-publication peer review) model, but it also suggests we should rate articles on a number of dimensions. Personally I like this model, and I see replicability as one of several dimensions that could be rated.