ledwindra / replication-code-economics

Track publicly available replication code and supplementary materials in economics journals (currently the top 10)
MIT License
economics replication-code reproducibility supplementary-data supplementary-material

About

In late March 2021, a friend of mine sent me a paper titled "The influence of hidden researcher decisions in applied microeconomics". I then read the author's thread on Twitter, which led me to the following question:

Is making replication code public the norm or the exception among (micro/empirical) economists?

Public resources may exist, but they tend to be scattered rather than centralized in a place the community can easily browse, such as GitHub, where other programming languages have curated collections like the Awesome lists, discoverable through GitHub topics. It is not surprising that economists are not used to sharing material there, but fortunately the numbers appear to have been growing recently.

This repository is updated automatically every day at 00:00 UTC to nowcast this trend.
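A daily nowcast boils down to appending one dated observation per run. The sketch below shows one way to do that idempotently, so a re-run on the same day does not duplicate the row. It is an illustration, not this repository's actual script: the function name `append_daily_snapshot` and the two-column `date,count` layout are hypothetical.

```python
import csv
import datetime as dt
from pathlib import Path
from typing import Optional

# Hypothetical CSV layout: one row per day with the observed count.
FIELDS = ["date", "count"]


def append_daily_snapshot(path, count: int, today: Optional[dt.date] = None) -> bool:
    """Append a (date, count) row for `today`; return False if that date is already recorded."""
    # Default to the current UTC date, so a run at 00:00 UTC is stamped with the new day.
    today = today or dt.datetime.now(dt.timezone.utc).date()
    stamp = today.isoformat()
    path = Path(path)
    rows = []
    if path.exists():
        with path.open(newline="") as f:
            rows = list(csv.DictReader(f))
    # Idempotent: re-running the job on the same day does not add a duplicate row.
    if any(row["date"] == stamp for row in rows):
        return False
    # Write the header only when starting a new (or empty) file.
    needs_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if needs_header:
            writer.writeheader()
        writer.writerow({"date": stamp, "count": count})
    return True
```

A scheduled job would call `append_daily_snapshot("trend.csv", todays_count)` once per run and then commit the updated file.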

Data source

I currently use several data sources.

GitHub API

The only metric I use is the total number of public repositories that use Stata and have "replication code" in their keywords (case-insensitive). Note that this metric does not account for field of study: people who use Stata may come from fields other than economics. An example plot:

[Plot: replication-code-stata]
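The count above can be read from GitHub's repository search endpoint, whose JSON response includes a `total_count` field. The sketch below assumes the query is the quoted phrase "replication code" plus the `language:Stata` qualifier; the repository's exact query string and the fields it matches against may differ. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(phrase: str = "replication code", language: str = "Stata") -> str:
    """Build a GitHub repository-search URL for a quoted phrase restricted to one language."""
    # Quoting keeps "replication code" together as one phrase; GitHub search is case-insensitive.
    query = f'"{phrase}" language:{language}'
    # per_page=1: only total_count is needed, not the repository list itself.
    return SEARCH_URL + "?" + urllib.parse.urlencode({"q": query, "per_page": 1})


def parse_total_count(body: str) -> int:
    """Extract the number of matching repositories from a search response body."""
    return int(json.loads(body)["total_count"])


def fetch_total_count(url: str) -> int:
    """Call the search API (network access required) and return total_count."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_total_count(resp.read().decode("utf-8"))
```

Usage: `fetch_total_count(build_search_url())`. Keeping the parsing separate from the network call makes it testable offline. Unauthenticated search requests have a low rate limit, so one query per day is well within it.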

AEA Deposit on ICPSR

For the American Economic Association (AEA), I scrape the DOIs of all papers in each of its journals (nine in total). I then compute the proportion of papers that have a deposit on openICPSR, hosted by the Inter-university Consortium for Political and Social Research (ICPSR), relative to all papers published in AEA journals. The goal is to track replication trends over time without hard-coding a search process that might overlook some deposits. An example plot:

[Plot: replication-code-stata]
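The AEA comparison above is a per-year share: papers with an ICPSR deposit divided by all papers published that year. A minimal sketch follows, assuming you already have a mapping from each AEA DOI to its publication year and a list of DOIs that have deposits; how the repository actually matches deposits to papers may differ, and `deposit_share_by_year` is a hypothetical name.

```python
from collections import Counter


def deposit_share_by_year(doi_years, deposited_dois):
    """Share of papers per year whose DOI appears among the ICPSR deposits.

    doi_years: mapping of DOI -> publication year (all papers in the journals).
    deposited_dois: iterable of DOIs that have an ICPSR deposit.
    """
    # DOI names are case-insensitive, so compare them in lowercase.
    deposited = {d.strip().lower() for d in deposited_dois}
    totals = Counter()
    hits = Counter()
    for doi, year in doi_years.items():
        totals[year] += 1
        if doi.strip().lower() in deposited:
            hits[year] += 1
    # Years are sorted so the result plots naturally as a time series.
    return {year: hits[year] / totals[year] for year in sorted(totals)}
```

Normalizing DOIs before matching matters because the same DOI can appear with different capitalization in different sources.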

Work in Progress: Top 10 Econ Journals According to IDEAS/RePEc

See my project list here. The idea is to parse Crossref metadata for each paper DOI in the top ten economics journals according to IDEAS/RePEc (full list here). The raw data can be found under data/crossref/[JOURNAL-CODE].
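One way to collect every DOI in a journal from Crossref is the REST API's `/journals/{issn}/works` route with cursor-based paging: the first request uses `cursor=*`, and each response returns a `next-cursor` until a page comes back empty. This is a sketch under that assumption, not the repository's own code; the ISSN you pass in is a placeholder, and saving results to data/crossref/[JOURNAL-CODE] is left out.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_API = "https://api.crossref.org"


def journal_works_url(issn: str, cursor: str = "*", rows: int = 1000) -> str:
    """URL for one page of works published in the journal with this ISSN."""
    params = urllib.parse.urlencode({"cursor": cursor, "rows": rows})
    return f"{CROSSREF_API}/journals/{issn}/works?{params}"


def parse_page(body: str):
    """Return (list of DOIs on this page, next cursor) from a Crossref response body."""
    message = json.loads(body)["message"]
    dois = [item["DOI"] for item in message.get("items", [])]
    return dois, message.get("next-cursor")


def fetch_all_dois(issn: str):
    """Follow next-cursor until Crossref returns an empty page (network access required)."""
    dois, cursor = [], "*"
    while True:
        with urllib.request.urlopen(journal_works_url(issn, cursor), timeout=60) as resp:
            page, cursor = parse_page(resp.read().decode("utf-8"))
        if not page or cursor is None:  # exhausted, or no cursor to continue with
            return dois + page
        dois.extend(page)
```

Cursor paging is preferred over `offset` for large journals because Crossref limits how deep offset-based paging can go.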

Has replication

After getting the raw data from Crossref, I check each DOI against its corresponding journal to see whether the paper has any replication code or supplementary material. The resulting datasets are in .csv format under the data/has-replication directory. In the has_replication column, True indicates that the corresponding DOI potentially has replication materials and False indicates that it does not.
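To summarize these files, the share of papers flagged True can be computed per CSV. The sketch relies only on the has_replication column described above; treating each file name as a journal code is an assumption, and both function names are hypothetical.

```python
import csv
from pathlib import Path


def replication_share(csv_path) -> float:
    """Fraction of rows in one CSV whose has_replication value is True."""
    with Path(csv_path).open(newline="") as f:
        # Values are described as True/False; compare case-insensitively to be safe.
        flags = [row["has_replication"].strip().lower() == "true" for row in csv.DictReader(f)]
    if not flags:
        raise ValueError(f"no rows in {csv_path}")
    return sum(flags) / len(flags)


def summarize_directory(directory="data/has-replication") -> dict:
    """Map each CSV's file name (without extension) to its replication share."""
    return {p.stem: replication_share(p) for p in sorted(Path(directory).glob("*.csv"))}
```

Since True only means a paper *potentially* has replication material, these shares are best read as upper-bound indicators rather than exact rates.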

PS

There is a potential irony here: I am creating a repository about reproducibility, yet my programs may not work on your machine or on GitHub Actions. This is because the scripts depend on public APIs (e.g., GitHub and Crossref). In addition, some journal websites may be redesigned, which can stop the scripts from working as intended. That is why I also save the raw data.
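Saving raw responses is what lets results be rebuilt when an API or journal site changes. A minimal caching sketch: the first call runs the fetch and writes the JSON to disk, and later calls read the saved copy instead. The `data/raw` directory and the function name are hypothetical, and the `fetch` callable stands in for whatever API or scraping call the scripts make.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_fetch(key: str, fetch: Callable[[], dict], raw_dir="data/raw") -> dict:
    """Return the saved raw response for `key` if present; otherwise call `fetch` and save it."""
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    # Hash the key (e.g. a DOI or query URL) so it always maps to a safe file name.
    path = raw_dir / (hashlib.sha256(key.encode("utf-8")).hexdigest() + ".json")
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch()
    path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

With this pattern, downstream analysis can be re-run from the saved files even if the live source has changed or is unreachable.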

In any case, don't hesitate to reach out to me or submit an issue here.