jeromyanglim / rmarkdown-rmeetup-2012

Reproducible analysis with knitr, R Markdown, and RStudio: Slides and example R Markdown files from the presentation
http://jeromyanglim.blogspot.com

An argument for not using reproducible data analysis tools like knitr, Sweave, etc.? #12

Closed: jeromyanglim closed this issue 12 years ago

jeromyanglim commented 12 years ago

Clearly, most researchers don't analyse their data with reproducible data analysis tools like knitr and Sweave.

For practical purposes I operationalise reproducible analysis as:

Using knitr or Sweave with R and LaTeX, together with a build script such as a makefile, and sharing everything as a self-contained archive file is one way of satisfying the above criteria.
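As a rough illustration of that workflow, the Python sketch below lists the knitr and pdflatex build steps for a hypothetical `analysis.Rnw` file and bundles a project directory into one zip archive. It is a stand-in for a makefile, not the author's setup: the file names, the helper functions `build_commands`, `build` and `make_archive`, and the assumption that `Rscript` (with the knitr package installed) and `pdflatex` are on the PATH are all illustrative.

```python
"""Hypothetical build script for a knitr + LaTeX analysis (a sketch, not the author's setup)."""
import subprocess
import zipfile
from pathlib import Path


def build_commands(rnw_file):
    """Return the commands that turn an .Rnw file into a PDF.

    Assumes R (with the knitr package) and pdflatex are on the PATH.
    """
    stem = Path(rnw_file).stem
    return [
        # knitr::knit() runs the R chunks and writes a plain .tex file
        ["Rscript", "-e", f"knitr::knit('{rnw_file}')"],
        # pdflatex compiles the generated .tex file into a PDF
        ["pdflatex", "-interaction=nonstopmode", f"{stem}.tex"],
    ]


def build(rnw_file, dry_run=True):
    """Run the build steps, or with dry_run=True only return them."""
    commands = build_commands(rnw_file)
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands


def make_archive(project_dir, archive_path):
    """Bundle every file under project_dir into one zip so others can rerun it.

    archive_path should lie outside project_dir, or the zip would include itself.
    """
    root = Path(project_dir)
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(root).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names
```

Calling `build("analysis.Rnw")` only returns the steps; passing `dry_run=False` executes them in order. A real makefile would also track dependencies and rebuild only when inputs change, which this sketch does not attempt.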

jeromyanglim commented 12 years ago

Journals make it difficult

Lack of knowledge of how to perform reproducible research

It creates more work

There are no incentives

There are a few exceptions:

I also feel that it is not enough to simply share a repository. It's important to make the repository user friendly. User friendly could mean:

Deprivation of future papers

Fear of making it too easy for the competition

I see science as a collaborative process. One of the major benefits of reproducible research is that it helps others see exactly how to analyse research data of a given sort.

However, some researchers may see this as a drawback if they seek to be the dominant figure in a particular area.

Some analysis software makes automation difficult or impossible

Naturally, this raises the question of why anyone would use "un-automatable" software. However,

Fear of a mistake being publicly identified

There is a wide spectrum of data analytic misconduct. From a legal perspective, we can distinguish different kinds of intention (intentional, reckless, negligent) and different consequences (how much the error affected the paper's findings, etc.).

I have heard advocates of open source software argue that one reason open source software is better than proprietary software is that its code is on display to the community. A similar process might operate in a reproducible data analysis context. Researchers would be more inclined to adopt workflows and procedures that keep their analyses clean and tidy. They would also be more likely to incorporate quality control procedures that check for possible errors.

It would be interesting to see how journals deal with any increase in errata that might emerge. At present, although journals permit errata, issuing one generally seems to me to be a fairly big deal. In contrast, software is often framed as a work in progress where bugs are identified and gradually fixed. Admittedly, in some respects, journal articles are more static in their scope and application than are

Ethical concerns related to data sharing

Compliance with ethics committees

Limitations imposed by collaboration

Copyright restrictions

In some instances, sharing certain algorithms or metadata may be prohibited by copyright restrictions.