jacobtnyoung / reproducible-research


Reproducible Research using Markdown and GitHub

This page offers a brief introduction to doing reproducible research using RMarkdown and GitHub. Reproducible research is part of the larger Open Science movement, which promotes transparency, reproducibility, and replication in science.



Reproducible Research


Let's start with an example. Take a look at the following image:


What is this? The image above shows the workflow for a project I completed for a course. It is a project workflow map. What do maps do? They show us where things are and how you get to those things. The figure shows several key features:

What is the route of this map? The map shows us how to start with a raw data file and work all the way through to a project report and a project presentation.

All of the elements of reproducible research are shown in the figure above. Now, let's think about some bigger questions and look at these pieces in more detail...


What is reproducible research?


Why is it important to do reproducible research?


How do we do it?

RMarkdown and GitHub!



What is RMarkdown?

RMarkdown is a dynamic document format that combines the R programming language with Markdown syntax. It allows you to integrate code, text, and visualizations into a single document. RMarkdown documents can be converted to various output formats, such as PDF, HTML, and Word.
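To make that concrete, here is a minimal sketch of the idea, not the manuscript used later on this page. It writes a tiny, hypothetical RMarkdown source (the file name `minimal-example.Rmd`, the title, and the use of R's built-in `cars` data are all assumptions chosen for illustration) and renders it to two formats with `rmarkdown::render()`. It assumes the `rmarkdown` package and pandoc are installed (RStudio bundles pandoc). PDF output is left out because it also needs a LaTeX installation.

```r
library(rmarkdown)

# Build the chunk fence programmatically so this example can itself sit
# inside a fenced code block without nesting backtick fences.
fence <- strrep("`", 3)

# A hypothetical, minimal RMarkdown source: a YAML header, prose with
# inline R code, and one code chunk that plots the built-in `cars` data.
rmd_lines <- c(
  "---",
  "title: \"A minimal reproducible report\"",
  "---",
  "",
  "The built-in `cars` data set has `r nrow(cars)` rows.",
  "",
  paste0(fence, "{r speed-plot}"),
  "plot(cars$speed, cars$dist)",
  fence
)

# Write the source to a temporary directory (an assumption for this sketch;
# in a real project the .Rmd file would live in the project folder).
rmd_file <- file.path(tempdir(), "minimal-example.Rmd")
writeLines(rmd_lines, rmd_file)

# Render the same source to two formats; render() returns the output path.
html_out <- render(rmd_file, output_format = "html_document", quiet = TRUE)
word_out <- render(rmd_file, output_format = "word_document", quiet = TRUE)
```

The point of the sketch is that the text, the code, and the plot all come from one source file, and the only thing that changes between outputs is the `output_format` argument.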


An example...

Take a look at the document called "reproducible-manuscript.pdf".

Cool, right? This entire document (i.e., text, analysis, and output) is generated from a single RMarkdown file (we will check it out shortly).


Features of RMarkdown:


Now, let's take a look at the RMarkdown file that created the manuscript above: "reproducible-manuscript.Rmd".


Now what?

First thing, check out this tutorial on how to use RMarkdown in RStudio. Got it? Good.

OK, so we have everything documented. Now what do we do? Ideally, I could store it online, track the changes that I make, and make it available to others so they can replicate the analysis.

GitHub does that!



Working with GitHub

GitHub is a web-based platform for software development and collaboration. It primarily facilitates version control, project management, and team collaboration for software projects.

If you are saying "I don't develop software, I am done with this page", hold on!


Why use GitHub?


So, for all these reasons (and more!), GitHub is an excellent platform for conducting reproducible research.



Some Examples...

Still not convinced? Let me show you a few examples with different scenarios.

Open Access Data: So you found some boss data online and want to build a sweet workflow. Awesome. I am glad you are endorsing the ways of open science! Here is an approach you could follow:

Restricted Access Data: A lot of times we can't make data available on a repository. That is no excuse to practice closed science! If users are not allowed to access the data you use for your project, you can still show your work: what variables you coded, what changes you made, how you ran that model, where the table came from, and so on. Here are two examples of reproducible workflows using restricted access data:


Summary

Well, that should get you going. If you have made it this far, then I am happy with your dedication to learning open science practices. Good luck out there!




Notice an error on this page? Please report it as an issue so I can fix it.