jaspercooper / collaborative_workflow

Tools for researching and publishing with collaborators.

Collaborative Workflows: Columbia methods workshop on tools for researching and publishing with collaborators

GitHub

Why it is good

Note: Because GitHub is designed for programmers, it is not always the best way to share things such as large binary data files, Word documents, and images (although all of this is possible). For files that are unlikely to change, it is sometimes better to store them on Dropbox and read them into R from there.
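For example, here is a minimal sketch of reading a Dropbox-hosted CSV straight into R (the URL is a hypothetical placeholder; changing a Dropbox share link's ?dl=0 suffix to ?dl=1 makes it point at the raw file rather than the preview page):

```r
# Read a CSV stored on Dropbox directly into R
# NOTE: hypothetical URL -- substitute your own share link, with ?dl=1
my_data <- read.csv("https://www.dropbox.com/s/abc123/my_data.csv?dl=1")
head(my_data)
```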

What a repository is

How to add a repo in the desktop app

You have three options:

  1. Create: make a new local repo in a folder on your computer; it is then published to your online profile
  2. Add: choose a pre-existing folder on your computer; it is turned into a local repo and then published online
  3. Clone: create or get added to an online repo; it is then cloned to your computer as a local repo

Today we will clone this repo.

Making and committing changes

Changes:

Commits:

Pulling

Pushing

Issues and Milestones

Branches

Doing Research Together

The project directory

Assuming you are working in R and LaTeX (as you should be), there are a few principles to adopt:

A good way to structure a directory:
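For example, a hypothetical numbered layout (only 02_Analysis appears elsewhere in these materials; the rest is illustrative):

```
my_project/
├── 00_master.R     # main script that runs everything below in order
├── 01_Data/        # raw and cleaned data
├── 02_Analysis/    # modular analysis scripts
├── 03_Tables/      # sinked LaTeX tables
├── 04_Figures/     # sinked figures
└── 05_Paper/       # the LaTeX manuscript
```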

Avoiding working directory issues
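One simple convention (a sketch, not prescribed by these materials): set the working directory once at the top of the main script, and make every other path relative to the project root.

```r
# Set the project root once, in the main script only; each collaborator
# edits just this one line. NOTE: the path and file below are hypothetical.
setwd("~/Dropbox/my_project")

# Every other path is relative to the root:
cleaned <- read.csv("01_Data/cleaned_data.csv")
```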

Sourcing R scripts

R can run other scripts from a single main script using the source() function (a minimal sketch follows the list below).

There are three huge advantages to writing R scripts in a very modular way and running them from one main script:

  1. It makes it much easier for collaborators to see what is being done where. For example, if I want to re-do the main analysis tables with an extra interaction throughout the models, I can just copy the .R script called 01_main_analysis.R, rename it 04_main_analysis_interacted.R, and add the interaction there.
  2. It keeps the environment clean. You can make R scripts "tidy up" after themselves by including rm(stuff, you, dont, need) at the end.
  3. It makes it easier to jump into the workflow at any point. Good practice is to have at least one script that cleans and outputs the data, and another that loads the helper functions you will be using. Once these have run, you can usually skip the other analysis steps and just run new analyses.
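Here is a minimal sketch of such a main script (all file names except 01_main_analysis.R and the 02_Analysis folder, both mentioned in these materials, are hypothetical):

```r
# 00_master.R -- runs the whole analysis from start to finish

source("02_Analysis/00_helpers.R")        # load helper functions used throughout
source("02_Analysis/00_cleaning.R")       # clean raw data and save analysis data
source("02_Analysis/01_main_analysis.R")  # produce the main analysis tables

# A sourced script can tidy up after itself by ending with, e.g.:
# rm(fit_1, fit_2, temp_data)
```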

Let's look at the examples in 02_Analysis.

Sinking and inputting LaTeX tables

Let's look at an example
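A minimal sketch uses sink() together with the xtable package (the model, directory, and file names below are illustrative, not from this repo):

```r
library(xtable)

# A throwaway model on built-in data, standing in for the real analysis
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Divert console output to a .tex file, print the LaTeX table, restore output
sink("03_Tables/main_results.tex")
print(xtable(fit, caption = "Main results"))
sink()
```

In the paper, \input{03_Tables/main_results.tex} pulls the table in, so re-running the script automatically updates the manuscript.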

Sinking and inputting figures

Let's look at an example
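For figures, the analogue of sinking is opening a graphics device, drawing the plot, and closing the device. A minimal sketch (the directory name is a hypothetical placeholder):

```r
# Open a PDF device, draw the figure, close the device
pdf("04_Figures/scatterplot.pdf", width = 6, height = 4)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Car weight", ylab = "Miles per gallon")
dev.off()
```

The paper then includes it with \includegraphics{04_Figures/scatterplot.pdf}.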

Quick web-publishing

Pushing .html files to your website
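For example, a report written in R Markdown can be rendered to a standalone .html file, then committed and pushed like any other file (a minimal sketch; the file name is a hypothetical placeholder):

```r
# Render an R Markdown document to a self-contained .html page
rmarkdown::render("my_report.Rmd", output_format = "html_document")
```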

GH Pages, Travis, etc.