a tool for tracking, labeling and syncing changes in line-based documents
full control of what and when you sync with collaborators
minimization and easy handling of conflicts
no redundant multiplication of versions
great tools for setting deadlines, raising and solving issues, and distributing tasks
Note: Because GitHub is intended for programmers, it is not always the best way to share things such as large binary data files, Word documents and images (although all of this is possible). For files that are unlikely to change, it is sometimes better to store them on Dropbox and then source the data in from R.
.git keeps a record of all the metadata git needs in order to work
You have three options:
Today we will clone this repo
+ sign in the top corner
collaborative_workflow
Changes:
any time you make a change to a line of code and save it, say in R or LaTeX, GH will make a note of it
however, making changes doesn't yet do anything for our collaborators
let's practice making some changes: create a README.md under a new folder in 00_Archive/example_readmes/your_name
Commits:
every line of code can be separately committed: this means that you've made the change, and now you want to share it with others
labeling commits is extremely important and helpful for collaborators
let's take a look at a couple of the commits I've made on the GitHub website
now let's commit some changes to the README.md files
golden rule: before trying to push, you should always pull
pulling brings in any changes that your collaborators have committed and pushed to the online repo
this is the best way to avoid conflicts (e.g. if you and a collaborator both wrote on the same line)
to pull in the desktop app, either press cmd/ctrl + shift + p or click Repository > Pull
check out the 'history' tab
OK so let's push our changes
go to the desktop app, and either press cmd/ctrl + p, click Repository > Push, or click sync (which pulls then pushes)
let's go check out your fancy new README.md files
issues can be opened and closed on the online repo
they are a super-convenient way to set goals and track progress
you can set milestones with dates
you can assign issues to people
you can tag issues
today we have been collaborating by contributing to the 'master' branch
another development style is to 'fork off' (heh) from the master branch, work on some stuff independently, and then merge it back in
this is more important for developers
Assuming you are working in R and LaTeX (as you should be), there are a few principles to adopt:
__archive folder in the top for throwing old versions of things in (even with versioning this can be helpful)
YYYY_MM_DD_ at the beginning: this ensures that versions are always ordered chronologically
A good way to structure a directory:
00_Archive
- A bunch of old scrap code, notes, data, etc.
01_Data
Raw_data
- The data in its raw .csv format, with no changes made to it
Clean_data
- The data in its cleaned form, once a specific R script has cleaned it up
02_Analysis
- Typically contains all of the R scripts in one place, although you may have subfolders (e.g. for spatial analysis). The scripts should generally do one thing each, and all be run by one central "main script", whose purpose is to control a few key parameters and source the other ones in. More on this below
03_Paper
- This should contain the main .tex files, the .bib, .sty, and other files that .tex wants direct access to
figures
- We will output directly from R into this folder, then source the images in from here
tables
- Each table will be output from R in a tabular environment, and sourced into the main .tex file
presentations
- It makes sense to put beamer presentations etc. into here, because that way you can just copy and paste the image and table code from the paper, and just add ../ to the file paths (more below)
04_Presentations or 04_Literature
- Other folders that probably won't go into the replication archive but are useful for the project
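The layout above can be created in one go from R. This is only a minimal sketch: `project_root`, the use of `tempdir()`, and the file name `old_notes.txt` are illustrative assumptions, not part of the workshop repo.

```r
# Hypothetical project root: a temp folder, so the sketch is safe to run anywhere.
project_root <- file.path(tempdir(), "my_project")

# Folder names follow the structure listed above.
folders <- c(
  "00_Archive",
  "01_Data/Raw_data",
  "01_Data/Clean_data",
  "02_Analysis",
  "03_Paper/figures",
  "03_Paper/tables",
  "03_Paper/presentations",
  "04_Literature"
)

# recursive = TRUE also creates parent folders (e.g. 01_Data) as needed.
for (f in folders) {
  dir.create(file.path(project_root, f), recursive = TRUE, showWarnings = FALSE)
}

# A date prefix in YYYY_MM_DD_ form keeps archived versions in chronological order.
archive_name <- paste0(format(Sys.Date(), "%Y_%m_%d_"), "old_notes.txt")
```

In a real project you would replace `project_root` with the folder that holds your .Rproj file.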
A huge issue in collaborative workflows is file path conflicts
It is possible to avoid them entirely:
.Rproj file: this automatically sets the working directory to the local project folder on the computer of the user, so all scripts can be run with reference to the project folder. This is also true for the replication archive: make sure to include the .Rproj file in it.
"..." and press tab to get suggestions for file paths.
../ to navigate up a level.
setwd(), you shouldn't have to.
.tex files always treat their folder as the directory for sourcing stuff like figures, .bib files, other .tex files, etc.
R can run other scripts from a single script.
There are three huge advantages to writing R scripts in a very modular way and running them from one main script:
.R script called 01_main_analysis.R, call it 04_main_analysis_interacted.R, and then add that
rm(stuff,you,dont,need) at the end.
Let's look at the examples in 02_Analysis.
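To make the main-script idea concrete, here is a minimal sketch. It assumes a throwaway folder standing in for 02_Analysis; the script names `01_clean_data.R` and `02_fit_model.R` and the parameter `scale_factor` are hypothetical, invented for illustration rather than taken from the workshop repo.

```r
# Hypothetical setup: a throwaway analysis folder with two single-purpose scripts.
analysis_dir <- file.path(tempdir(), "02_Analysis")
dir.create(analysis_dir, recursive = TRUE, showWarnings = FALSE)

writeLines(
  "clean_data <- data.frame(x = 1:10, y = (1:10) * scale_factor)",
  file.path(analysis_dir, "01_clean_data.R")
)
writeLines(
  c("model <- lm(y ~ x, data = clean_data)",
    "slope <- unname(coef(model)['x'])"),
  file.path(analysis_dir, "02_fit_model.R")
)

# --- main script: set the key parameters, then source the others in order ---
scale_factor <- 2  # a key parameter controlled from the main script

source(file.path(analysis_dir, "01_clean_data.R"))
source(file.path(analysis_dir, "02_fit_model.R"))

# Drop intermediate objects that later scripts do not need.
rm(model)
```

With an .Rproj file in place, the paths would simply be relative to the project folder, e.g. `source("02_Analysis/01_clean_data.R")`.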
sink() them into .tex files from R using stargazer
\input{} them into your LaTeX paper
knitr .tex document
knitr, but not always efficient
Let's look at an example
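As a rough illustration of the sink() step, the sketch below writes a small tabular environment to a .tex file using base R only. The workshop uses stargazer to generate the table body; the hand-written `cat()` calls here are a stand-in for that, and the folder and file names (`tables_dir`, `regression_table.tex`) are assumptions made for the example.

```r
# Hypothetical output location standing in for 03_Paper/tables.
tables_dir <- file.path(tempdir(), "03_Paper", "tables")
dir.create(tables_dir, recursive = TRUE, showWarnings = FALSE)
table_file <- file.path(tables_dir, "regression_table.tex")

# Fit a small model on a built-in dataset and round the coefficient table.
model <- lm(mpg ~ wt, data = mtcars)
est <- round(coef(summary(model)), 3)

# sink(file) redirects console output to the file until sink() is called again.
sink(table_file)
cat("\\begin{tabular}{lrr}\n")
cat("Term & Estimate & Std. Error \\\\\n")
cat("\\hline\n")
for (term in rownames(est)) {
  cat(term, "&", est[term, "Estimate"], "&", est[term, "Std. Error"], "\\\\\n")
}
cat("\\end{tabular}\n")
sink()
```

In the paper, the table would then be pulled in with something like `\input{tables/regression_table.tex}`, so rerunning the R script updates the paper without any copy and paste.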
.pdf and .png files in much the same way
pdf() in R and \includegraphics[]{} in LaTeX
Let's look at an example
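A minimal sketch of the R side, assuming a temporary folder in place of 03_Paper/figures; the file name `mpg_by_weight.pdf` and the plot itself are made up for illustration.

```r
# Hypothetical output location standing in for 03_Paper/figures.
figures_dir <- file.path(tempdir(), "03_Paper", "figures")
dir.create(figures_dir, recursive = TRUE, showWarnings = FALSE)
figure_file <- file.path(figures_dir, "mpg_by_weight.pdf")

# pdf() opens a file device; everything plotted until dev.off() goes into the file.
pdf(figure_file, width = 6, height = 4)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
invisible(dev.off())  # close the device so the file is fully written
```

On the LaTeX side, the figure could then be included with something like `\includegraphics[width=\textwidth]{figures/mpg_by_weight.pdf}`; for .png output, `png()` plays the same role as `pdf()`.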
build the .html file in .Rmd
upload it to your website using SFTP!!!
a host of amazing tools for building websites in Markdown now exist
here is one example built using Jekyll