ttimbers/data_analysis_pipeline_eg-archive

Building a Data Analysis pipeline tutorial

This example data analysis project counts how often each word occurs in four novels and reports the 10 most frequently occurring words in each book.
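The core computation, counting words and keeping the 10 most frequent per book, can be sketched with standard Unix tools. This is only an illustration under stated assumptions, not the project's own implementation (the dependency list suggests that is written in Python). A "word" is assumed to be a run of ASCII letters, case is ignored, and ties are broken alphabetically. `top_words` and `book.txt` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: print the N most frequent words (default 10) in a plain-text file.
# Assumptions: words are runs of ASCII letters, compared case-insensitively.
# The project's own scripts may tokenize differently.
# Usage: top_words book.txt [N]   (book.txt is a hypothetical file name)
top_words() {
  local file="$1" n="${2:-10}"
  tr -cs 'A-Za-z' '\n' < "$file" |  # one word per line
    tr 'A-Z' 'a-z' |                 # ignore case
    grep -v '^$' |                   # drop empty lines
    sort | uniq -c |                 # count each distinct word
    sort -k1,1nr -k2,2 |             # most frequent first, ties alphabetical
    head -n "$n"                     # keep the top N
}
```

Each output line has the form `count word`. For example, `top_words book.txt 5` would print the five most frequent words.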

Usage:

There are two suggested ways to run this analysis:

1. Using Docker

Note: the instructions in this section also assume a Unix shell. If you are using the Windows Command Prompt, replace $(pwd) with the absolute path to this repository on your computer (PATH_ON_YOUR_COMPUTER).

Install Docker
Download/clone this repository
Use the command line to navigate to the root of this downloaded/cloned repo
Type the following:

docker run --rm -v "$(pwd)":/home/rstudio/data_analysis_eg ttimbers/data_analysis_pipeline_eg make -C /home/rstudio/data_analysis_eg all

2. After installing all dependencies (does not depend on Docker)

Clone this repo and, using the command line, navigate to the root of the project.
To run the analysis, type the following command:

make all

To reset/undo the analysis, type the following command:

make clean

Dependencies

R & R libraries:
- rmarkdown
- knitr
- cowsay
Python & Python libraries:
- matplotlib
- numpy
- pandas
- sys (Python standard library)
- collections (Python standard library)
- wordcount
Bash Unix shell
Make
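One quick way to find which of the dependencies above are missing is sketched below. It assumes Python 3 is on the PATH as `python3` and that R provides `Rscript`. The helper names (`check_cmd`, `check_py`, `check_r`, `found`, `not_found`) are hypothetical. `sys` and `collections` are skipped because they ship with Python. `wordcount` is also skipped because this README does not say where it comes from.

```shell
#!/usr/bin/env bash
# Sketch: report which dependencies from the list above appear to be missing.
# Assumes the executables are named `python3` and `Rscript` on this system.

missing=0

found()     { echo "ok       $1"; }
not_found() { echo "MISSING  $1"; missing=$((missing + 1)); }

# Is the executable on PATH?
check_cmd() {
  if command -v "$1" > /dev/null 2>&1; then found "$1"; else not_found "$1"; fi
}

# Can Python 3 import the module?
check_py() {
  if python3 -c "import $1" > /dev/null 2>&1; then
    found "python: $1"
  else
    not_found "python: $1"
  fi
}

# Can R load the package? (library() errors, so Rscript exits non-zero, if absent)
check_r() {
  if Rscript -e "library($1)" > /dev/null 2>&1; then
    found "R: $1"
  else
    not_found "R: $1"
  fi
}

for c in bash make python3 Rscript; do check_cmd "$c"; done
for m in matplotlib numpy pandas; do check_py "$m"; done
for p in rmarkdown knitr cowsay; do check_r "$p"; done

echo "$missing dependency check(s) failed"
```

Each check is written as an `if` condition, so a missing dependency is reported rather than aborting the script. This holds even when the script runs under `set -e`.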

The tutorials for this example can be found here:

https://github.ubc.ca/MDS-2018-19/DSCI_522_dsci-workflows_students
