A flexible template for doing reproducible data science in Python.
MIT License

EasyData

A Python framework and git template for data scientists, teams, and workshop organizers, aimed at making your data science reproducible.

For most of us, data science is 5% science, 60% data cleaning, and 35% IT hell. Easydata focuses on the 95% by helping you deliver

In other words, Easydata is a template, library, and workflow that lets you get up and running with your data science analysis, quickly and reproducibly.

What is Easydata?

Easydata is a framework for building custom data science git repos that provides:

Easydata is not

Requirements to use this framework:

Once you've installed Anaconda, you can install the remaining requirements (including cookiecutter) by running:

conda create --name easydata python=3
conda activate easydata
python -m pip install -r requirements.txt
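
After these steps, it can be useful to confirm that the command-line tools this README relies on (conda, git, make, and cookiecutter) are actually on your PATH. The following is a minimal sketch, not part of Easydata itself; the helper name `missing_tools` is hypothetical and introduced only for illustration.

```shell
# Hypothetical helper (not part of Easydata): print the name of every
# tool passed as an argument that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the tools used by the commands in this README.
missing=$(missing_tools conda git make cookiecutter)
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All tools found."
fi
```

If cookiecutter is reported missing, check that the easydata conda environment is active in the current shell.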

To start a new project, run:


cookiecutter https://github.com/hackalog/easydata

To find out more


A good place to start is with reproducible environments. We have a tutorial here: Getting Started with EasyData Environments.

The next place to look is the documentation included in every EasyData-created repo, which is customized to the settings you chose for your template. These reference documents live under references/easydata and cover:

Furthermore, see:

The resulting directory structure


The directory structure of your new project looks like this:

Installing development requirements

The first time:

make create_environment
git init
git add .
git commit -m "initial import"
git branch easydata   # tag for future easydata upgrades
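
Because the easydata branch is meant to be used for later upgrades, it may be worth checking that it exists once the first-time steps have run. This is a small sketch assuming it is run from the root of the new project; the helper name `has_branch` is hypothetical and not something Easydata provides.

```shell
# Hypothetical helper (not part of Easydata): succeed if a local branch
# with the given name exists in the current git repository.
has_branch() {
    git rev-parse --verify --quiet "refs/heads/$1" >/dev/null 2>&1
}

if has_branch easydata; then
    echo "easydata branch found"
else
    echo "easydata branch missing; run: git branch easydata"
fi
```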

Subsequent updates:

make update_environment

In case you need to delete the environment later:

conda deactivate
make delete_environment

Credits and Thanks