A Python framework and git template for data scientists, teams, and workshop organizers, aimed at making your data science reproducible.
For most of us, data science is 5% science, 60% data cleaning, and 35% IT hell. Easydata focuses on the 95% by helping you deliver
In other words, Easydata is a template, library, and workflow that lets you get up and running with your data science analysis, quickly and reproducibly.
Easydata is a framework for building custom data science git repos that provides:
Easydata is not
Once you've installed Anaconda, you can install the remaining requirements (including cookiecutter) by doing:
conda create --name easydata python=3
conda activate easydata
python -m pip install -r requirements.txt
cookiecutter https://github.com/hackalog/easydata
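Before running the commands above, it can help to confirm that the command-line tools they rely on are actually on your PATH. The following is a minimal sketch, not part of Easydata itself: the `REQUIRED_TOOLS` list and the `missing_tools` helper are hypothetical names, and the list is an assumption drawn from the commands shown here.

```python
import shutil

# Assumption: these are the tools the install steps above invoke.
# This is not an official Easydata requirements list.
REQUIRED_TOOLS = ["conda", "git", "cookiecutter"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

Note that cookiecutter only appears on PATH after the pip step has run inside the activated environment, so a report that it is missing before that step is expected.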
A good place to start is with reproducible environments. We have a tutorial here: Getting Started with EasyData Environments.
The next place to look is the customized documentation included in any EasyData-created repo. These reference documents reflect the settings you chose in your template, can be found under references/easydata, and cover:
Furthermore, see:
references/easydata docs (including a git tutorial)

The directory structure of your new project looks like this:
LICENSE
Makefile: run make for a list of valid commands.
Makefile.include
Makefile.env
README.md
catalog
catalog/config.ini
data
data/raw
data/interim
data/interim/cache
data/processed
docs
docs/Makefile: Makefile for generating HTML/LaTeX/other formats from Sphinx-format documentation.
notebooks: names include a "-" delimited description, e.g. 1.0-jqp-initial-data-exploration.
reference
reference/easydata: Easydata framework and workflow documentation.
reference/templates: Templates and code snippets for Jupyter.
reference/dataset: Resources related to datasets, e.g. dataset creation notebooks and scripts.
reports
reports/figures
environment.yml
environment.(platform).lock.yml
setup.py: makes MODULE_NAME into a pip-installable Python module (pip install -e .) so it can be imported in Python code.
MODULE_NAME
MODULE_NAME/__init__.py
MODULE_NAME/data
MODULE_NAME/analysis
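As a quick sanity check on a freshly generated project, the layout above can be compared against what is actually on disk. This is a rough sketch under stated assumptions: `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, not part of the Easydata API, and the directory list is copied from the listing above. Some directories (for example those under data) may only be created once the workflow has run, so a missing entry is not necessarily an error.

```python
from pathlib import Path

# Assumption: directory names taken from the listing above.
# MODULE_NAME is omitted because it depends on your template settings.
EXPECTED_DIRS = [
    "catalog",
    "data/raw",
    "data/interim",
    "data/interim/cache",
    "data/processed",
    "docs",
    "notebooks",
    "reference",
    "reports/figures",
]


def missing_dirs(project_root, expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in expected if not (root / name).is_dir()]
```

For example, `missing_dirs(".")` run from the top of the new repo returns the entries from the list that are not present yet.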
The first time:
make create_environment
git init
git add .
git commit -m "initial import"
git branch easydata # tag for future easydata upgrades
Subsequent updates:
make update_environment
In case you need to delete the environment later:
conda deactivate
make delete_environment
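The conda deactivate step matters because an environment should not be active while it is being deleted. A small sketch of how one might check this from Python follows. It relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation; the helper names `active_conda_env` and `safe_to_delete` are hypothetical and not part of Easydata.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if unset."""
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV") or None


def safe_to_delete(env_name, environ=None):
    """Return True when env_name is not the currently active environment."""
    return active_conda_env(environ) != env_name
```

Passing a plain dict as `environ` makes the helpers easy to try without touching your real shell environment.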