This repository contains the open learning resource Reproducible Data Science with Python in the form of Python Jupyter notebooks.
The open learning resource uses real-world social data sets related to the COVID-19 pandemic to provide an accessible introduction to open, reproducible, and ethical data analysis using hands-on Python coding, modern open-source computational tools, and data science techniques. Topics include reproducible workflows, data wrangling, exploratory data analysis, data visualisation, pattern discovery (e.g., clustering), prediction and machine learning, causal inference, and network analysis.
You can read the textbook on the dedicated website. In addition, you can view each individual notebook on GitHub by clicking on the respective button below.
To work with the code interactively, you can open the Jupyter notebooks in the free cloud services MyBinder and Colab. Both services let you modify and run the notebooks from your browser.
By clicking on a button below, you will launch an interactive version of the Jupyter notebook, with the following capabilities:
By clicking on a button below, you will open a Jupyter notebook in Colab, with the following capabilities:
NOTE
The notebooks Prediction using Supervised Learning and What Causes What? Introduction to Causal inference require access to safeguarded data which, once obtained, needs to be stored securely on your Google Drive and loaded in your private Colab notebooks.
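As a minimal sketch of the Drive step described in this note, the snippet below mounts Google Drive from a private Colab notebook and builds the path to a data file stored there. It assumes Colab's usual mount point `/content/drive` with personal files under `MyDrive`. The helper names and the example file name are hypothetical and are not part of the resource.

```python
from pathlib import Path

# Assumption: Colab's conventional mount point; personal files appear under "MyDrive".
DRIVE_MOUNT_POINT = Path("/content/drive")


def mount_google_drive() -> bool:
    """Mount Google Drive when running in Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(str(DRIVE_MOUNT_POINT))  # prompts for authorisation in Colab
    return True


def safeguarded_file_path(relative_path: str) -> Path:
    """Build the path to a file kept in your own Drive (hypothetical helper)."""
    return DRIVE_MOUNT_POINT / "MyDrive" / relative_path
```

In a private Colab notebook you might call `mount_google_drive()` once and then pass `safeguarded_file_path("secure-data/my_file.csv")` (a placeholder name) to your usual loading function. Keep the notebook and the Drive folder private so the safeguarded data is not shared.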
To enable computational reproducibility and minimise errors due to updates of Python libraries, you may need to install the resource's dependencies, listed in the requirements.txt file, in your Colab notebook (dependencies are preinstalled automatically in Binder). To install them, run the following command in the top code cell of your active notebook:
!pip install -r https://raw.githubusercontent.com/valdanchev/reproducible-data-science-python/master/requirements.txt
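If you want the same first cell to work in both Colab and Binder, one option is to install the dependencies only when the notebook is running in Colab. The following is a rough sketch of that idea, not part of the resource itself. The function names are hypothetical, and Colab is detected simply by checking whether the `google.colab` module can be imported.

```python
import subprocess
import sys

REQUIREMENTS_URL = (
    "https://raw.githubusercontent.com/valdanchev/"
    "reproducible-data-science-python/master/requirements.txt"
)


def running_in_colab() -> bool:
    """Return True if the google.colab module is importable (assumed to mean Colab)."""
    try:
        import google.colab  # noqa: F401
    except ImportError:
        return False
    return True


def build_pip_command(requirements_url: str = REQUIREMENTS_URL) -> list:
    """Build a pip command that targets the interpreter running the notebook."""
    return [sys.executable, "-m", "pip", "install", "-r", requirements_url]


def install_requirements_if_colab() -> bool:
    """Install the pinned dependencies in Colab; do nothing elsewhere."""
    if not running_in_colab():
        return False
    subprocess.check_call(build_pip_command())
    return True
```

Calling `install_requirements_if_colab()` in the top code cell would then install the dependencies in Colab and return `False` without installing anything in Binder, where they are already available.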
Contributions to the learning resource are welcome. You can contribute by opening an issue or a pull request.
Reproducible Data Science with Python by Valentin Danchev is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.