valdanchev / reproducible-data-science-python

Reproducible Data Science with Python

This repository contains the open learning resource Reproducible Data Science with Python in the form of Python Jupyter notebooks.

License: CC BY-SA 4.0

Description

The open learning resource uses real-world social data sets related to the COVID-19 pandemic to provide an accessible introduction to open, reproducible, and ethical data analysis using hands-on Python coding, modern open-source computational tools, and data science techniques. Topics include reproducible workflows, data wrangling, exploratory data analysis, data visualisation, pattern discovery (e.g., clustering), prediction and machine learning, causal inference, and network analysis.

How to use the learning resource?

You can read the textbook on the dedicated website. In addition, you can view each individual notebook on GitHub by clicking on the respective GitHub button below.

To work with the code interactively, you can open the Jupyter notebooks in the free cloud services MyBinder and Colab. Both services let you modify and run the notebooks from your browser.

Clicking a Binder button below launches an interactive version of the Jupyter notebook.

Clicking an Open In Colab button below opens the Jupyter notebook in Colab.

| Textbook chapter | View on GitHub | Launch on MyBinder.org | Open in Colab |
|---|---|---|---|
| About the textbook | GitHub | Binder | Open In Colab |
| End-to-End Data Science Project | GitHub | Binder | Open In Colab |
| Python Data Science on the Cloud | GitHub | Binder | Open In Colab |
| Open Reproducible Data Science Workflow | GitHub | Binder | Open In Colab |
| Data Design and Data Wrangling | GitHub | Binder | Open In Colab |
| Data Exploration and Data Visualisation | GitHub | Binder | Open In Colab |
| Pattern Discovery using Unsupervised Learning | GitHub | Binder | Open In Colab |
| Prediction using Supervised Learning | GitHub | Binder | Open In Colab |
| What Causes What? Introduction to Causal inference | GitHub | Binder | Open In Colab |
| Network Analysis | GitHub | Binder | Open In Colab |
| Data Ethics | GitHub | Binder | Open In Colab |
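
If the buttons are not available in your copy of this page, launch links for a notebook can usually be built by hand. The sketch below assumes the commonly used MyBinder and Colab URL patterns for GitHub-hosted notebooks and the `master` branch that appears in the requirements.txt URL of this README. The notebook filename in the usage comment is hypothetical; check the repository for the actual filenames.

```python
from urllib.parse import quote

# Repository coordinates; the branch name is taken from the
# requirements.txt URL in this README.
USER = "valdanchev"
REPO = "reproducible-data-science-python"
BRANCH = "master"


def binder_url(notebook_path, user=USER, repo=REPO, branch=BRANCH):
    """Build a MyBinder launch link for a notebook in a GitHub repository.

    Assumes the usual mybinder.org/v2/gh/<user>/<repo>/<branch> pattern.
    """
    return (
        f"https://mybinder.org/v2/gh/{user}/{repo}/{branch}"
        f"?filepath={quote(notebook_path)}"
    )


def colab_url(notebook_path, user=USER, repo=REPO, branch=BRANCH):
    """Build a Colab link for a notebook in a GitHub repository.

    Assumes the usual colab.research.google.com/github/... pattern.
    """
    return (
        f"https://colab.research.google.com/github/{user}/{repo}"
        f"/blob/{branch}/{quote(notebook_path)}"
    )


# Usage (hypothetical filename):
# print(binder_url("example_notebook.ipynb"))
# print(colab_url("example_notebook.ipynb"))
```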

NOTE

The notebooks Prediction using Supervised Learning and What Causes What? Introduction to Causal inference require access to safeguarded data which, once obtained, needs to be stored securely on your Google Drive and loaded in your private Colab notebooks.
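
One way to load a file kept on Google Drive into a private Colab notebook is sketched below. It is a minimal illustration, not the method the notebooks themselves necessarily use: the folder name `safeguarded_data` and the helper function names are hypothetical, and the Drive root assumes Colab's usual `/content/drive/MyDrive` location after mounting. The file is assumed to be a CSV readable by pandas.

```python
from pathlib import Path

# Usual location of "My Drive" once Drive is mounted in Colab (assumption).
DRIVE_ROOT = Path("/content/drive/MyDrive")


def mount_drive_if_in_colab(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab.

    Returns True if Drive was mounted, False when not running in Colab.
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


def safeguarded_path(filename, root=DRIVE_ROOT, folder="safeguarded_data"):
    """Return the path of a data file inside a (hypothetical) private folder."""
    return Path(root) / folder / filename


def load_safeguarded_csv(filename, root=DRIVE_ROOT, folder="safeguarded_data"):
    """Read a CSV file from the private Drive folder into a DataFrame."""
    path = safeguarded_path(filename, root=root, folder=folder)
    if not path.exists():
        raise FileNotFoundError(f"Expected data file at {path}")
    import pandas as pd  # imported here so the path helpers work without pandas
    return pd.read_csv(path)


# Usage in a private Colab notebook (hypothetical filename):
# mount_drive_if_in_colab()
# df = load_safeguarded_csv("my_safeguarded_file.csv")
```

Keeping the data in a private Drive folder, rather than uploading it into the notebook or the repository, means the file is never committed alongside the code.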



Installing dependencies

To enable computational reproducibility and minimise errors caused by updates to Python libraries, you may need to install the dependencies listed in the resource's requirements.txt file in your Colab notebook (in Binder, dependencies are preinstalled automatically). To install them, run the following command in the top code cell of your active notebook:

!pip install -r https://raw.githubusercontent.com/valdanchev/reproducible-data-science-python/master/requirements.txt
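
After installing, you may want to confirm that the versions in your session match the pinned ones. The following is a minimal sketch using only the Python standard library. It assumes the requirements file uses simple `name==version` lines; the helper names and the sample lines in the usage comment are illustrative, not the actual contents of requirements.txt.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Extract {package: version} from lines of the form 'name==version'.

    Comments, blank lines, and unpinned requirements are skipped.
    """
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0]   # drop trailing comments
        line = line.split(";", 1)[0]  # drop environment markers
        line = line.strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Compare pinned versions with what is installed in this session.

    Returns {package: (wanted, installed_or_None, matches)}.
    """
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed, installed == wanted)
    return report


# Usage (illustrative pins, not the real file contents):
# sample = "pandas==1.2.3\nnumpy==1.19.5\n"
# for name, (wanted, installed, ok) in check_pins(parse_pins(sample)).items():
#     print(name, wanted, installed, "OK" if ok else "MISMATCH")
```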

Contributing to the resource

Contributions to the learning resource are welcome. To contribute, open an issue or submit a pull request.

License

License: CC BY-SA 4.0

Reproducible Data Science with Python by Valentin Danchev is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.