In this hands-on tutorial, you will learn the fundamentals of analyzing massive datasets through real-world examples run on powerful machines in a public cloud, from how the data is stored and read to how it is processed and visualized. You will understand how large-scale analysis differs from local workflows, the unique challenges that come with scale, and some best practices for working productively with your data.
You can use Nebari (JupyterHub), hosted at scipy.quansight.dev, to follow along with this tutorial.
Follow the participant's guide to register and sign in (re-register if you previously registered for a different tutorial), select the Medium Instance in Server Options, and click the "Data of an Unusual Size" card in the JupyterLab launcher to clone the materials.
In the tutorials/big-data-tutorial folder that is created with all the materials, navigate to 00-introduction.ipynb.
The environment for this tutorial is scipy-scipy-data-of-unusual-size, and it is selected for you automatically. :)
Previous versions of this tutorial are available as tags in this repository.
This repository is covered by the Nebari Code of Conduct and is licensed under the BSD 3-Clause license.