pydatadelhi / talks

Talks at PyData Delhi Meetups

Scaling Data Science with Dask #133

Open pavithraes opened 2 years ago

pavithraes commented 2 years ago

Abstract (2-3 lines)

Python data science tools like pandas, NumPy, and scikit-learn are excellent. However, they mostly use only one core out of the many cores in modern processors, and they are limited by your computer's RAM. In this tutorial, you'll learn to scale your data science workflow to larger datasets and models using Dask, by leveraging the full potential of your laptop, all while staying in the PyData ecosystem. You will learn the fundamentals of parallel and distributed computing, when (and when not) to consider scaling, and work through some hands-on examples.
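To give a flavour of what "scaling while staying in the PyData ecosystem" can look like, here is a minimal sketch using `dask.array`. It is not taken from the tutorial notebook; the array size and chunk size are arbitrary values chosen for illustration. The array is split into chunks, the computation is built lazily, and `.compute()` runs the per-chunk work in parallel.

```python
import dask.array as da

# Build a 1,000,000-element array split into 10 chunks of 100,000 elements.
# (Sizes are illustrative assumptions, not values from the tutorial.)
x = da.arange(1_000_000, chunks=100_000)

# This only builds a task graph; nothing is computed yet.
lazy_mean = x.mean()

# .compute() executes the graph, processing the chunks in parallel
# with Dask's default scheduler.
result = lazy_mean.compute()

print(x.numblocks)  # number of chunks along each axis
print(result)
```

Because each chunk is processed independently, the same pattern can extend to arrays that do not fit in memory at once, as long as each individual chunk does.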

Brief Description and Contents to be covered

Dask is an open source library for parallel and distributed computing in Python. This tutorial is meant to be an introduction to this super broad and powerful library. We will:

Pre-requisites for the talk

Time required for the talk

1 hr

Link to slides

https://github.com/pavithraes/dask-mini-tutorial/blob/main/slides.pdf

Will you be doing hands-on demo as well?

Yes

Link to ipython notebook (if any)

https://github.com/pavithraes/dask-mini-tutorial

About yourself

My name is Pavithra Eswaramoorthy. I currently work as a Community Engagement Manager at Coiled, where I help support Dask users and contributors. I also contribute to the Bokeh project, and I've worked on administering the Wikimedia Foundation’s open source outreach programs in the past. In my spare time, I enjoy a good book and hot coffee. :)

Are you comfortable if the talk is recorded and uploaded to PyData Delhi's YouTube channel?

Yes