ibis-project / talks


PyData London 2024 - Kedro-Ibis Tutorial #25

Closed deepyaman closed 3 months ago

deepyaman commented 7 months ago

Proposal title

Analytics engineering without dbt? Building the composable Python data stack with Kedro and Ibis

Abstract

For the past decade, SQL has reigned as king of the data transformation world, and tools like dbt have formed a cornerstone of the modern data stack. Until recently, Python-first alternatives couldn't compete with the scale and performance of modern SQL. Now, however, Ibis offers the benefits of SQL execution behind a flexible Python dataframe API, and we can leverage it to build scalable Python pipelines in Kedro. In this tutorial, we will develop a simple analytics pipeline locally, then deploy it to a cloud data warehouse with just a configuration change.

Description

Python has become the lingua franca of data science, and it's a great language for building AI/ML pipelines. However, in the data engineering world, it leaves much to be desired. A lot of data practitioners end up:

In this session, we will first understand the motivation for a better solution for building production data pipelines in Python:

Then, we will implement a local solution using DuckDB and two popular open-source Python libraries, Kedro and Ibis.

Last but not least, we will discuss other benefits of this solution, including the reusability and portability of the Ibis-based data pipelines and validations. To that end—with one simple configuration change—we will run the same pipeline at scale in Starburst Galaxy.

Notes

We chose Starburst Galaxy for the tutorial only because it is easy to create a free trial account and get started using it (for the purpose of demonstrating support for multiple backends and remote execution). Another platform that offers a free trial, like Google BigQuery, would be an equally good option.

We have also recently published a blog post articulating how Kedro and Ibis can be used together.

Last but not least, only Deepyaman's name is included in the YouTube title, because "Deepyaman Datta, Juan Luis Cano Rodríguez, and Joel Schwarzmann" would be 63 characters alone.

ncclementi commented 3 months ago

Closing since this was delivered