ibis-project / talks


Flink Forward 2024 - Pipelines Done Right #35

[Open] deepyaman opened this issue 1 month ago

deepyaman commented 1 month ago

Title

One pipeline to rule them all: unified end-to-end execution with multi-engine Python data pipelines

Description

Kedro is an open-source Python framework to create reproducible, maintainable, and modular data analytics pipelines. Originally developed at QuantumBlack, AI by McKinsey to bring software engineering best practices to data science code, it joined the LF AI & Data Foundation in 2021.

As the Kedro community has grown, so has the scale of Kedro deployments. While projects often begin with local exploration, experimentation, and execution, companies using Kedro in production process data at terabyte scale (and beyond). Recently, the rise of the multi-engine data stack has led users to leverage Kedro's native integration with Ibis, a Python dataframe API that executes on 20+ query engines, to run their workflows on whichever infrastructure is most appropriate at any given time.

The Ibis integration further enables users to run Kedro pipelines across batch and streaming contexts. Earlier this year, Ibis added support for streaming concepts and backends, including Apache Flink and RisingWave. Moving from dev to prod, from batch to streaming, requires little more than a configuration change.

Last but not least, the recently released, open-source IbisML library supercharges feature engineering and last-mile data preprocessing pipelines, letting them run on any Ibis-supported backend.

Join us to learn more about how you can use Kedro and Ibis together to build better, unified, portable end-to-end data and analytics pipelines!

Additional notes

This submission was inspired by the topic for consideration, "Pipelines Done Right - Streaming ETL, Batch Pipelines, or something in between!"

Related work:

zhenzhongxu commented 1 month ago

@deepyaman Great intro to both the Kedro and Ibis projects. Personally, I think adding a resonant problem statement and, at the beginning, a summary of what the audience can expect to learn would make the proposal more appealing.