pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Stream processing support (in the style of flink/risingwave/arroyo/etc.) #17010

Open kszlim opened 1 week ago

kszlim commented 1 week ago

Description

Polars is currently the best dataframe experience for batch processing, so it would be worth considering whether it could also support stream processing.

Some prior art for adding streaming support exists within datafusion, so it might be worth picking their brains: https://github.com/apache/datafusion/issues/9016 https://synnada.medium.com/running-windowing-queries-in-stream-processing-93068d3a5

This would be a killer feature for polars: you could use one system for both batch and real-time analytics instead of maintaining a bespoke, separate stream processing framework. Unifying them would be great.

I'd imagine this could be taken into consideration during the construction of the new streaming engine.

I realize this is a huge feature request, and I totally understand if it's closed as not planned.

kszlim commented 1 week ago

Sorry, I didn't notice that https://github.com/pola-rs/polars/issues/6839 was closed as out of scope. Shall I close this, or is this something that might be in scope with the new streaming engine design? @ritchie46