pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Implement the Arrow C stream interface for DataFrame #14208

Open eitsupi opened 8 months ago

eitsupi commented 8 months ago

Description

r-polars has the ability to export DataFrames via the Arrow C stream interface (pola-rs/r-polars#5, by @paleolimbot). https://arrow.apache.org/docs/format/CStreamInterface.html

Since this is not specific to R, I believe that porting it to the polars crate will make it easier for all downstream projects to exchange DataFrames with other Arrow implementations. Conversely, it would also be useful to be able to create DataFrames via the Arrow C stream interface.

For example, arrow-rs seems to exchange stream data with pyarrow (i.e., C++ libarrow) as follows: https://github.com/apache/arrow-rs/blob/121666e464170d7dce41bfd61de001a19affde72/arrow/src/pyarrow.rs#L387-L448
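For readers unfamiliar with the interface, here is a minimal sketch of what a producer and consumer exchange. The `ArrowArray` and `ArrowArrayStream` field layouts follow the Arrow C data and C stream interface specifications; everything prefixed `toy_`, plus `drain_lengths`, is hypothetical and exists only for illustration. The toy producer carries no real buffers and omits `get_schema` and `get_last_error`, which a real producer must provide. This is not how r-polars or arrow-rs implement it.

```rust
use std::os::raw::{c_char, c_int, c_void};
use std::ptr;

/// Opaque stand-in: the real ArrowSchema has fields, omitted in this sketch.
#[repr(C)]
pub struct ArrowSchema {
    _opaque: [u8; 0],
}

/// Field layout as given in the Arrow C data interface specification.
#[repr(C)]
pub struct ArrowArray {
    pub length: i64,
    pub null_count: i64,
    pub offset: i64,
    pub n_buffers: i64,
    pub n_children: i64,
    pub buffers: *mut *const c_void,
    pub children: *mut *mut ArrowArray,
    pub dictionary: *mut ArrowArray,
    pub release: Option<unsafe extern "C" fn(*mut ArrowArray)>,
    pub private_data: *mut c_void,
}

/// Field layout as given in the Arrow C stream interface specification.
#[repr(C)]
pub struct ArrowArrayStream {
    pub get_schema: Option<unsafe extern "C" fn(*mut ArrowArrayStream, *mut ArrowSchema) -> c_int>,
    pub get_next: Option<unsafe extern "C" fn(*mut ArrowArrayStream, *mut ArrowArray) -> c_int>,
    pub get_last_error: Option<unsafe extern "C" fn(*mut ArrowArrayStream) -> *const c_char>,
    pub release: Option<unsafe extern "C" fn(*mut ArrowArrayStream)>,
    pub private_data: *mut c_void,
}

// Hypothetical toy producer: private_data is a boxed Vec of pending batch lengths.
unsafe extern "C" fn toy_release_array(array: *mut ArrowArray) {
    // Nothing was allocated for the toy array; just mark it as released.
    unsafe { (*array).release = None };
}

unsafe extern "C" fn toy_get_next(stream: *mut ArrowArrayStream, out: *mut ArrowArray) -> c_int {
    unsafe {
        let pending = &mut *((*stream).private_data as *mut Vec<i64>);
        let next = pending.pop();
        ptr::write(out, ArrowArray {
            length: next.unwrap_or(0),
            null_count: 0,
            offset: 0,
            n_buffers: 0,
            n_children: 0,
            buffers: ptr::null_mut(),
            children: ptr::null_mut(),
            dictionary: ptr::null_mut(),
            // End of stream is signalled by success (0) plus a released array.
            release: if next.is_some() { Some(toy_release_array) } else { None },
            private_data: ptr::null_mut(),
        });
    }
    0
}

unsafe extern "C" fn toy_release_stream(stream: *mut ArrowArrayStream) {
    unsafe {
        drop(Box::from_raw((*stream).private_data as *mut Vec<i64>));
        (*stream).release = None;
    }
}

pub fn toy_stream(mut lengths: Vec<i64>) -> ArrowArrayStream {
    lengths.reverse(); // so that pop() yields batches in the original order
    ArrowArrayStream {
        get_schema: None,     // omitted in this sketch; a real producer must provide it
        get_next: Some(toy_get_next),
        get_last_error: None, // omitted in this sketch as well
        release: Some(toy_release_stream),
        private_data: Box::into_raw(Box::new(lengths)) as *mut c_void,
    }
}

/// Consumer loop: pull until a released array comes back, then release the stream.
pub fn drain_lengths(mut stream: ArrowArrayStream) -> Vec<i64> {
    let mut seen = Vec::new();
    unsafe {
        let get_next = stream.get_next.expect("get_next must be set");
        loop {
            let mut array = std::mem::MaybeUninit::<ArrowArray>::uninit();
            if get_next(&mut stream, array.as_mut_ptr()) != 0 {
                break; // non-zero return means the producer reported an error
            }
            let mut array = array.assume_init();
            match array.release {
                None => break, // released array: end of stream
                Some(release) => {
                    seen.push(array.length);
                    release(&mut array);
                }
            }
        }
        if let Some(release) = stream.release {
            release(&mut stream);
        }
    }
    seen
}
```

Calling `drain_lengths(toy_stream(vec![3, 5, 2]))` pulls three toy batches and then sees the end-of-stream marker. The point of the design is that the consumer only needs the five-field struct, not any knowledge of the producer's library.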

ritchie46 commented 8 months ago

Yes, can you port that work to polars-arrow? It can iterate over a Vec<arrow::Chunk> in that case.

eitsupi commented 8 months ago

> Yes, can you port that work to polars-arrow? It can iterate over a Vec<arrow::Chunk> in that case.

Sorry, but I may not have enough knowledge or time. I think polars-arrow does not have a DataFrame type, but if we define an iterator in polars-arrow, does that mean we can also implement an iterator for DataFrame?
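To make the question concrete, here is a rough sketch of the layering being discussed, under the assumption that a DataFrame can be turned into a list of chunks. `Chunk`, `ChunkStream`, and `DataFrame` below are hypothetical stand-ins, not the real polars or polars-arrow types, and `into_stream` is an invented name.

```rust
/// Hypothetical stand-in for a chunk of columns (not the real polars-arrow type).
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub columns: Vec<Vec<i64>>,
}

/// Hypothetical lower-level exporter: owns the chunks and hands them out one
/// at a time, mirroring the pull-based shape of the stream's get_next callback.
pub struct ChunkStream {
    chunks: std::vec::IntoIter<Chunk>,
}

impl ChunkStream {
    pub fn new(chunks: Vec<Chunk>) -> Self {
        Self { chunks: chunks.into_iter() }
    }
}

impl Iterator for ChunkStream {
    type Item = Chunk;

    fn next(&mut self) -> Option<Chunk> {
        // None plays the role of "end of stream".
        self.chunks.next()
    }
}

/// Hypothetical stand-in for a DataFrame stored as a list of chunks.
pub struct DataFrame {
    pub chunks: Vec<Chunk>,
}

impl DataFrame {
    /// The higher-level type only has to produce a Vec<Chunk>;
    /// the streaming logic stays in the lower layer.
    pub fn into_stream(self) -> ChunkStream {
        ChunkStream::new(self.chunks)
    }
}
```

If the lower layer only needs a `Vec<Chunk>`, then the DataFrame side reduces to a conversion into chunks, which is one possible reading of the suggestion above.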

etiennebacher commented 3 months ago

I think this is fixed by #17696?

eitsupi commented 3 months ago

> I think this is fixed by #17696?

No, that is the C interface, not the C stream interface.