Utilization of Arrow/Rust Datafusion

rajasekarv / vega

A new arguably faster implementation of Apache Spark from scratch in Rust

Apache License 2.0


Utilization of Arrow/Rust Datafusion #79

Closed Hoeze closed 4 years ago

Hoeze commented 4 years ago

Hi, I just read about Datafusion: https://github.com/apache/arrow/tree/master/rust/datafusion

Would the SQL query planning, etc. be helpful for native_spark?

rajasekarv commented 4 years ago

Of course. SQL is the next big step, and I am still deciding on the internal data structure for DataFrames. It is most likely going to be an array of arrays. The public API should be the same as Spark's, or at least as close as possible. I have decided not to use Arrow for the base types. However, Arrow will act as the intermediate format when interacting with other languages such as Python and Java.
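To make the "array of arrays" idea concrete, here is a minimal sketch of what such a DataFrame could look like in Rust. Everything in it is an assumption for illustration: the `Value` enum, the `DataFrame` struct, its methods, the `sample` helper, and the choice to treat each inner array as a column are hypothetical and are not taken from the vega/native_spark code base.

```rust
// Hypothetical sketch only: these types are NOT vega's actual internals.

/// A dynamically typed cell value (assumed set of base types).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Float(f64),
    Str(String),
    Null,
}

/// A DataFrame stored as an array of arrays.
/// Assumption: each inner Vec holds one column.
#[derive(Debug, Clone)]
struct DataFrame {
    names: Vec<String>,
    columns: Vec<Vec<Value>>,
}

impl DataFrame {
    /// Builds a frame, rejecting mismatched name counts or ragged columns.
    fn new(names: Vec<String>, columns: Vec<Vec<Value>>) -> Result<Self, String> {
        if names.len() != columns.len() {
            return Err("one name is required per column".to_string());
        }
        let rows = columns.first().map_or(0, |c| c.len());
        if columns.iter().any(|c| c.len() != rows) {
            return Err("all columns must have the same length".to_string());
        }
        Ok(DataFrame { names, columns })
    }

    /// Number of rows (0 for a frame without columns).
    fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, |c| c.len())
    }

    /// Looks up a column by name.
    fn column(&self, name: &str) -> Option<&[Value]> {
        let idx = self.names.iter().position(|n| n == name)?;
        Some(self.columns[idx].as_slice())
    }

    /// Projection: returns a new frame with only the requested columns,
    /// or None if any requested name does not exist.
    fn select(&self, wanted: &[&str]) -> Option<DataFrame> {
        let mut names = Vec::with_capacity(wanted.len());
        let mut columns = Vec::with_capacity(wanted.len());
        for &name in wanted {
            let col = self.column(name)?;
            names.push(name.to_string());
            columns.push(col.to_vec());
        }
        Some(DataFrame { names, columns })
    }
}

/// Small example frame with two columns and two rows.
fn sample() -> DataFrame {
    DataFrame::new(
        vec!["id".to_string(), "name".to_string()],
        vec![
            vec![Value::Int(1), Value::Int(2)],
            vec![Value::Str("a".to_string()), Value::Str("b".to_string())],
        ],
    )
    .expect("sample columns are consistent")
}
```

Calling `sample().select(&["id"])` yields a one-column frame. A layout like this keeps the base types independent of Arrow, and a conversion to Arrow arrays could be added separately at the language boundary.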