h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0

Add Rust's Polars as new solution. #163

Closed ritchie46 closed 3 years ago

ritchie46 commented 3 years ago

Polars is an in-memory DataFrame library written in Rust that uses Apache Arrow as its backend. It is also available from Python.

There are some benchmarks against pandas in the readme, and some run from Python in this notebook.

jangorecki commented 3 years ago

Hi, thanks for the suggestion. Please be aware that a similar suggestion was already filed in #107, asking for Rust's DataFusion, which also uses Arrow. Does Polars outsource all queries to Arrow, or are some algorithms implemented in Polars itself? It might not make much sense to add multiple libraries that all use the same "engine" to execute queries.

ritchie46 commented 3 years ago

Arrow is only the backend (like NumPy for pandas). All DataFrame/query logic (joins, groupby, filters, pivots, etc.) is implemented by Polars.

DataFusion is indeed also based on Apache Arrow, but it takes a totally different approach to DataFrames.