Apache Arrow Ballista Distributed Query Engine
https://arrow.apache.org/datafusion
Apache License 2.0

Ballista: Distributed SQL Query Engine, built on Apache Arrow

Ballista is a distributed SQL query engine powered by the Rust implementation of Apache Arrow and DataFusion.

If you are looking for documentation for a released version of Ballista, please refer to the Ballista User Guide.

Overview

Ballista's design is similar to that of Apache Spark (particularly Spark SQL), though it differs in some key respects, most visibly in being built on Rust, Apache Arrow, and DataFusion.

Features

Performance

We run some simple benchmarks comparing Ballista with Apache Spark to track the progress of performance optimizations. The benchmarks are derived from TPC-H and are not official TPC-H results. The results below come from running individual queries at scale factor 10 (10 GB) on a single node with a single executor and 24 concurrent tasks.

The tracking issue for improving these results is #339.

[Figure: benchmarks]

Getting Started

The easiest way to get started is to run one of the standalone or distributed examples. After that, refer to the Getting Started Guide.

Project Status

Ballista supports a wide range of SQL, including CTEs, joins, and subqueries, and can execute complex queries at scale.

Refer to the DataFusion SQL Reference for more information on supported SQL.
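As a rough illustration of the SQL features named above, the sketch below holds a single query that combines a CTE, a join, and a scalar subquery. The table and column names are assumptions modeled on the TPC-H schema, and the helper function `example_query` is hypothetical; none of them come from the Ballista documentation. The block only builds the query text and does not call any Ballista API.

```rust
/// Returns an illustrative SQL query that uses a CTE, a join, and a subquery.
///
/// Assumption: `orders` and `customer` are tables with TPC-H-style columns.
/// They are placeholders for whatever tables you have registered.
pub fn example_query() -> &'static str {
    "WITH big_orders AS (
        -- CTE: orders whose total price is above the overall average
        SELECT o_orderkey, o_custkey, o_totalprice
        FROM orders
        WHERE o_totalprice > (SELECT AVG(o_totalprice) FROM orders)
    )
    SELECT c.c_name,
           COUNT(*) AS order_count,
           SUM(b.o_totalprice) AS total_spent
    FROM customer AS c
    JOIN big_orders AS b ON c.c_custkey = b.o_custkey
    GROUP BY c.c_name
    ORDER BY total_spent DESC
    LIMIT 10"
}
```

The string can be submitted through whichever SQL entry point your Ballista client exposes; the DataFusion SQL Reference remains the authority on which constructs are accepted.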

Ballista is maturing quickly and is now working towards being production-ready. See the roadmap below for more details.

Roadmap

There is an excellent discussion in https://github.com/apache/arrow-ballista/issues/30 about the future of the project, and we encourage you to participate and add your feedback there if you are interested in using or contributing to Ballista.

The current focus is on the following items:

Architecture Overview

There are currently no up-to-date architecture documents available. You can get a general overview of the architecture by watching the Ballista: Distributed Compute with Rust and Apache Arrow talk from the New York Open Statistical Programming Meetup (Feb 2021).

Contribution Guide

Please see the Contribution Guide for information about contributing to Ballista.