lakehq / sail

LakeSail's computation framework with a mission to unify stream processing, batch processing, and compute-intensive (AI) workloads.
https://lakesail.com
Apache License 2.0

Sail

The mission of Sail is to unify stream processing, batch processing, and compute-intensive (AI) workloads. Currently, Sail features a drop-in replacement for Spark SQL and the Spark DataFrame API in both single-host and distributed settings.
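Since Sail is described as a drop-in replacement, existing PySpark code should in principle only need to point its session at a Sail endpoint. The following is a minimal client-side sketch, not an official example. It assumes a Sail server is already running and reachable over Spark Connect, and that `pyspark` is installed with its Spark Connect client dependencies. The host, the port `50051`, and the helper names `spark_connect_url` and `run_demo` are illustrative assumptions; only the PySpark calls themselves are standard.

```python
def spark_connect_url(host: str = "localhost", port: int = 50051) -> str:
    """Build a Spark Connect URL. The default host and port are assumptions."""
    return f"sc://{host}:{port}"


def run_demo(url: str) -> list:
    """Run a small Spark SQL query and a DataFrame filter against the server at `url`."""
    # Imported lazily so this module loads even where pyspark is not installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.remote(url).getOrCreate()
    try:
        # Spark SQL: an ordinary query, written exactly as for regular PySpark.
        spark.sql("SELECT 1 + 1 AS total").show()

        # Spark DataFrame API: build a tiny DataFrame and filter it.
        df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "label"])
        return df.filter(df.id > 1).collect()
    finally:
        spark.stop()
```

With a server listening at the assumed address, `run_demo(spark_connect_url())` would be expected to return the rows whose `id` is greater than 1. The Getting Started guide covers how to start the server itself.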

Kubernetes Deployment

Please refer to the Kubernetes Deployment Guide for instructions on deploying Sail on Kubernetes.

Installation

Sail is available as a Python package on PyPI. You can install it using pip.

pip install "pysail==0.2.0.dev0"
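To confirm which version ended up in the active environment, a short check using only the Python standard library may help. This is a sketch; the helper name `installed_version` is hypothetical, and `pysail` is the distribution name taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed pysail version, or None if the package is not installed.
print(installed_version("pysail"))
```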

Alternatively, you can install Sail from source to get a build optimized for your hardware architecture. This requires rustup and protoc in your environment.

env RUSTFLAGS="-C target-cpu=native" pip install "pysail==0.2.0.dev0" -v --no-binary pysail
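When scripting the source build, it can be useful to check for the required tools first and to set `RUSTFLAGS` explicitly instead of relying on the shell. The sketch below mirrors the command above using only the standard library. The function names are hypothetical, and the version pin is copied from this README and may need updating.

```python
import os
import shutil
import sys

PINNED = "pysail==0.2.0.dev0"  # version pin taken from the command above


def missing_build_tools(tools=("rustup", "protoc")) -> list:
    """Return the required tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def source_install_command(requirement: str = PINNED) -> list:
    """Build the pip invocation that forces a source build of pysail."""
    return [sys.executable, "-m", "pip", "install", requirement,
            "-v", "--no-binary", "pysail"]


def source_install_env() -> dict:
    """Copy the current environment and add the native-CPU Rust flag."""
    env = dict(os.environ)
    env["RUSTFLAGS"] = "-C target-cpu=native"
    return env
```

Once `missing_build_tools()` returns an empty list, the build could be started with `subprocess.run(source_install_command(), env=source_install_env(), check=True)`.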

You can follow the Getting Started guide to learn more about Sail.

Documentation

The documentation of the latest Sail version can be found here.

Benchmark Results

Check out our blog post, Supercharge Spark: Quadruple Speed, Cut Costs by 94%, for detailed benchmark results comparing Sail with Spark.

Contributing

Contributions are more than welcome!

Please submit GitHub issues for bug reports and feature requests.

Feel free to create a pull request if you would like to make a code change. You can refer to the development guide to get started.

Support

See the Support Options Page for more information.