A self-hostable CDN for databases. Spice provides a unified SQL query interface and portable runtime to locally materialize, accelerate, and query datasets across databases, data warehouses, and data lakes.
Beta Release Criteria
Feature complete
[x] Data is streamed when accelerating from source into this accelerator
[ ] ~Data is streamed when reading/performing queries from this accelerator~
[ ] ~The accelerator supports primary keys and indexes~
[ ] ~The accelerator supports full federation within a single dataset (e.g. select * from my_dataset)~
[ ] ~The accelerator supports federation push down across multiple datasets within the same accelerator (e.g. select * from first_dataset, second_dataset)~
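Test Coverage
Beta quality accelerators should be able to run test packages derived from the following:
Indexes are not required for test coverage, but can be introduced if required for tests to pass (e.g., due to performance characteristics).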
When referring to accelerator access modes, "all supported modes" identifies every possible way to use that accelerator. For example, for DuckDB this would be file and memory modes. For PostgreSQL, this would only be the direct database access mode.
General
[ ] Integration tests to cover accelerating data from S3 Parquet, MySQL, and PostgreSQL with the Core Arrow Data Types
[ ] ~Integration tests to cover "On Conflict" behaviors.~
TPC-H
[x] End-to-end test to cover accelerating TPC-H SF1 data from S3 and benchmarking TPC-H queries (official and simple).
[x] All supported modes should run all queries with no Major Bugs.
[x] A test script exists that can load TPC-H SF10 and TPC-H SF100 data into this accelerator in all supported modes.
[x] The accelerator can load TPC-H SF10 in all supported modes, and can run all queries with no Major Bugs.
[ ] ~The accelerator can load TPC-H SF100 in either file or direct database mode, and can run all queries with no Major Bugs.~
TPC-DS
[x] End-to-end test to cover accelerating TPC-DS SF1 data from S3 and benchmarking TPC-DS queries (official and simple).
[x] All supported modes should run all queries with no Major Bugs.
[x] A test script exists that can load TPC-DS SF10 and TPC-DS SF100 data into this accelerator in all supported modes.
[x] The accelerator can load TPC-DS SF10 in all supported modes, and can run all queries with no Major Bugs.
[ ] ~The accelerator can load TPC-DS SF100 in either file or direct database mode, and can run all queries with no Major Bugs.~
ClickBench
[x] A test script exists that can load ClickBench data into this accelerator in either file or direct database mode.
[x] The accelerator can load ClickBench in either file or direct database mode, and all queries are attempted.
[x] All query failures should be logged as issues. No bug fixes are required for ClickBench.
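The ClickBench criteria ask that every query be attempted and every failure be recorded as an issue, rather than fixed. A minimal sketch of that loop follows, assuming a hypothetical `run_query` callable that submits SQL to the accelerator and raises on failure; `attempt_all`, `fake_run_query`, and the query IDs are illustrative names, not an existing Spice API.

```python
from typing import Callable, Iterable


def attempt_all(
    queries: Iterable[tuple[str, str]],
    run_query: Callable[[str], object],
) -> list[tuple[str, str]]:
    """Run every query and return (query_id, error) for each failure.

    Failures are collected instead of raised, so one broken query does not
    prevent the remaining queries from being attempted.
    """
    failures = []
    for query_id, sql in queries:
        try:
            run_query(sql)
        except Exception as exc:  # record any failure and keep going
            failures.append((query_id, f"{type(exc).__name__}: {exc}"))
    return failures


if __name__ == "__main__":
    # Hypothetical stand-in for a real client: fails on one query on purpose.
    def fake_run_query(sql: str) -> list:
        if "REGEXP_REPLACE" in sql:
            raise RuntimeError("function not supported")
        return []

    clickbench_queries = [
        ("q1", "SELECT COUNT(*) FROM hits"),
        ("q2", "SELECT REGEXP_REPLACE(Referer, '^https?://', '') FROM hits"),
    ]
    # Each reported failure is a candidate for a logged issue.
    for query_id, error in attempt_all(clickbench_queries, fake_run_query):
        print(f"{query_id} failed: {error}")
```

Running the example prints `q2 failed: RuntimeError: function not supported`, and `q1` still runs even though a later query fails.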
Data correctness
[x] With TPC-H SF10 loaded into memory, returned results are identical across source and accelerated queries for all TPC-H queries and TPC-H simple queries.
[ ] ~With TPC-H SF100 loaded into file or direct database mode, returned results are identical across source and accelerated queries for all TPC-H queries and TPC-H simple queries.~
[x] With TPC-DS SF10 loaded into memory, returned results are identical across source and accelerated queries for all TPC-DS queries and TPC-DS simple queries.
[ ] ~With TPC-DS SF100 loaded into file or direct database mode, returned results are identical across source and accelerated queries for all TPC-DS queries and TPC-DS simple queries.~
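The data correctness criteria require identical results from source and accelerated queries. One way to check this, sketched here under stated assumptions, is to compare rows as multisets for unordered queries and position by position for queries with `ORDER BY`. `results_identical` is a hypothetical helper, rows are assumed to be tuples of hashable values, and the comparison is exact to match the word "identical"; a real harness might first need to normalize types (for example, decimals versus floats) returned by different engines.

```python
from collections import Counter
from typing import Sequence


def results_identical(
    source_rows: Sequence[tuple],
    accelerated_rows: Sequence[tuple],
    ordered: bool = False,
) -> bool:
    """Compare source and accelerated query results exactly.

    With ordered=True (queries with ORDER BY), rows must match position by
    position. Otherwise the results are compared as multisets: row order is
    ignored, but duplicate rows must appear the same number of times.
    """
    if ordered:
        return list(source_rows) == list(accelerated_rows)
    return Counter(source_rows) == Counter(accelerated_rows)


if __name__ == "__main__":
    source = [(1, "a"), (2, "b"), (2, "b")]
    accelerated = [(2, "b"), (1, "a"), (2, "b")]
    print(results_identical(source, accelerated))                # True
    print(results_identical(source, accelerated, ordered=True))  # False
```

Using a multiset rather than a set matters here: dropping one of two duplicate rows is a correctness bug that a plain set comparison would miss.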
Documentation
[x] Documentation includes all information and steps for a user to set up the accelerator.
[x] Documentation includes all known issues/limitations for the accelerator.
[x] Documentation includes any exceptions made to allow this accelerator to reach Beta quality (e.g. if a particular data type cannot be supported by the accelerator).
[ ] The accelerator has an easy-to-follow quickstart.
[ ] All Minor Bugs for TPC-DS and TPC-H are raised as issues.