Quickstep
Quickstep is an experimental high-performance database engine designed with the
aim of Data at Bare-Metal Speed. It began life in 2011 as a
research project at the University of Wisconsin
and was acquired by Pivotal in 2015.
Getting Started (Building)
A build guide is available which includes instructions for
building Quickstep for the first time. You may also find it useful to use one
of the pre-made Vagrant boxes for Quickstep that are already
set up with all of the development tools needed to build Quickstep.
Documentation
All publicly-visible classes and functions in the Quickstep code base have
Doxygen documentation. Simply run doxygen
in the root of the Quickstep source
to generate browsable HTML documentation. Of course, the Doxygen comments
should also be useful when reading header files directly.
In addition to the Doxygen and inline code comments explaining implementation
details, a high-level overview for each module that comprises Quickstep is
included in the README files in each subdirectory.
Architectural Overview
Quickstep is composed of several different modules that handle different
concerns of a database system. The main modules are:
- Utility - Reusable general-purpose code that is used by many
other modules.
- Threading - Provides a cross-platform abstraction for threads
and synchronization primitives that abstracts the underlying OS threading
features.
- Types - The core type system used across all of Quickstep. Handles
details of how SQL types are stored, parsed, serialized & deserialized, and
converted. Also includes basic containers for typed values (tuples and
column-vectors) and low-level operations that apply to typed values (e.g.
basic arithmetic and comparisons).
- Catalog - Keeps track of database schema as well as physical
storage information for relations (e.g. which physical blocks store a
relation's data, and any physical partitioning and placement information).
- Storage - Handles the physical storage of relation data in
self-contained, self-describing blocks, both in-memory and on persistent
storage (disk or a distributed filesystem). Also includes some heavyweight
run-time data structures used in query processing (e.g. hash tables for join
and aggregation). Includes a buffer manager component for managing memory
use and a file manager component that handles data persistence.
- Compression - A simple implementation of ordered dictionary
compression. Several storage formats in the Storage module are capable of
storing compressed column data and evaluating some expressions directly on
compressed data without decompressing. The common code supporting compression
is in this module.
- Expressions - This module builds on the simple operations
provided by the Types module to support arbitrarily complex expressions over
data, including scalar expressions, predicates, and aggregate functions with
and without grouping.
- Relational Operators - This module provides the
building blocks for queries in Quickstep. A query is represented as a
directed acyclic graph of relational operators, each of which is responsible
for applying some relational-algebraic operation(s) to tranform its input.
Operators generate individual self-contained "work orders" that can be
executed independently. Most operators are parallelism-friendly and generate
one work-order per storage block of input.
- Query Execution - Handles the actual scheduling and
execution of work from a query at runtime. The central class is the Foreman,
an independent thread with a global view of the query plan and progress. The
Foreman dispatches work-orders to stateless Worker threads and monitors their
progress, and also coordinates streaming of partial results between producers
and consumers in a query plan DAG to maximize parallelism. This module also
includes the QueryContext class, which holds global shared state for an
individual query and is designed to support easy
serialization/deserialization for distributed execution.
- Parser - A simple SQL lexer and parser that parses SQL syntax into
an abstract syntax tree for consumption by the Query Optimizer.
- Query Optimizer - Takes the abstract syntax tree generated
by the parser and transforms it into a runable query-plan DAG for the Query
Execution module. The Query Optimizer is responsible for resolving references
to relations and attributes in the query, checking it for semantic
correctness, and applying optimizations (e.g. filter pushdown, column
pruning, join ordering) as part of the transformation process.
- Command-Line Interface - An interactive SQL shell interface to
Quickstep.
Licensing
Quickstep is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.