sunyinqi0508 / AQuery2

An in-memory column-store time-series database that uses query compilation
Mozilla Public License 2.0
10 stars 7 forks source link
columnstore database olap python time-series

AQuery++ Database

News:

Demo workflow for Triggers now available See DEMO

Introduction

AQuery++ Database is a cross-platform, In-Memory Column-Store Database that incorporates compiled query execution. (Note: If you encounter any problems, feel free to contact me via sunyinqi0508@gmail.com)

Architecture

Architecture

AQuery Compiler

Installation:

Requirements

  1. Recent version of Linux, Windows or MacOS, with recent C++ compiler that has C++17 (1z) support. (however c++20 is recommended if available for heterogeneous lookup on unordered containers)

    • GCC: 9.0 or above (g++ 7.x, 8.x fail to handle fold-expressions due to a compiler bug)
    • Clang: 5.0 or above (Recommended)
    • MSVC: 2019 or later (2022 or above is recommended)
  2. Monetdb for Hybrid Engine

    • On windows, the required libraries and headers are already included in the repo.
    • On Linux, see Monetdb Easy Setup for instructions.
    • On MacOS, Monetdb can be easily installed in homebrew brew install monetdb.
  3. Python 3.6 or above and install required packages in requirements.txt by python3 -m pip install -r requirements.txt

    Windows

    There're multiple options to run AQuery on Windows. But for better consistency I recommend using a simulated Linux environment such as Windows Subsystem for Linux (1 or 2), Docker or Linux Virtual Machines. You can also use the native toolchain from Microsoft Visual Studio or gcc from Winlabs/Cygwin/MinGW.

macOS

for arm64 macOS users

Linux

Docker (Recommended):

Singularity Container

  1. build container singularity build aquery.sif aquery.def
  2. execute container singularity exec aquery.sif sh
  3. run AQuery python3 ./prompt.py

    Usage

    python3 prompt.py will launch the interactive command prompt. The server binary will be automatically rebuilt and started.

    Commands:

    • <sql statement>: parse AQuery statement
    • f <filename>: parse all AQuery statements in file
    • exec: execute last parsed statement(s) with Hybrid Execution Engine. Hybrid Execution Engine decouples the query into two parts. The standard SQL (MonetDB dialect) part is executed by an Embedded version of Monetdb and everything else is executed by a post-process module which is generated by AQuery++ Compiler in C++ and then compiled and executed.
    • stats <OPTIONAL: options> configure statistics.
    • no options: show statistics for all queries so far.
    • reset: resets statistics.
    • on : statistics will be shown for every future query.
    • off: statistics will not be shown for every future query.
    • script <filename>: use automated testing script, this will execute all commands in the script
    • sh <OPTIONAL: shell> launch a shell. Shell name can be specified (e.g. sh fish).
    • dbg start python interactive interpreter at the current context.
    • print: print parsed AQuery statements (AST in JSON form)
    • save <OPTIONAL: filename>: save current code snippet. will use random filename if not specified.
    • exit: quit the prompt
    • r: run the last generated code snippet

      Example:

      f moving_avg.a
      xexec

See files in ./tests/ for more examples.

Automated Testing Scripts

User Manual

AQuery++ has similar syntax to standard SQL with extensions for time-series analysis and user extensibility.

Basic Grammar

program : [query | create | insert | load | udf ]*

/********* Queries *********/
query : [WITH ID ['('columns')'] AS '(' single-query ')'] single-query

single-query : SELECT projections FROM datasource assumption where-clause groupby-clause

projections: [val as ID | val] (, [val as ID | val])*

datasource : ID [ID | AS ID] |
  ID, datasource |
  ID [INNER] JOIN datasource [USING columns | ON conditions] |
  ID NATURAL JOIN datasource

order-clause: ASSUMING ([ASC|DESC] ID)+

where-clause: WHERE conditions;

groupby-clause: GROUP BY expr (, expr )* [HAVING conditions]

conditions: <a boolean expression>

/********* Creating data *********/
create: CREATE TABLE ID [AS query | '(' schema ')']
schema: ID type (, ID type)*

insert: INSERT INTO ID [query | VALUES '(' literals ')']
literals: literal (, literal)*;

/********* Loading/Saving data *********/
load: LOAD DATA INFILE string INTO TABLE ID FIELDS TERMINATED BY string

save: query INTO OUTFILE string FIELDS TERMINATED BY string

/********* User defined functions *********/
udf: FUNCTION ID '(' arg-list ')' '{' fun-body '}'
arg_list: ID (, ID)*
fun_body: [stmts] expr

/********* Triggers **********/
create: CREATE TRIGGER ID [ ACTION ID INTERVAL num | ON ID ACTION ID WHEN ID ]
drop: DROP TRIGGER ID

stmts: stmt+ 
stmt: assignment; | if-stmt | for-stmt | ;
assignment: l_value := expr
l_value: ID | ID '[' ID ']'

if-stmt: if '(' expr ')' if-body [else (stmt|block) ]
if-body: stmt | block (elif '(' expr ')' if-body)*

for-stmt: for '(' assignment (, assignment)* ';' expr ';' assignment ')' for-body
for-body: stmt|block

block:  '{' [stmts] '}'

/********* Expressions *********/
expr: expr binop expr | fun_call | unaryop expr | ID | literal
fun: ID | sqrt | avg[s] | count | deltas | distinct 
  | first | last | max[s] | min[s] | next
  | prev | sum[s] | ratios | <... To be added> 
fun_call: fun '(' expr (, expr)* ')'
binop: +|-|=|*|+=|-=|*=|/=|!=|<|>|>=|<=| and | or
unaryop: +|-| not
literal:  numbers | strings | booleans

Data Types

Create Table

Tables can be created using CREATE TABLE statement. For example

CREATE TABLE my_table (c1 INT, c2 INT, c3 STRING)
INSERT INTO my_table VALUES(10, 20, "example")
INSERT INTO my_table SELECT * FROM my_table

You can also create tables using a query. For example:

CREATE TABLE my_table_derived
AS
  SELECT c1, c2 * 2 as twice_c2 FROM my_table

Drop Table:

Tables can be dropped using DROP TABLE statement. For example:

DROP TABLE my_table IF EXISTS

Load Data:

Combine Queries

Delete Data:

Performance Measurement

MonetDB Passthrough for Hybrid Engine

AQuery++ supports MonetDB passthrough for hybrid engine. Simply put standard SQL queries inside a \ \ block.

Each query inside an sql block must be separated by a semicolon. And they will be sent to MonetDB directly which means they should be written in MonetDB dialect instead of AQuery dialect. Please refer to the MonetDB documentation for more information.

For example:

CREATE TABLE my_table (c1 INT, c2 INT, c3 STRING)
INSERT INTO my_table VALUES(10, 20, "example"), (20, 30, "example2")
<sql>
INSERT INTO my_table VALUES(10, 20, "example3");
CREATE INDEX idx1 ON my_table(c1);
</sql>
SELECT * FROM my_table WHERE c1 > 10

Built-in functions:

Roadmap

Known Issues:

Credit: