zhouqingqing / qpmodel

A Relational Optimizer and Executor
MIT License
64 stars 18 forks source link
cascades-optimizer codegen distributed-query property-enforcer query-executor query-optimizer

Build Status Build Status CodeFactor Coverage Status

An Experimental Relational Optimizer and Executor

This project implements a relational optimizer and executor in c#. It is called "experimental" because it does not have all the details carved. The main target is the optimizer, and the purpose is to prepare for a more serious production implementation later. The executor part is needed for plan correctness verification with TPCH/DS end to end runnable.

It is built on top of many database research results, and it has been open-sourced in the hope that others may find it useful and database community can provide feedback and ways to improve it.

Why C

Optimizer is logic centric, so a high-level language is preferred. After experiments, production may want to turn it into some C/C++ code, so the language must be a close relative of them. C# (.net core) provides some great features like cross-platform, LINQ, dynamic types to make modeling easy, and it is close enough to C++ (and that's why not python).

Optimizer

The optimizer exercises the following constructs:

Executor

The executor is implemented for two purpose: (1) verify plan correctness; (2) test codeGen engineering readiness following paper "How to Architect a query compiler, Revisited" by R.Y.Tahboub et al. Executor is not intended to demonstrate implementing specific operator efficiently.

You can see an example generated code for this query here with generic expression evaluation is in this code snippet:

    // projection on PhysicHashAgg112: Output: {a.a2}[0]*2,{count(a.a1)}[1],repeat('a',{a.a2}[0]) 
    Row rproj = new Row(3);
    rproj[0] = ((dynamic)r112[0] * (dynamic)2);
    rproj[1] = r112[1];
    rproj[2] = ExprSearch.Locate("74").Exec(context, r112) /*repeat('a',{a.a2}[0])*/;
    r112 = rproj;

How to Play

The program is based on .net core, so it runs on both Windows and Linux. Load the project with Visual Studio 2019 free community edition or visual studio code. Program.Main() mainly for debugging purpose. There are several builtin tables like 'a', 'b', 'c', 'd' for testing purpose, and you can find their definition in Catalog.createBuildInTestTables(). If you come up with a PR, make sure you have unittest passed.

Unittest comes up with multiple tests covering major functionalities: