This project implements a relational optimizer and executor in c#. It is called "experimental" because it does not have all the details carved. The main target is the optimizer, and the purpose is to prepare for a more serious production implementation later. The executor part is needed for plan correctness verification with TPCH/DS end to end runnable.
It is built on top of many database research results, and it has been open-sourced in the hope that others may find it useful and database community can provide feedback and ways to improve it.
Optimizer is logic centric, so a high-level language is preferred. After experiments, production may want to turn it into some C/C++ code, so the language must be a close relative of them. C# (.net core) provides some great features like cross-platform, LINQ, dynamic types to make modeling easy, and it is close enough to C++ (and that's why not python).
The optimizer exercises the following constructs:
The optimizer also exposes a DataSet like interface which demonstrates language integration. Another example calculating Pi with Monte-Carlo can be found in Unittest/TestDataSet.
// leverage c#'s sqrt so no reinventing wheels
double sqroot(double d) => Math.Sqrt(d);
SQLContext.Register<double, string>("sqroot", sqroot);
// use sqroot in SQL
var sql = "SELECT a1, sqroot(b1*a1+2) from a join b on b2=a2 where a1>1";
// above query in DataSet form
var a = sqlContext.Read("a");
var b = sqlContext.Read("b");
var rows = a.filter("a1>1").join(b, "b2=a2").select("a1", "sqroot(b1*a1+2)").show();
The executor is implemented for two purpose: (1) verify plan correctness; (2) test codeGen engineering readiness following paper "How to Architect a query compiler, Revisited" by R.Y.Tahboub et al. Executor is not intended to demonstrate implementing specific operator efficiently.
You can see an example generated code for this query here with generic expression evaluation is in this code snippet:
// projection on PhysicHashAgg112: Output: {a.a2}[0]*2,{count(a.a1)}[1],repeat('a',{a.a2}[0])
Row rproj = new Row(3);
rproj[0] = ((dynamic)r112[0] * (dynamic)2);
rproj[1] = r112[1];
rproj[2] = ExprSearch.Locate("74").Exec(context, r112) /*repeat('a',{a.a2}[0])*/;
r112 = rproj;
The program is based on .net core, so it runs on both Windows and Linux. Load the project with Visual Studio 2019 free community edition or visual studio code. Program.Main() mainly for debugging purpose. There are several builtin tables like 'a', 'b', 'c', 'd' for testing purpose, and you can find their definition in Catalog.createBuildInTestTables(). If you come up with a PR, make sure you have unittest passed.
Unittest comes up with multiple tests covering major functionalities: