vmware / database-stream-processor-compiler

Infrastructure to run programs written in high-level languages on top of the Database Stream Processor (DBSP) runtime.
Other
14 stars 2 forks source link

[RFC] AST interpreter. #9

Open ryzhyk opened 2 years ago

ryzhyk commented 2 years ago

On the surface, DDlog-2 is a purely functional language. The main function of a program transforms a set of input relations into a set of output relations. However, this transformation is just the first phase of program execution that runs as part of the compilation process. During the second, runtime, phase the user feeds data to the input relations and receives data from the output relations.

As part of the first phase, we must evaluate the program and generate a dataflow graph. This evaluation is performed by the AST interpreter.

Example

Consider the following program that takes relation r1 and returns a vector of 10 relations that multiply the contents of r1 by 1, 2, 3, ... 10.

fn main(r1: Relation<usize>) -> Vec<Relation<usize>> {
    let mut outputs = Vec::new();

    for i in 1..11 {
        outputs.push(f(r1, i))
    }
    outputs
}

fn f<T: Mul<usize>>(r: Relation<T>, multiplier: usize) -> Relation<T> {
    r.map(|x| x*multiplier)
}

The AST interpreter evaluates this program to produce the following dataflow graph:

                           ┌────────┐
             ┌────────────►│ Map(x1)├────────────────►
             │             │        │
             │             └────────┘
             │
             │
             │             ┌────────┐
    r1       │             │Map(x2) │
─────────────┼─────────────►        ├────────────────►
             │             └────────┘
             │
             │
             │               ...
             │
             │             ┌────────┐
             └────────────►│Map(x10)│
                           │        ├─────────────────►
                           └────────┘

The AST interpreter runs after parsing, validation, and type inference phases. Its output is not yet the dataflow graph used by the optimizer (RFC #8). It consists of high-level dataflow operators (i.e., no arrange, consolidate, etc.) and still uses the AST format for expressions inside each dataflow operator rather than RVSDG or whatever other format we choose for the IR.

mihaibudiu commented 2 years ago

This should be fairly straightforward to build; the complexity depends on the complexity of the circuit generator language, and mostly on the types supported for compile-time circuit generation.

ryzhyk commented 2 years ago

Yes, this is meant to be straightforward, especially since it also does not need to be fast.