vmware / database-stream-processor-compiler

Infrastructure to run programs written in high-level languages on top of the Database Stream Processor (DBSP) runtime.
Other
14 stars 2 forks source link

[RFC] Global relations, input/output relations #12

Open ryzhyk opened 2 years ago

ryzhyk commented 2 years ago

We propose to allow let-bindings in the global name space. In particular, this will allow writing DDlog-1 style rules whose inputs and outputs have global visibility:

let R1: Relation<T1>  = ...;
let R2: Relation<T2> = ...;
let R3: Relation<T3>  = R1.filter(...).join(R2, ...);

Note that such declarations can be considered syntactic sugar for 0-arity functions.

Such relations can also be accessed from a relation transformer without explicitly passing them as an argument to the transformer:

let R1: Relation<T1>  = ...;
let R2: Relation<T2> = ...;

fn rule1() -> Relation<T3> {
    R1.filter(...).join(R2, ...)
}
fn rule2() -> Relation<T3> {
    R1.antijoin(R2, ...)
}

// A relation with two rules.
let R3: Relation<T3>  = rule1().concat(rule2);

Global mutually recursive relations must be declared as a tuple:

let (R1, R2): (Relation<T1, T2>) = fixed_point(|r1, r2| ...);

Finally, we can extend this feature to define input and output relations as a special kind of global variables, which seems more ergonomic than passing them as inputs and outputs to the main function:

// An input relation cannot be assigned to.
input R1: Relation<T1>;
output R2: Relation<T2> = .....;
mihaibudiu commented 2 years ago

Let's not use a word to mean something different. Something 'const' is supposed to never change. Something you can't assign to should be called something else. And equating const with global is not a good idea either. You may well want const local values. In C++ const can be a modifier for a variable or argument. You cannot modify the variable or argument, but you can assign to them non-const values. But this is important only for references.

ryzhyk commented 2 years ago

Let's not use a word to mean something different. Something 'const' is supposed to never change. Something you can't assign to should be called something else. And equating const with global is not a good idea either. You may well want const local values. In C++ const can be a modifier for a variable or argument. You cannot modify the variable or argument, but you can assign to them non-const values. But this is important only for references.

Sorry, I don't understand what the issue is. I think my definition of const is inline with Rust, which is part of our design goal. I'm not equating consts with globals, and we can totally support const locals as well. But consts enable global (i.e., DDlog-1 style) relations.

mihaibudiu commented 2 years ago

I think of const relations as the ones that contain just facts. In DDlog-1 all relations are global.

ryzhyk commented 2 years ago

Ok, I see now how const can be confusing. There are two different evaluation times in ddlog-2. I was thinking of a const relation as one whose dataflow graph is constructed statically (at compile time). But it makes sense that it can also be interpreted as relation whose contents never changes.

ryzhyk commented 2 years ago

I suggest replacing const with let throughout the RFC, e.g.:

let (R1, R2): (Relation<T1, T2>) = fixed_point(|r1, r2| ...);

The const part is implied since by default all variables are immutable.

mihaibudiu commented 2 years ago

I am not sure this syntax is good enough. I think we should write some programs using it and we'll figure out whether it is sufficient. In particular, I am not sure how data will flow in and out of loops. You may want to push the contents of some relations into loops (\delta_0), and to read out the contents of other relations from the loops, which are not part of the fixed-point themselves.

ryzhyk commented 2 years ago

I am not sure this syntax is good enough. I think we should write some programs using it and we'll figure out whether it is sufficient.

This is true for all aspects of language design. I am working on a DDlog-1 to DDlog-2 converter, which should provide partial validation for the new design. This is what actually prompted this RFC.

In particular, I am not sure how data will flow in and out of loops. You may want to push the contents of some relations into loops (\delta_0), and to read out the contents of other relations from the loops, which are not part of the fixed-point themselves.

That should "just work".

Kixiron commented 2 years ago

I'm not sure how I feel about reusing let like that, maybe static or relation would work?

ryzhyk commented 2 years ago

I think this use of let isn't really different from Rust, other than the fact that we use it in the global scope. static doesn't seem any clearer. relation seems ok, except relation R: Relation<T> repeats "relation" twice. I guess we don't need to settle on a syntax right now, as long as we agree that this is a useful facility.

Kixiron commented 2 years ago

Well, if we're limiting relation ?ident to only be things of the Relation type we could just implicitly wrap it, desugaring relation R: i32 = ?expr; into relation R: Relation<i32> = ?expr;. Syntax bikeshed aside, I totally agree that this is something rather required.

As something else to think about, for recursive relations we could take a book from functional languages like F# and require that recursive relations are annotated explicitly with let rec or something along those lines

ryzhyk commented 2 years ago

The only way to define recursive relations in this language (we don't have the Datalog syntactic sugar yet) is using the fixed_point operator, so not only they are explicitly recursive, but also all mutually recursive relations are declared at the same time:

let (rel1, rel2) = fixed_point(....);
ryzhyk commented 2 years ago

relation R: i32 = ?expr;

I'd rather relation R<i32> = ...;

Kixiron commented 2 years ago

Hum, thinking about it I think that slight redundancy with requiring Relation is fine since it also allows declaring non-relation constants within patterns as well as having relation collections, allowing this

let allowed_names: Map<Str, Relation<Str>> = ...;
allowed_names["something"].filter(...);
let interesting_numbers: Vec<Relation<usize>> = {
    let mut numbers = Vec::new();
    numbers.push(primes());
    numbers.push(collatz());
    numbers
};
let (partitioned_things, partitions) = {
    let n = something();
    (things.partition(n), n)
}
ryzhyk commented 2 years ago

Now you're back to using let :) Yes, I agree, these are all good examples of things we may want to do in the global scope.

One issue is that there are different kinds of relations (input/output/internal). It's tempting to use different types for them InputRelation<T>, OutputRelation<T>, InternalRelation<T>. Then Relation should become a trait, which all of these implement. Every function that manipulates relations ends up with complex trait bounds like:

fn <R1, T1, R2, T2> transformer(r1: R1)  -> R2
where R1:Relation<T1>,
           R2:Relation<T2>

which is probably too much for most users, so I prefer to have a concrete Relation<T> type (and possibly add Stream<T> later) and use a different mechanism to distinguish input/output relations.

In addition, input and output relations should have externally visible names or ids, so

let interesting_numbers: Vec<Relation<usize>>

is perfectly fine for internal relations, but not for input and output relations.

ryzhyk commented 2 years ago

Actually, inputs and outputs don't have to be part of the language.

let rel1: Relation<T1>;
let rel2: Relation<T2>;

fn main() {
    // Instantiate a dataflow with input relation `rel1` and output relation `rel2`
    let dataflow: Dataflow = Dataflow::new(["input" -> rel1], ["output" -> rel2]);
}

This paves the way for more dynamic APIs that allow modifying the dataflow at runtime, join multiple dataflows, etc.