rust-lang / datafrog

A lightweight Datalog engine in Rust
Apache License 2.0
796 stars 43 forks source link

Support arbitrary tuples #34

Closed regexident closed 3 years ago

regexident commented 3 years ago

This PR aims to make datafrog less dependent on(Key, Value) tuples, lessening the burden of having to create lots and lots of intermediary variables that do little more than move an element to the front of the tuple, so it can be used as key.

Before

As such where you currently have to introduce auxiliary variables of (Key, Value) tuples:

let variable: Variable<(f32, i32, bool)> = …;
let relation: Relation<(u8, i32)> = …;

// Auxiliary variables/relations:
let auxiliary_variable: Variable<(i32, (f32, bool))> = …;
let auxiliary_relation: Relation<(i32, u8)> = …;

let result_variable = iteration.variable::<(u8, bool)>("(u8, bool)");

while iteration.changed() {
    // Additional maintenance for auxiliary variables:
    auxiliary_variable.from_map(&variable, |tuple|
        (tuple.1, tuple.0)
    );
    auxiliary_relation.from_map(&relation, |tuple|
        (tuple.1, tuple.0)
    );

    result_variable.from_join(
        &auxiliary_variable,
        &auxiliary_relation,
        |_, (_, boolean1), (_, byte2)| (byte2, boolean1),
    );
}

After

… with this PR you simply extract the key from an arbitrary tuple index using a closure:

let variable: Variable<(f32, i32, bool)> = …;
let relation: Relation<(u8, i32)> = …;

let result_variable = iteration.variable::<(u8, bool)>("(u8, bool)");

while iteration.changed() {
    result_variable.from_join_by(
        &variable,
        &relation,
        |(_, integer, _)| integer,
        |(_, integer)| integer,
        |(_, _, boolean1), (byte2, _)| (byte2, boolean1),
    );
}

This cuts down the maintenance burden of having to keep several additional auxiliary variables and relations up-to-date, reducing the mental overhead and brings your datafrog Rust code a little bit closer to their corresponding datalog rules.

@frankmcsherry @nikomatsakis I hope that these changes are not in conflict with the soundness of the API of datafrog or the semantics of Datalog?

Performance

I did some performance testing with criterion, but did not find any consistent and noteworthy performance regressions for existing APIs and found the overhead of the …_by method variants minimal.

Known limitations

Unfortunately this PR only allows lenses to return references to arbitrary elements of the tuple (via Accessor1: Fn(&Tuple1) -> &Key), but no owned ad-hoc values.

I had initially hoped to be able to allow for arbitrary Key values, such as ad-hoc tuples or other computed values. This would have allowed for a returning (&X, &Y) as key for (X, Y, Z) tuple, rather than just single individual elements of it. Support for arbitrary tuples as keys would have been very convenient for scenarios with complex n-ary keys such as borrow_check.rs.

Changing Accessor1: Fn(&Tuple1) -> &Key, to Accessor1: Fn(&Tuple1) -> Key, unfortunately causes accessors that return a reference to the tuple or elements of it, such as |tuple| &tuple or |tuple| &tuple.0 to trigger this language limitation. Unfortunately those are the most commonly used accessors and I'm not aware of a workaround that would allow both, arbitrary keys and borrowed tuple element keys.

Minimal reproducible snippet ```rust fn dummy<'a, T, K: 'a, F>(value: &'a T, f: F) -> K where F: Fn(&T) -> K { f(value) } fn main() { let tuple = (1, 2, 3); let _ = dummy(&tuple, |tuple| &tuple.0); } ``` Error: ``` 10 | let _ = dummy(&tuple, |tuple| &tuple.0); | ------ ^^^^^^^^ returning this value requires that `'1` must outlive `'2` | | | | | return type of closure is &'2 i32 | has type `&'1 (i32, i32, i32)` ```

This is part #3 of a stacked pull-request and adds upon #33:

└── Split-up\ lib.rs (#32)
    └── Cleanup\ generics (#33)
        └── Support arbitrary tuples 👈🏻