rust-lang / datafrog

A lightweight Datalog engine in Rust
Apache License 2.0
796 stars 43 forks source link

Investigate the effect of "treefrog leapjoin"-ing more than one Variable #6

Closed lqd closed 6 years ago

lqd commented 6 years ago

Right now, the TFLJ algorithm reduces the need for temporary variables, when there is one dynamic data source (one Variable) and "leapers" over Relations only.

In some situations, this can mean a temporary relation is still required if one needs to join more than one variable. It would be interesting to investigate how/if being able to remove such temporaries would help the polonius benchmarks.

This might open the possibility to "inline" intermediate relations themselves (and not just the "steps" or indices) completely, for instance the big subset and requires relations, and whose tuples might not all need be produced in the first place.

Some details on how to do this are in this comment by Frank.

(Note: this might need to wait for TFLJ to land on master, polonius using it, and having more diverse benchmarks, checking with the rustc-perf benchmarks as well, or also having datafrog-specific benchmarks)

frankmcsherry commented 6 years ago

A quick comment here: the reason the current implementation uses only one variable (and as many relations as you like) is that the logic only correctly implements "updates resulting from updates to the source variable". Changes to the source variable drive the determination of output changes.

Responding to changes to multiple variables will almost certain require as many calls to the current leapjoin_into as there are variables involved. They will probably also require differently indexed inputs (because the variables probably bind different values, which means that other participating variables and relations will need to respond to different subsets of values).

Not fundamentally hard to do, but might be a bit of a head-scratcher about how to get it to look good, and whether the overhead (e.g. multiple indices for participating variables) actually improves over using a temporary (I'm not sure either way).

lqd commented 6 years ago

Some missing context: this issue was created as a possible avenue of exploration for the people who wanted to work on NLLs at the last "impl Days" for RustFest Paris. As Frank mentioned before, it is uncertain it would actually improve things, and there are surely more actionable things to try before even considering this; so I think we can close this issue for now as it doesn't represent a problem in 🐸 per se.