rust-lang / datafrog

A lightweight Datalog engine in Rust
Apache License 2.0
796 stars 43 forks source link

Cache results of previous `Key` search in `Leaper`s #35

Closed ecstatic-morse closed 3 years ago

ecstatic-morse commented 3 years ago

Maintaining a cache is profitable because the "source" of the leapjoin yields tuples in sorted order. Therefore, if the Key of the leaper is near the start of the SourceTuple (ideally a prefix), we would expect to get many tuples with the same Key in a row (assuming multiple tuples with that Key exist).

On the clap-rs benchmark with the naive ruleset (basically a subset stress test), this optimization saves about 10 seconds per leaper (ExtendWith and ExtendAnti) for a 20% speedup overall.

I'm kind of embarrassed to have missed this in my first look at datafrog. I saw that we were caching start and end across the count -> propose -> intersect sequence of calls, but didn't see that we could be doing more. :disappointed:

lqd commented 3 years ago

Needless to say this looks good to me :)

ecstatic-morse commented 3 years ago

I added a similar optimization for filter_with and filter_anti. The fact that these use (Key, Val) but never handle Key and Val separately threw me initially (maybe you need this for inference on the Val parameter?), but it's correct to cache both.