ecstatic-morse closed this 3 years ago
Needless to say this looks good to me :) I added a similar optimization for `filter_with` and `filter_anti`. The fact that these use `(Key, Val)` but never handle `Key` and `Val` separately threw me initially (maybe you need this for inference on the `Val` parameter?), but it's correct to cache both.
Maintaining a cache is profitable because the "source" of the leapjoin yields tuples in sorted order. Therefore, if the `Key` of the leaper is near the start of the `SourceTuple` (ideally a prefix), we would expect to get many tuples with the same `Key` in a row (assuming multiple tuples with that `Key` exist).

On the `clap-rs` benchmark with the naive ruleset (basically a `subset` stress test), this optimization saves about 10 seconds per leaper (`ExtendWith` and `ExtendAnti`) for a 20% speedup overall.

I'm kind of embarrassed to have missed this in my first look at `datafrog`. I saw that we were caching `start` and `end` across the `count` -> `propose` -> `intersect` sequence of calls, but didn't see that we could be doing more. :disappointed:
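For anyone skimming this later, here is a minimal standalone sketch of the idea (not datafrog's actual code; `CachedLeaper` and its fields are hypothetical names). Because the source yields tuples in sorted order, the same key tends to arrive many times in a row, so caching the `(start, end)` range found for the previous key lets repeated lookups skip the binary searches entirely:

```rust
// Hypothetical sketch of key-range caching in a leaper. In datafrog the
// tuples would live in a `Relation<(Key, Val)>`; here a sorted Vec stands in.
struct CachedLeaper {
    // Sorted by key.
    tuples: Vec<(u32, u32)>,
    // Cached (key, start, end) from the previous lookup.
    cache: Option<(u32, usize, usize)>,
}

impl CachedLeaper {
    fn new(mut tuples: Vec<(u32, u32)>) -> Self {
        tuples.sort();
        CachedLeaper { tuples, cache: None }
    }

    /// Returns the slice of tuples matching `key`. When the caller asks
    /// for the same key as last time, the cached bounds are reused and
    /// no binary search runs.
    fn lookup(&mut self, key: u32) -> &[(u32, u32)] {
        let (start, end) = match self.cache {
            Some((k, s, e)) if k == key => (s, e),
            _ => {
                // Two binary searches locate the half-open range of `key`.
                let start = self.tuples.partition_point(|&(k, _)| k < key);
                let end = self.tuples.partition_point(|&(k, _)| k <= key);
                self.cache = Some((key, start, end));
                (start, end)
            }
        };
        &self.tuples[start..end]
    }
}

fn main() {
    let mut leaper = CachedLeaper::new(vec![(1, 10), (1, 11), (2, 20)]);
    // Sorted input means runs of identical keys: only the first lookup
    // in each run pays for the searches.
    assert_eq!(leaper.lookup(1), &[(1, 10), (1, 11)]);
    assert_eq!(leaper.lookup(1), &[(1, 10), (1, 11)]); // served from cache
    assert_eq!(leaper.lookup(2), &[(2, 20)]);
    println!("ok");
}
```

In the real leapjoin the cached range additionally survives across the `count` -> `propose` -> `intersect` calls for one source tuple; the point of this PR is that it can also survive into the *next* source tuple whenever the key repeats.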