Closed dollschasingmen closed 9 years ago
main use case:
(def my-query [outvars] (<- outvars (my-gen !a !b) (pred1 !a !b :> c) (pred2 !a b :> !d))
then suppose you call it like this:
(my-query ["!a])
without the pruning code, both pred1 and pred2 would be called. with the pruning code only pred1 would be called. so you can use it to dynamically only request the functions that you need to run in the query. should be more efficient this way (since you just skip calling code that where the output is ignored). in practice, this is best if pred2, for example, is really slow/heavy.
so tl:dr, it only runs code that is actually needed.
assumptions - in theory none :) the main approach of this is to check if the output of a particular predicate is necessary -- aka, is the output var being used somewhere else.
there are a few exceptions to this -- which the test document.
@dollschasingmen, sorry I never really looked at this. I know it's been a long, long time.
One thing we need to guarantee is that we include filtering variables in the set of variables that we protect, not just the final projection. a query like
(<- [?x]
(src ?x)
(* ?x ?x :> ?square)
(even? ?square))
Needs to return only even squares even though the filter variable's removed. Does that make sense? Let me know if you're still interested here and I can give you a super-fast review turnaround time. Thanks again!
@sritchie, np, thanks for not forgetting this PR :)
This function doesn't prune filter predicates under any circumstances. The pruning action is limited to only non-filter and non-generator predicates. I added a few more tests to assert that (see latest commits).
Sorry, the function is a bit convoluted. It has been a long time and reading through it was not super simple even for me, ha. Not sure how to make it the logic more obvious/readable (suggestions very much appreciated here), but I re-wrote the function and test descriptions to make it all a bit more clear (hopefully?)
This looks great. Currently compilation's forced at the subquery boundary, but I can see this advancing down the road to actually filter down across subqueries when the user does some projection like (src ?x _)
when using a query.
I'm happy to merge this now, but I do have one more Q / suggestion; I might be wrong here, but reading the code it looks like this optimization might not be able to handle chains of operations that end in some unused variable, like
(<- [?x]
(src ?x)
(* ?x ?x :> ?square)
(dec ?square :> ?square-minus-one))
I bet the easiest way to deal with this would be to run the same optimization you're currently running, but to pass it through a fixed-point
function like
(defn fixed-point [f guess]
(let [next (f guess)]
(if (= guess next)
next
(fixed-point f next))))
and cull all operations backwards. Up to you if you think this is a good idea or worth doing in this pull request. I always feel lame asking for MORE work in a pull request, especially given how badly I dropped the ball on reviewing this one.
Thanks again!
This is pretty closely related to
https://github.com/nathanmarz/cascalog/issues/24
Both the chained-and-ignored predicate thing and duplicate operations happen a lot as a result of predicate macros.
yea - good point re: handling chains of operations. i think i got it to work using basically that fixed-point idea, see latest commit.
also agree about the subquery boundary and pred-macros -- sure would be nice to be able to prune into that as well :) but that I might prefer to leave to another PR.
hm not sure why CI failed, test passed locally
Looks like a dependency download failure with cascalog-math. Thanks again!
Sweet!
Love that this works now:
cascalog.api> (defn throw! [& xs] (throw (RuntimeException. "gotcha!")))
#'cascalog.api/throw!
cascalog.api> (??<- [?x] ([[1]] ?x) (throw! ?x :> ?ignored))
([1])
cascalog.api> (??<- [?x ?ignored] ([[1]] ?x) (throw! ?x :> ?ignored))
FlowException local step failed cascading.flow.planner.FlowStepJob.blockOnJob (FlowStepJob.java:219)
@dollschasingmen A couple questions to help me understand please. What's the gain in doing this? Is there any assumption being made in the pruning?