We need a rewrite rule that pushes (fuses) a subplan into a group-by. An
example is a query that applies a group-by and then filters within each group:
for $i in dataset("test")
group by $i.key with $i
return {
"gid": $i.key,
"avg": avg(for $j in $i where not(is-null($j.value)) return $j.value)
}
Currently, the plan generated by Asterix looks like this:
...
assign: $$6 <- function-call: avg($$5)
subplan(
aggregate: $$5 <- listify($$4)
select: not(is-null($$4))
unnest:$$4 <- scan-collection($$2)
)
group-by[$$3](
aggregate: $$2<- listify($$1)
nested-source
)
...
Overall, the current plan creates the groups using group-by, and then applies
the filtering on each group in another subplan (which needs to unnest the group
once before filtering), then creates the group again (on the filtered records),
and finally applies the aggregation function on each group.
This plan can be optimized further to avoid the unnest/listify round trip:
group-by[$$3](
aggregate: $$6 <- avg($$1)
select: not(is-null($$1))
nested-source
)
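As a rough illustration of why the two plans are interchangeable, the sketch
below models both in plain Python. It is not Asterix code: the record layout
(dicts with "key" and "value"), the function names, and the choice to return
None for a group whose values are all null are assumptions made for this
example.

```python
from collections import defaultdict


def avg(values):
    """Average of a list; None for an empty list (assumed semantics)."""
    values = list(values)
    return sum(values) / len(values) if values else None


def unfused_plan(records):
    """Mimics the current plan: listify each group, then filter in a subplan."""
    groups = defaultdict(list)
    for rec in records:
        # group-by: materialize every record of the group (first listify)
        groups[rec["key"]].append(rec)
    result = {}
    for key, group in groups.items():
        # subplan: unnest the group, select non-null values, listify again
        filtered = [rec["value"] for rec in group if rec["value"] is not None]
        result[key] = avg(filtered)
    return result


def fused_plan(records):
    """Mimics the optimized plan: select and aggregate inside the group-by."""
    state = defaultdict(lambda: [0, 0])  # key -> [running sum, running count]
    for rec in records:
        entry = state[rec["key"]]  # the group exists even if all values are null
        if rec["value"] is not None:
            entry[0] += rec["value"]
            entry[1] += 1
    return {k: (s / c if c else None) for k, (s, c) in state.items()}


records = [
    {"key": "a", "value": 1},
    {"key": "a", "value": None},
    {"key": "a", "value": 3},
    {"key": "b", "value": None},
    {"key": "b", "value": 10},
    {"key": "c", "value": None},
]
print(unfused_plan(records) == fused_plan(records))
```

The fused version never builds the per-group list, which is the saving the
optimized plan is after.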
To achieve this, the subplan has to be pushed (fused) into the group-by. The
optimization should trigger only when the following three conditions are all
satisfied:
- There is one group-by as producer (of the group), and a subplan as the
consumer (of the group)
- The groups produced by the producer (group-by) will be consumed by the
consumer (subplan)
- After being consumed by the consumer, the group is not used by any later
operator in the plan (in our example, the result of the first listify, $$2,
is consumed by the subplan and never used again)
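The three conditions can be sketched as a check over a toy plan
representation. Everything here is hypothetical and is not the Asterix or
Algebricks API: the Op class, its fields, and can_fuse are invented for
illustration. The sketch also assumes that the plan is a linear list in
execution order, that the subplan directly follows the group-by, and that
"produced" on a group-by lists only its listified group variables (not the
grouping keys).

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical operator: its kind plus the variables it produces and uses."""
    kind: str
    produced: set = field(default_factory=set)
    used: set = field(default_factory=set)


def can_fuse(plan, i):
    """Return True if plan[i] (group-by) and plan[i + 1] (subplan) may be fused."""
    if i + 1 >= len(plan):
        return False
    producer, consumer = plan[i], plan[i + 1]
    # Condition 1: a group-by produces the group and a subplan consumes it.
    if producer.kind != "group-by" or consumer.kind != "subplan":
        return False
    # Condition 2: the subplan actually consumes the group the group-by produces.
    group_vars = producer.produced & consumer.used
    if not group_vars:
        return False
    # Condition 3: no later operator uses the group again.
    used_later = set().union(*(op.used for op in plan[i + 2:]))
    return not (group_vars & used_later)


# The plan from this issue, in execution order.
plan = [
    Op("group-by", produced={"$$2"}, used={"$$1"}),
    Op("subplan", produced={"$$5"}, used={"$$2"}),
    Op("assign", produced={"$$6"}, used={"$$5"}),
]
print(can_fuse(plan, 0))
```

If a later operator also read $$2, the check would fail, because the listified
group would still have to be materialized for that operator.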
(Initially assigned to Vinayak for coordination)
Original issue reported on code.google.com by jarod...@gmail.com on 10 Jan 2014 at 6:11