uwescience / raco

Compilation and rule-based optimization framework for relational algebra. Raco is the language, optimization, and query translation layer for the Myria project.
Other
72 stars 19 forks source link

Order by/ Sort #174

Open domoritz opened 10 years ago

domoritz commented 10 years ago

MyriaL should support sorting. We have support for sorted scans and an in memory order by. Later, we probably want a sort merge join.

dhalperi commented 10 years ago

(sort-merge join is not a language feature, but rather a rule in our eventual optimizer.)

domoritz commented 10 years ago

But somebody has to write the optimizer rule for that. This issue is for language support and optimizer support.

7andrew7 commented 10 years ago

It isn't clear whether this refers to ORDER BY (a language feature) or sort-merge join (an optimizer feature). These should be two separate issues.

domoritz commented 10 years ago

ORDER BY because we don't have a merge consumer in myria yet.

senderista commented 9 years ago

Distributed merge can lead to deadlock for multiple producers/multiple consumers: see discussion in http://dl.acm.org/citation.cfm?id=152611. There are various workarounds discussed in that reference.

senderista commented 8 years ago

We already have a global Merge consumer that can run on the coordinator, along with local MergeJoin and InMemoryOrderBy operators. We would need to push sorts down to Postgres for data that didn't fit in memory, but we may already have what we need to support ORDER BY explicitly in MyriaL. The Raco and Myria algebras already support ORDER BY (there is a rewrite rule to convert the Raco OrderBy logical operator to the Myria MyriaInMemoryOrderBy physical operator), but there is no syntax support in the MyriaL parser. We should prioritize implementing ORDER BY in MyriaL based on user needs and limitations of the supporting operators.

senderista commented 8 years ago

We also need to implement a rewrite rule to decompose a global OrderBy logical operator into a global Merge of local OrderBy operators, somewhat akin to the DecomposeGroupBy rule (do we also need a new Merge logical operator?). We should also have a rewrite rule for ORDER BY + LIMIT, so the global Merge only fetches the top k results from each local OrderBy (the existing CollectBeforeLimit rule doesn't seem to preserve order in the final output).

domoritz commented 8 years ago

I totally don't remember writing https://github.com/uwescience/myria/commit/555907f3ac73e0fed6a08d29b3c36cd9677b7114. Anyway, note that this is not a merge consumer! So it won't merge the inputs from children but only the inputs from different operators.

senderista commented 8 years ago

Right, so you're saying we need a proper exchange operator for Merge? I'll open an issue for that, thanks!

senderista commented 8 years ago

@bmyerz would appreciate feedback on adding a Merge operator to the logical algebra (possibly unnecessary since it only exists to physically implement global logical OrderBy).

billhowe commented 8 years ago

I'd prefer not to see a merge operator at the logical level.

Is the use case here top k? That's a reasonable logical operator.

In fact, I'd go further and claim logical order by is always unnecessary except at the root, just like in SQL, except for plans involving stateful apply.

(SQL gets around that by using a separate order by clause specifically for window functions.)

On Monday, May 9, 2016, Tobin Baker notifications@github.com wrote:

@bmyerz https://github.com/bmyerz would appreciate feedback on adding a Merge operator to the logical algebra (possibly unnecessary since it only exists to physically implement global logical OrderBy).

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/uwescience/raco/issues/174#issuecomment-218067109

bmyerz commented 8 years ago

@senderista I agree with Bill on this

senderista commented 8 years ago

@billhowe @bmyerz OK, sounds like we just need a new Myria Merge physical operator (corresponding to the unimplemented Merge exchange operator) targeted by a rewrite rule which decomposes logical OrderBy into Merge + physical (local) OrderBy?

Also, can we go ahead and push logical OrderBy into SQL even in the absence of ORDER BY support in MyriaL? That would allow us to use Postgres indexes to optimize local sorting (and may be the only feasible alternative when local data is too large for the InMemoryOrderBy operator). Since this will pretty much be a requirement for efficiently supporting ORDER BY in MyriaL, why not get it implemented and tested first?

bmyerz commented 8 years ago

@senderista Ya, we can do the local OrderBy (and insert a Merge so that the plan is consistent with logical OrderBy).

senderista commented 8 years ago

Something @mbalazin pointed out is that for top-k queries with small k, we can just run local top-k queries at each worker (possibly pushing them into Postgres), and sort the combined results in memory at the coordinator using the existing InMemoryOrderBy operator. That should be a pretty simple rewrite rule in Raco, and would eliminate the dependency on a global Merge operator for the top-k scenario.