uwescience / raco

Compilation and rule-based optimization framework for relational algebra. Raco is the language, optimization, and query translation layer for the Myria project.

force ordering of partitioning attributes #565

Closed senderista closed 7 years ago

senderista commented 7 years ago

Fixes #515. Unfortunately I can't add a new RACO test to verify this fixes the bug, since FakeDB doesn't support physical representation properties (this is mentioned in https://github.com/uwescience/raco/issues/511), so I added a new integration test to MyriaX (which I'll merge as soon as this PR is merged): https://github.com/uwescience/myria/commit/b4f63165299a062081f5f730ba1becb6fde79e17

coveralls commented 7 years ago

Coverage Status

Changes Unknown when pulling 7abf971a7a79ef8fe5051a7f71d53755050b4f41 on partitioning_order into on master.

senderista commented 7 years ago

There's still an open optimization issue (although not a correctness issue). Reordering the join conditions will no longer produce an incorrect query plan, but it could force an unnecessary shuffle (which happens in the referenced integration test). When one side of a join is already partitioned on the join attributes, and the other input is not partitioned, the generated shuffle for the unpartitioned input should ensure its partitioning order is compatible with the already-partitioned input. If we allow the order of join conditions to determine the order of partitioning attributes (which we do now), then this will generally not be the case, and we will generate an unnecessary shuffle for the already-partitioned input. Since the ordering of partitioning attributes has no user-visible significance, there is no reason to let it be determined by the order of conditions in a query.
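To make the point above concrete, here is a minimal Python sketch of choosing the shuffle order for the unpartitioned input from the existing partitioning of the other input, rather than from the order of the join conditions. It is not RACO's actual code: `partition_of`, `naive_shuffle_columns`, and `align_shuffle_columns` are hypothetical names, rows are plain dicts, and each left column is assumed to appear in at most one equality condition.

```python
def partition_of(row, columns, num_workers):
    """Hash-partition a row on the given columns, in the given order (illustrative only)."""
    return hash(tuple(row[c] for c in columns)) % num_workers


def naive_shuffle_columns(join_conditions):
    """Right-side shuffle columns taken in query order (the behavior described above)."""
    return tuple(right for _, right in join_conditions)


def align_shuffle_columns(join_conditions, left_partitioning):
    """Right-side shuffle columns ordered to match the left input's partitioning.

    join_conditions: list of (left_col, right_col) equality pairs, in query order.
    left_partitioning: tuple of columns the left input is already partitioned on.
    Returns None when some partitioning column is not a join attribute, in which
    case the existing partitioning cannot be reused this way.
    """
    left_to_right = dict(join_conditions)  # assumes one condition per left column
    if any(col not in left_to_right for col in left_partitioning):
        return None
    return tuple(left_to_right[col] for col in left_partitioning)


# The query lists the conditions as b = y, a = x, but the left input is
# already partitioned on (a, b).
conditions = [("b", "y"), ("a", "x")]
left_partitioning = ("a", "b")

naive = naive_shuffle_columns(conditions)                       # ("y", "x")
aligned = align_shuffle_columns(conditions, left_partitioning)  # ("x", "y")

# With the aligned order, matching rows hash the same value tuple, so they
# land on the same worker without reshuffling the left input.
left_row = {"a": 1, "b": 2}
right_row = {"x": 1, "y": 2}
same_worker = (
    partition_of(left_row, left_partitioning, 4)
    == partition_of(right_row, aligned, 4)
)
```

With the naive order, the right input would be hashed on `(y, x)`, which is generally incompatible with a left input hashed on `(a, b)`, so the left input would have to be shuffled again even though it was already partitioned on the join attributes.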

senderista commented 7 years ago

Opened https://github.com/uwescience/raco/issues/566 for the issue in my comment.

senderista commented 7 years ago

MyriaX integration test PR: https://github.com/uwescience/myria/pull/905