prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Add partition attributes to sort node #24095

Open feilong-liu opened 4 days ago

feilong-liu commented 4 days ago

Description

During query planning, a SortNode is converted to a partial sort node followed by a gather exchange with ensureSourceOrdering set to true. A single thread on a single worker then performs the final merge, which guarantees that the data is sorted as specified.
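To illustrate the shape of that plan, here is a minimal sketch in plain Java. It is not Presto's operator code: the class and method names (`PartialSortMergeSketch`, `partialSort`, `mergeSorted`) are hypothetical, and each in-memory list stands in for the data held by one worker. It only shows why the final step is single-threaded: one consumer has to merge all of the sorted streams.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; these are not Presto classes.
public class PartialSortMergeSketch {
    // Partial sort: each split is sorted independently, as each worker would do.
    static List<List<Integer>> partialSort(List<List<Integer>> splits) {
        List<List<Integer>> sorted = new ArrayList<>();
        for (List<Integer> split : splits) {
            List<Integer> copy = new ArrayList<>(split);
            Collections.sort(copy);
            sorted.add(copy);
        }
        return sorted;
    }

    // Final merge: a single consumer performs a k-way merge of the sorted streams.
    // Each heap entry is {value, stream index, position within that stream}.
    static List<Integer> mergeSorted(List<List<Integer>> sortedStreams) {
        PriorityQueue<int[]> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < sortedStreams.size(); i++) {
            if (!sortedStreams.get(i).isEmpty()) {
                heap.add(new int[] {sortedStreams.get(i).get(0), i, 0});
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            merged.add(head[0]);
            List<Integer> stream = sortedStreams.get(head[1]);
            int next = head[2] + 1;
            if (next < stream.size()) {
                heap.add(new int[] {stream.get(next), head[1], next});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Integer>> splits = List.of(List.of(5, 1), List.of(4, 2), List.of(3));
        System.out.println(mergeSorted(partialSort(splits))); // prints [1, 2, 3, 4, 5]
    }
}
```

The merge step is the part that cannot be spread across workers when a global order is required.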

However, for an internal feature we are developing, sorting the data within each partition is enough; a global sort is not required. To achieve this, the query must be planned so that the sort node operates on partitioned data and the single-threaded gather exchange is not needed.

In this PR, I added a new field, partitionedBy, to the sort node. It specifies the scope of the sort. An empty list means a global sort, which is the current behavior.
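The intended semantics of partitionedBy can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the code in this PR: `PartitionedSortSketch`, `Row`, and `sort` are hypothetical names, rows are held in memory, and the sort key is fixed to a single integer column. The only behavior taken from the description above is that an empty partitionedBy list yields a global sort, while a non-empty list limits the sort to rows that share the same partition key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; this is not Presto's SortNode.
public class PartitionedSortSketch {
    // Minimal row type: one candidate partition column and one sort column.
    static final class Row {
        final String region;
        final int amount;

        Row(String region, int amount) {
            this.region = region;
            this.amount = amount;
        }

        @Override
        public String toString() {
            return region + ":" + amount;
        }
    }

    // Sorts rows by amount within each partition defined by partitionedBy.
    // With an empty partitionedBy list, every row gets the same (empty) key,
    // so there is exactly one group and the result is a global sort.
    static List<List<Row>> sort(List<Row> rows, List<Function<Row, Object>> partitionedBy) {
        Map<List<Object>, List<Row>> groups = new LinkedHashMap<>();
        for (Row row : rows) {
            List<Object> key = new ArrayList<>();
            for (Function<Row, Object> column : partitionedBy) {
                key.add(column.apply(row));
            }
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        List<List<Row>> result = new ArrayList<>();
        for (List<Row> group : groups.values()) {
            group.sort(Comparator.comparingInt((Row r) -> r.amount));
            result.add(group);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("us", 3), new Row("eu", 5), new Row("us", 1), new Row("eu", 2));
        Function<Row, Object> byRegion = row -> row.region;

        // Empty partitionedBy: one globally sorted group.
        System.out.println(sort(rows, List.of()));         // prints [[us:1, eu:2, us:3, eu:5]]
        // partitionedBy = [region]: each region is sorted independently.
        System.out.println(sort(rows, List.of(byRegion))); // prints [[us:1, us:3], [eu:2, eu:5]]
    }
}
```

In the partitioned case no step needs to see all rows at once, which is what makes the single-threaded gather exchange unnecessary.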

Motivation and Context

Described above

Impact

Enable sort within partitions

Test Plan

Add unit tests. Since the partitionedBy attribute is always empty after parsing and is only set in the optimizer, this change does not affect current production behavior.

Contributor checklist

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add a partition by attribute to specify the scope of sort node. :pr:`24095`

prestodb-ci commented 4 days ago

Saved that user @feilong-liu is from Meta

steveburnett commented 3 days ago

Thanks for the release note entry! A couple of nits.

== RELEASE NOTES ==

General Changes
* Add a partition by attribute to specify the scope of sort node. :pr:`24095`