prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

[native] Support no shuffle data copy to speed up query execution #23026

Closed xiaoxmeng closed 2 weeks ago

xiaoxmeng commented 2 weeks ago

This PR avoids copying the data buffers from the iobufs that proxygen allocates for the HTTP response into the Velox memory pool. The copy exists to prevent server OOM in case of unexpectedly spiky HTTP shuffle memory usage from jemalloc: it transfers the memory usage to the Velox memory pool, which is capped by Velox memory management. Velox has since improved its shuffle flow control and now has better control over shuffle memory usage for Presto batch workloads. This PR adds a config option to disable the data copy and so speed up query execution. In a Meta internal 1-hour stress test, overall query execution time dropped by ~15% and wall time dropped by ~30%. The improvement comes from reduced tail shuffle exchange latency: both data exchange and data size exchange tail latency (P100) dropped from 2 minutes to 2 seconds.
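To make the trade-off concrete, here is a minimal, language-neutral sketch in Python. It is not the Presto native (C++) code from this PR. `CappedPool`, `receive_page`, and the `copy_enabled` flag are hypothetical names invented for illustration; the real config option name is not given in this description. The sketch only models the idea that the copy path charges the bytes to a capped pool, while the no-copy path passes the response buffers through untouched and leaves their memory outside the pool's accounting.

```python
from dataclasses import dataclass


class PoolCapExceeded(MemoryError):
    """Raised when an allocation would push the pool past its cap."""


@dataclass
class CappedPool:
    """Toy stand-in for a capped memory pool (hypothetical, not the Velox API)."""

    capacity: int
    used: int = 0

    def allocate(self, size: int) -> bytearray:
        # Refuse the allocation instead of letting usage grow unbounded.
        if self.used + size > self.capacity:
            free = self.capacity - self.used
            raise PoolCapExceeded(f"need {size} bytes, only {free} free")
        self.used += size
        return bytearray(size)


def receive_page(chunks, pool, copy_enabled):
    """Turn the chunks of one HTTP response body into a shuffle page.

    Returns (page, bytes_copied), where page is a list of memoryviews.
    """
    if not copy_enabled:
        # No-copy path: wrap the response buffers as-is. Nothing is copied,
        # and the pool never sees this memory (assumption of this model).
        return [memoryview(c) for c in chunks], 0

    # Copy path: charge the bytes to the capped pool, then copy chunk by chunk.
    total = sum(len(c) for c in chunks)
    buf = pool.allocate(total)
    offset = 0
    for c in chunks:
        buf[offset:offset + len(c)] = c
        offset += len(c)
    return [memoryview(buf)], total
```

In this model, the copy path fails fast with `PoolCapExceeded` when the pool is full, which mirrors the OOM protection described above, while the no-copy path skips both the allocation and the byte copy and relies on flow control elsewhere to bound memory.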

== NO RELEASE NOTE ==

facebook-github-bot commented 2 weeks ago

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

xiaoxmeng commented 2 weeks ago

Please see above.

Updated.

xiaoxmeng commented 2 weeks ago

Thanks @xiaoxmeng. Only one more minor comment change.

Updated.