prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Improve Projections #15259

Open yingsu00 opened 3 years ago

yingsu00 commented 3 years ago

Projections were one of the top contributors to memory allocations that cause full GCs and OOMs. There is also room to improve their CPU usage. We propose the following changes to optimize projections:

Pre-allocate enough memory

Optimize BlockBuilder element insertions

Create DictionaryBlock for identity projections

Optimize aggregation functions https://github.com/prestodb/presto/issues/15361
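To illustrate the DictionaryBlock idea from the list above, here is a minimal sketch of why a dictionary view can be cheaper than copying for an identity projection. It does not use Presto's actual Block API: `LongColumn`, `ArrayColumn`, `DictionaryColumn`, `projectByCopy` and `projectByDictionary` are hypothetical names made up for this example. The assumption is that an identity projection only needs to expose the selected positions of an existing column, so it can wrap the input with an array of position ids instead of materializing the values again.

```java
public class IdentityProjectionSketch {
    /** Hypothetical stand-in for a column of long values; not Presto's Block interface. */
    interface LongColumn {
        int size();
        long get(int position);
    }

    /** Column backed directly by an array of values. */
    static final class ArrayColumn implements LongColumn {
        private final long[] values;

        ArrayColumn(long[] values) {
            this.values = values;
        }

        public int size() {
            return values.length;
        }

        public long get(int position) {
            return values[position];
        }
    }

    /** View over another column: position i resolves to dictionary.get(ids[i]). */
    static final class DictionaryColumn implements LongColumn {
        private final LongColumn dictionary;
        private final int[] ids;

        DictionaryColumn(LongColumn dictionary, int[] ids) {
            this.dictionary = dictionary;
            this.ids = ids;
        }

        public int size() {
            return ids.length;
        }

        public long get(int position) {
            return dictionary.get(ids[position]);
        }
    }

    /** Copying projection: allocates a new value array and copies each selected value. */
    static LongColumn projectByCopy(LongColumn input, int[] selectedPositions) {
        long[] copy = new long[selectedPositions.length];
        for (int i = 0; i < selectedPositions.length; i++) {
            copy[i] = input.get(selectedPositions[i]);
        }
        return new ArrayColumn(copy);
    }

    /** Dictionary projection: reuses the input values and the selected-position array as ids. */
    static LongColumn projectByDictionary(LongColumn input, int[] selectedPositions) {
        return new DictionaryColumn(input, selectedPositions);
    }

    public static void main(String[] args) {
        LongColumn input = new ArrayColumn(new long[] {10, 20, 30, 40});
        int[] selected = {3, 1};
        LongColumn copied = projectByCopy(input, selected);
        LongColumn wrapped = projectByDictionary(input, selected);
        for (int i = 0; i < selected.length; i++) {
            System.out.println(copied.get(i) + " == " + wrapped.get(i));
        }
    }
}
```

The trade-off in this sketch is that reads through the dictionary view pay one extra indirection per access, while the projection itself allocates nothing proportional to the value width.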

yingsu00 commented 3 years ago

cc @bhhari @mbasmanova @sujay-jain @biswapesh

yingsu00 commented 3 years ago

Add identity projection with partial filter failure rate to BenchmarkPageProcessor https://github.com/prestodb/presto/pull/15266

yingsu00 commented 3 years ago

https://github.com/prestodb/presto/pull/15272 is out: Identity projection improvement when the selectedPositions is a list. BenchmarkPageProcessor shows a 21x improvement in throughput and a 10x reduction in memory allocation and GC.

yingsu00 commented 3 years ago

pnb1_batch1_worker_cpu_20201011_012349_60s.svg.zip The profile shows that some functions, e.g. mapConcat, are the major CPU consumers for projections. These functions use TypedSet and were very inefficient in how they used PageBuilder and BlockBuilders. So it seems that optimizing PageBuilder and BlockBuilders would have a very wide benefit.
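As a rough illustration of why builder sizing matters for allocation volume, the sketch below compares a builder that starts small and doubles its backing array with one that is pre-sized to the expected entry count. `LongBuilder` and `bytesAllocatedFor` are hypothetical names for this example and are not Presto's BlockBuilder. The byte count covers only the array payload (8 bytes per long) and ignores object headers, so treat the numbers as relative rather than exact.

```java
import java.util.Arrays;

public class PreallocationSketch {
    /** Hypothetical append-only builder that tracks how many array bytes it has allocated. */
    static final class LongBuilder {
        private long[] values;
        private int size;
        private long allocatedBytes;

        LongBuilder(int expectedEntries) {
            values = new long[Math.max(expectedEntries, 1)];
            allocatedBytes = (long) values.length * Long.BYTES;
        }

        void append(long value) {
            if (size == values.length) {
                // Growth allocates a new array twice as large and copies the old contents.
                values = Arrays.copyOf(values, values.length * 2);
                allocatedBytes += (long) values.length * Long.BYTES;
            }
            values[size++] = value;
        }

        int size() {
            return size;
        }

        long allocatedBytes() {
            return allocatedBytes;
        }
    }

    /** Appends `entries` values to a builder with the given initial capacity and reports allocations. */
    static long bytesAllocatedFor(int initialCapacity, int entries) {
        LongBuilder builder = new LongBuilder(initialCapacity);
        for (int i = 0; i < entries; i++) {
            builder.append(i);
        }
        return builder.allocatedBytes();
    }

    public static void main(String[] args) {
        int entries = 1024;
        System.out.println("grown from 8: " + bytesAllocatedFor(8, entries) + " bytes");
        System.out.println("pre-sized:    " + bytesAllocatedFor(entries, entries) + " bytes");
    }
}
```

In this toy model, growing from a small initial capacity allocates roughly twice the bytes of the pre-sized builder, and every intermediate array becomes garbage, which is the kind of pressure the pre-allocation proposal aims to reduce.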

yingsu00 commented 3 years ago

https://github.com/prestodb/presto/pull/15301 is out: Disable MergingPageOutput for ScanFilterAndProjectOperator before repartitioning. Simple query on TPCH 100GB shows 22% CPU reduction: