trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Project Hummingbird #14237

Open martint opened 2 years ago

martint commented 2 years ago

Trino has had a columnar/vectorized evaluation engine since its inception in 2012. After the initial implementation and optimization, and once we were satisfied with the performance for the majority of use cases, we focused our efforts on other areas. Although we've made incremental performance improvements in the past few years, there is still room for further optimization.

We're starting Project Hummingbird with the goal of bringing Trino's columnar/vectorized evaluation engine to the next level. This includes improvements in areas such as filter, projection, aggregation and join evaluation, as well as any other potential improvements in areas we identify along the way. So far, we have the following list:

Tasks

sopel39 commented 2 years ago

"Optimized storage of GROUP BY intermediates for fixed-size types to improve memory locality and avoid multiple indirections."

There is already https://github.com/trinodb/trino/pull/10706, but IMO we should focus on variable-length types, since multi-channel aggregations often use varchars.
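A minimal sketch of what flat storage for a fixed-size intermediate could look like, assuming a SUM over bigint values and group ids that were already assigned by a hash table. The class `FlatLongSumState` and its methods are hypothetical names for illustration, not Trino's API and not the approach taken in the PR above.

```java
import java.util.Arrays;

// Hypothetical sketch: per-group SUM(bigint) state kept in one flat long[]
// indexed by group id, instead of one heap object per group.
final class FlatLongSumState {
    private long[] sums = new long[16];
    private int groupCount;

    // Grow the backing array so that groupId is a valid index.
    private void ensureCapacity(int groupId) {
        if (groupId >= sums.length) {
            sums = Arrays.copyOf(sums, Math.max(groupId + 1, sums.length * 2));
        }
        groupCount = Math.max(groupCount, groupId + 1);
    }

    // Add a batch of values; groupIds[i] is the group that values[i] belongs to.
    void addBatch(int[] groupIds, long[] values, int length) {
        for (int i = 0; i < length; i++) {
            int groupId = groupIds[i];
            ensureCapacity(groupId);
            sums[groupId] += values[i]; // single array access, no pointer chasing
        }
    }

    long sum(int groupId) {
        return sums[groupId];
    }

    int groupCount() {
        return groupCount;
    }
}
```

Variable-length types such as varchar cannot be stored this way directly, because their state does not fit into a fixed-width slot per group.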

cc @lukasz-stec

WinkerDu commented 2 years ago

I think more contributors could get involved in this project if there were some specific open issues.

mosabua commented 2 years ago

Also, ping us on the project-hummingbird Slack channel: https://trinodb.slack.com/archives/C04APR44U20

lukasz-stec commented 2 years ago

"Megamorphism and virtual dispatch in core loops due to call sites seeing multitude of block types"

I have seen HashBuilderOperator performance degrade because of this when DictionaryBlocks were pushed through the partitioned exchange (more details: https://github.com/trinodb/trino/issues/15216).
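The effect can be sketched with stand-in types. The interface `LongColumn` and its three implementations below are hypothetical and much simpler than Trino's block classes. The generic loop has one `get()` call site that may observe all three receiver types, at which point HotSpot typically stops inlining it. The specialized variant checks the type once per column and then runs a tight loop over the underlying arrays.

```java
// Hypothetical stand-ins for block types; these are not Trino's classes.
interface LongColumn {
    int size();
    long get(int position);
}

final class FlatLongColumn implements LongColumn {
    final long[] values;
    FlatLongColumn(long[] values) { this.values = values; }
    public int size() { return values.length; }
    public long get(int position) { return values[position]; }
}

final class DictionaryLongColumn implements LongColumn {
    final long[] dictionary;
    final int[] ids;
    DictionaryLongColumn(long[] dictionary, int[] ids) {
        this.dictionary = dictionary;
        this.ids = ids;
    }
    public int size() { return ids.length; }
    public long get(int position) { return dictionary[ids[position]]; }
}

final class RunLengthLongColumn implements LongColumn {
    final long value;
    final int count;
    RunLengthLongColumn(long value, int count) {
        this.value = value;
        this.count = count;
    }
    public int size() { return count; }
    public long get(int position) { return value; }
}

final class ColumnSums {
    // One virtual call per row; the call site sees every column type passed in.
    static long sumGeneric(LongColumn column) {
        long sum = 0;
        for (int i = 0; i < column.size(); i++) {
            sum += column.get(i);
        }
        return sum;
    }

    // One type check per column, then a loop with no virtual calls.
    static long sumSpecialized(LongColumn column) {
        if (column instanceof FlatLongColumn) {
            long sum = 0;
            for (long value : ((FlatLongColumn) column).values) {
                sum += value;
            }
            return sum;
        }
        if (column instanceof DictionaryLongColumn) {
            DictionaryLongColumn dictionaryColumn = (DictionaryLongColumn) column;
            long sum = 0;
            for (int id : dictionaryColumn.ids) {
                sum += dictionaryColumn.dictionary[id];
            }
            return sum;
        }
        if (column instanceof RunLengthLongColumn) {
            RunLengthLongColumn runLength = (RunLengthLongColumn) column;
            return runLength.value * runLength.count;
        }
        return sumGeneric(column); // unknown type: fall back to the generic loop
    }
}
```

Both methods return the same result; whether the specialized form is actually faster depends on the JVM and the mix of types at each call site, so it should be measured rather than assumed.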

chaojun-zhang commented 1 year ago

"Suboptimal code generation for complex expressions and required null checks"

Is there an issue that elaborates on this topic so that we can follow it? Will code generation continue to be based on the airlift/bytecode library, with improvements made on top of it?

"Introduce abstractions and batch calling conventions to facilitate the implementation of functions and operators that can leverage SIMD instructions via Java's new Vector API, and, in the future, possibly GPUs via OpenCL or CUDA"

Besides SIMD instructions, could we consider introducing an operator and expression evaluation framework based on a native JIT engine (such as code generation through LLVM)?

Basically, we have two options.

Option 1: improve performance at the pipeline level. When a physical pipeline of operators is handed over to a Trino worker, we first rewrite the Trino physical plan into a Substrait-based plan, then compile the Substrait plan into IR code through the LLVM API, execute the generated IR code against the given Trino page input to get the results in Arrow format, and finally convert the Arrow result back to the Trino page format.

Option 2: improve performance at the expression and operator level. When operator::getOutput() is invoked, forward the request through JNI to a native operator, which is optimized based on LLVM IR.

martint commented 1 year ago

"expression evaluation framework based on a native JIT engine (such as code generation through LLVM)"

Trino already does that via the JVM's JIT compiler. Trino produces JVM bytecode, which the JVM then turns into native CPU instructions.
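As a rough illustration of the kind of loop the JVM's JIT compiler turns into native instructions, here is a hypothetical batch-style kernel in plain Java. The class `BatchKernels`, the method signature, and the null-mask layout are assumptions made for this sketch and are not Trino's calling convention. Unlike Trino's bigint addition, the sketch does not check for overflow.

```java
// Hypothetical batch kernel: one call processes `length` rows held in
// primitive arrays, with nulls tracked in separate boolean masks.
final class BatchKernels {
    static void addBatch(
            long[] left, boolean[] leftNull,
            long[] right, boolean[] rightNull,
            long[] result, boolean[] resultNull,
            int length) {
        for (int i = 0; i < length; i++) {
            // A row is null if either input is null.
            resultNull[i] = leftNull[i] | rightNull[i];
            // Computed for every row, including null ones; readers must
            // consult resultNull before using result[i]. Wraps on overflow.
            result[i] = left[i] + right[i];
        }
    }
}
```

Keeping the loop body free of branches and virtual calls gives the JIT a chance to unroll or auto-vectorize it, although whether it does so depends on the JVM version and the hardware.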

hackeryang commented 5 months ago

Recent PR: https://github.com/trinodb/trino/pull/21465