trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Elasticsearch connector aggregation push down support #7026

Open PChou opened 3 years ago

PChou commented 3 years ago

Hi team,

I am very excited to see that Trino supports aggregation pushdown, because few SQL engines currently on the market offer this feature. However, only a few connectors support it so far. We are building a query platform based on Trino, and our data sources include Elasticsearch, so we hope Trino can support aggregation pushdown for the Elasticsearch connector, which would greatly improve performance. Is this on the roadmap?
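To make the request concrete, here is a minimal sketch of the kind of Elasticsearch search body a pushed-down `GROUP BY` with `avg` could translate into. It is only an illustration of the idea, not how the connector does or would do it. The helper name `build_pushdown_request` and the aggregation labels `by_group` and `avg_metric` are hypothetical, the field names are borrowed from the test query later in this thread, and the sketch assumes the grouping field is mapped as a keyword.

```python
import json


def build_pushdown_request(group_field, metric_field):
    """Hypothetical helper: build an Elasticsearch search body roughly
    equivalent to
        SELECT group_field, avg(metric_field) ... GROUP BY group_field
    """
    return {
        # size 0: return only aggregation results, no document hits
        "size": 0,
        "aggs": {
            # one bucket per distinct value of the grouping field
            "by_group": {
                "terms": {"field": group_field},
                "aggs": {
                    # average of the metric field inside each bucket
                    "avg_metric": {"avg": {"field": metric_field}},
                },
            }
        },
    }


body = build_pushdown_request("hostname", "values")
print(json.dumps(body, indent=2))
```

A `terms` aggregation returns only the top 10 buckets by default, so a real implementation would need some way to retrieve every group (for example by paging through buckets) to stay correct for high-cardinality columns.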

PChou commented 3 years ago

BTW, I have recently started working on an implementation of this feature.

PChou commented 3 years ago

The following simple test runs against an index of more than 40,000 records, first with aggregation pushdown enabled and then with it disabled. The query statistics show the difference in efficiency between the two approaches.

trino:default> select hostname, avg("values") from elasticsearch.default.slmday60 group by hostname;
   hostname    |       _col1
---------------+-------------------
 192.168.21.58 | 4992.663530635401
 192.168.21.59 | 4989.727731732876
(2 rows)

Query 20210225_091409_00005_rb8ni, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0.53 [2 rows, 0B] [3 rows/s, 0B/s]

trino:default> set session elasticsearch.aggregation_pushdown_enabled=false;
SET SESSION
trino:default> select hostname, avg("values") from elasticsearch.default.slmday60 group by hostname;
   hostname    |       _col1
---------------+-------------------
 192.168.21.58 | 4992.663530635401
 192.168.21.59 | 4989.727731732876
(2 rows)

Query 20210225_091431_00007_rb8ni, FINISHED, 1 node
Splits: 50 total, 50 done (100.00%)
2.80 [42.1K rows, 1.68MB] [15.1K rows/s, 617KB/s]
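As a rough reading of the two runs, the short calculation below compares the figures reported by the CLI. It assumes the leading number on the last statistics line is elapsed time in seconds and treats "42.1K rows" as approximately 42,100. These are single runs on one node, so the ratios are indicative only.

```python
# Figures copied from the two CLI runs in this comment.
pushdown_seconds = 0.53      # aggregation pushdown enabled
no_pushdown_seconds = 2.80   # aggregation pushdown disabled
rows_pushdown = 2            # rows Trino read with pushdown
rows_no_pushdown = 42_100    # "42.1K rows" as reported, approximate

# How much faster the pushed-down query finished.
speedup = no_pushdown_seconds / pushdown_seconds

# How many times fewer rows Trino had to read from Elasticsearch.
row_reduction = rows_no_pushdown / rows_pushdown

print(f"elapsed time ratio: {speedup:.1f}x")
print(f"rows read ratio: {row_reduction:,.0f}x")
```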

PChou commented 3 years ago

PR: https://github.com/trinodb/trino/pull/7131