trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Analysis and planning taking very long #17350

Closed by exolab 1 year ago

exolab commented 1 year ago

We are comparing Trino 416 with Presto 347.

Our test query is a count over a Hive table with ~4 million rows.

The problem we are seeing is that Presto clearly outperforms Trino, and the difference is in analysis and planning rather than in execution.

Metric | Presto 347 | Trino 416
-- | -- | --
Elapsed Time | 6.42s | 9.28s
Queued Time | 591.87us | 307.68us
Analysis Time | 341.84ms | 3.31s
Planning Time | 355.72ms | 2.65s
Execution Time | 6.08s | 5.97s
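
To make the gap concrete, here is a minimal Python sketch that converts the reported timings to seconds and prints the Trino-to-Presto ratio per phase. The figures are copied from the two query summaries; the helper names (`to_seconds`, `slowdown`) are made up for illustration and are not part of any Trino or Presto tooling.

```python
import re

# Timings copied verbatim from the two query summaries in this issue.
PRESTO_347 = {"analysis": "341.84ms", "planning": "355.72ms",
              "execution": "6.08s", "elapsed": "6.42s"}
TRINO_416 = {"analysis": "3.31s", "planning": "2.65s",
             "execution": "5.97s", "elapsed": "9.28s"}

# Seconds per unit, for the unit suffixes that appear in the summaries.
_UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0}


def to_seconds(text):
    """Convert a duration such as '341.84ms' or '6.08s' to seconds."""
    match = re.fullmatch(r"([0-9.]+)(us|ms|s)", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return float(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def slowdown(metric):
    """Return the Trino time divided by the Presto time for one phase."""
    return to_seconds(TRINO_416[metric]) / to_seconds(PRESTO_347[metric])


for metric in ("analysis", "planning", "execution", "elapsed"):
    print(f"{metric}: {slowdown(metric):.2f}x")
```

With these numbers, analysis comes out at roughly 10x and planning at roughly 7x, while execution is about the same on both engines, which points at the coordinator rather than the workers.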

We have tried finding the reason for the problem in our configuration, but have been unable to do so.

Does anyone have an idea what the issue might be? Why is Trino taking so long to do analysis and planning?

Any pointers are appreciated.

raunaqmorarka commented 1 year ago

Can you collect a JFR profile of the Trino coordinator while running this query multiple times, and share it? cc: @radek-starburst
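
For readers who have not recorded a JFR profile before, one possible way is the JDK's `jcmd` tool with its `JFR.start` command. The sketch below only builds and runs that command from Python. The PID, recording name, duration and output path are hypothetical placeholders, and it assumes `jcmd` is on the `PATH` and is run as the same user as the coordinator JVM.

```python
import subprocess


def jfr_start_command(pid, duration="120s",
                      filename="/tmp/trino-coordinator.jfr",
                      name="trino-planning"):
    """Build a `jcmd <pid> JFR.start ...` command line as an argument list.

    All default values are placeholders chosen for this example.
    """
    return [
        "jcmd", str(pid), "JFR.start",
        f"name={name}",
        f"duration={duration}",
        f"filename={filename}",
    ]


def record_jfr(pid, **options):
    """Start a time-boxed recording on the JVM with the given PID.

    The recording is written to `filename` once `duration` has elapsed,
    so the slow query should be run repeatedly during that window.
    """
    subprocess.run(jfr_start_command(pid, **options), check=True)
```

Calling `record_jfr(12345)` with the coordinator's real PID in place of `12345` would start the recording. The resulting `.jfr` file can then be opened in JDK Mission Control.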

exolab commented 1 year ago

Thank you @raunaqmorarka. We actually identified the problem just now as a configuration problem on our end. Our Hive deployment had no explicit resources specified in K8s. Once we changed that, performance went way up and is now better than with Presto.
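
A quick way to catch this kind of misconfiguration is to scan a Deployment manifest for containers that declare neither `resources.requests` nor `resources.limits`. The sketch below assumes the manifest has already been loaded into a Python dict (for example from `kubectl get deployment -o json`). The container names, image and resource values are invented for illustration; the thread does not say which settings were actually changed.

```python
def containers_missing_resources(manifest):
    """Return names of containers with neither requests nor limits set.

    Expects a Deployment-shaped dict, where the pod template lives at
    spec.template.spec.containers.
    """
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    missing = []
    for container in pod_spec.get("containers", []):
        resources = container.get("resources") or {}
        if not (resources.get("requests") or resources.get("limits")):
            missing.append(container.get("name", "<unnamed>"))
    return missing


# Hypothetical manifest: the first container has no resources block at all.
example = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {"name": "hive", "image": "example/hive"},
        {"name": "sidecar",
         "resources": {"requests": {"cpu": "500m", "memory": "1Gi"}}},
    ]}}},
}

print(containers_missing_resources(example))
```

Containers reported by this check run without any resource guarantees from Kubernetes, so they can be starved by other workloads on the same node.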