trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0
10.5k stars 3.02k forks

Number of running queries is much lower than hardConcurrencyLimit #17610

Closed. njalan closed this issue 1 year ago.

njalan commented 1 year ago

Trino version: 406. I am load testing a single simple query with JMeter on a cluster with 18 workers. The query is:

SELECT "td" AS "td",
       date_trunc('day', CAST(due_day_local AS TIMESTAMP)) AS "due_day_local",
       count(DISTINCT "so_id") AS "COUNT_DISTINCT(so_id)"
FROM xxxx
WHERE "due_day" >= from_iso8601_date('2023-05-23')
  AND "due_day" <= from_iso8601_date('2023-05-30')
  AND "td_date" IS null
  AND "td_location" = 'xxxxe'
GROUP BY "td", date_trunc('day', CAST(due_day_local AS TIMESTAMP))
ORDER BY "COUNT_DISTINCT(so_id)" DESC
LIMIT 10000

Below is the coordinator config:

coordinator=true
node-scheduler.include-coordinator=false
node-scheduler.max-splits-per-node=200
node-scheduler.max-pending-splits-per-task=20
query.max-stage-count=400
query.max-length=65432
query.stage-count-warning-threshold=400
query.max-memory=120GB
query.max-memory-per-node=10GB
exchange.http-client.request-timeout=120s
exchange.client-threads=40
query.max-run-time=400s
scheduler.http-client.max-requests-queued-per-destination=4096
query.max-history=200
query.min-expire-age=30m
http-server.log.max-size=67108864B
http-server.log.max-history=5

The resource group for this user is:

{
  "name": "xxxx",
  "softMemoryLimit": "95%",
  "hardConcurrencyLimit": 50,
  "schedulingWeight": 2000,
  "maxQueued": 220
}

Why are only about 4 queries running while many others are queued? I also found that Analysis Time and Planning Time are each around 2-3 seconds.
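To make the three resource group limits above concrete, here is a minimal sketch of how they interact when a query is submitted, assuming the documented meanings: hardConcurrencyLimit caps running queries, maxQueued caps queued queries (beyond it new queries are rejected), and softMemoryLimit is the memory a group may use before new queries are queued. This is a hypothetical, simplified model, not Trino's implementation; the class, method names and memory units are all invented for illustration. It shows one way queries can end up queued while the running count is still well below hardConcurrencyLimit; whether that is what happened on this cluster is not established in the thread.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ResourceGroup:
    """Hypothetical, simplified model of resource group admission.

    Only captures the documented meaning of hardConcurrencyLimit,
    maxQueued and softMemoryLimit; it is NOT Trino's scheduler.
    Memory is in arbitrary units chosen by the caller.
    """
    hard_concurrency_limit: int
    max_queued: int
    soft_memory_limit: int
    running: list = field(default_factory=list)   # memory held by each running query
    queued: deque = field(default_factory=deque)  # memory each queued query will hold

    def memory_in_use(self) -> int:
        return sum(self.running)

    def can_run(self) -> bool:
        # A new query starts only if BOTH the concurrency cap and the
        # soft memory limit still leave room.
        return (len(self.running) < self.hard_concurrency_limit
                and self.memory_in_use() < self.soft_memory_limit)

    def submit(self, memory: int) -> str:
        if self.can_run():
            self.running.append(memory)
            return "RUNNING"
        if len(self.queued) < self.max_queued:
            self.queued.append(memory)
            return "QUEUED"
        return "REJECTED"  # queue full

    def finish_one(self) -> None:
        """Complete the oldest running query, then admit queued ones while limits allow."""
        if self.running:
            self.running.pop(0)
        while self.queued and self.can_run():
            self.running.append(self.queued.popleft())


# Hypothetical numbers: each query holds 30 units, the group's soft limit is 100 units.
group = ResourceGroup(hard_concurrency_limit=50, max_queued=220, soft_memory_limit=100)
states = [group.submit(30) for _ in range(60)]
print(states.count("RUNNING"), states.count("QUEUED"))
```

With these made-up numbers the memory check, not hardConcurrencyLimit, is what stops admission after the fourth query, so the script prints `4 56`. In the real engine, memory usage changes while a query runs, so this only illustrates the ordering of the checks.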

[Screenshot, 2023-05-23 at 11:19:59 PM]

How can I improve the performance here? Is there anything wrong with my config? Is it possible to reduce the Analysis Time and Planning Time? By the way, I am running the Hudi query through the Hudi connector.

njalan commented 1 year ago

We can see that a lot of time is spent on scheduling and waiting.

[Screenshot]
njalan commented 1 year ago

I am actually using the Hive connector to access Hudi.

Anubisxcw commented 4 months ago

I have the same problem.