youngjinkim0706 opened this issue 4 months ago
Trino is designed to fully utilize CPU when there is enough parallelism in the workload; on its own that usually doesn't lead to communication threads getting starved. Was there a full GC or a crash of the worker?
Which release of Trino is this? Could you check if disabling experimental.thread-per-driver-scheduler-enabled makes a difference?
Resource groups only provide admission control; they do not affect the resource usage of running queries.
If this happens under high concurrency, you can look at configuring a lower limit on query.max-concurrent-queries. If it's due to specific queries being resource-heavy, then tuning task.max-drivers-per-task lower may help.
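A minimal sketch of where those two knobs would go, assuming they are set in etc/config.properties; the values are purely illustrative, and availability of query.max-concurrent-queries can vary by Trino release, so verify both against the documentation for your version:

# etc/config.properties (illustrative values only; verify against your release)
query.max-concurrent-queries=20
task.max-drivers-per-task=8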
Thank you for your response, @raunaqmorarka. I'm going to check the parameters you mentioned.
FYI, I executed queries on both Trino 403 and 450, and the 503 error occurred in both releases. These queries were run sequentially, meaning only a single query was running in the Trino cluster at any given time.
There was a Major GC, not a Full GC, and the worker container was restarted. I suspect this occurred because an exception was thrown in the execute function of HiveSplitManager.java:
public void execute(Runnable command)
{
    try {
        delegate.execute(command);
    }
    catch (RejectedExecutionException e) {
        throw new TrinoException(SERVER_SHUTTING_DOWN, "Server is shutting down", e);
    }
}
Additionally, when I set the task.max-worker-threads parameter lower than the CPU resources allocated to the pod and executed the query, the query ran successfully.
What I am curious about is this: if the value of task.max-worker-threads is high, the worker threads seem to occupy CPU meant for communication, leading to the error described earlier. Is there a way at the Trino configuration level to prioritize, or statically reserve, resources for the communication threads? If not, can the CPU priority of threads be set at the JVM level?
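A sketch of capping the worker threads below the pod's CPU allocation, as described above; the numbers are hypothetical (a pod with 16 CPUs) and the right value depends on your workload:

# worker etc/config.properties (hypothetical pod with 16 CPUs; value is illustrative)
# capping worker threads below the pod's CPU quota leaves headroom for the
# HTTP threads that handle coordinator/worker communication
task.max-worker-threads=12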
There was a Major GC, not a Full GC, and the worker container was restarted
You should look into why the worker was restarted. You might need to reduce the JVM heap size or increase memory.heap-headroom-per-node if the pod was OOM-killed by k8s or the JVM crashed with an OutOfMemoryError.
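A hedged sketch of the two places that would be adjusted, with hypothetical numbers for a pod that has a 64 GB memory limit; the correct values depend entirely on the pod's limit and the workload:

# etc/jvm.config - keep the heap comfortably below the k8s memory limit
-Xmx54G

# etc/config.properties - reserve more heap for allocations Trino does not track
memory.heap-headroom-per-node=12GB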
Can't get trino.execution.executor:name=TaskExecutor.RunningSplits from http://worker:9300/v1/jmx/mbean
I am not sure your case is the same as mine, but in my situation the issue was linked to the default enablement of parquet.experimental.vectorized-decoding (introduced in v448) on worker containers running on AWS Graviton 2 & 3 (arm64) nodes.
If you are deploying Trino on these instance types, it could be relevant to your scenario.
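If that turns out to be relevant, a sketch of turning it off in the Hive catalog's properties file; the property key is taken verbatim from the comment above and may differ slightly between releases, so verify it against the connector documentation for your version:

# etc/catalog/hive.properties (key as quoted above; verify against your release's docs)
parquet.experimental.vectorized-decoding=false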
I executed queries on both Trino 403 and 450, and the 503 error occurred in both releases
Based on that, it's probably not related to recent changes.
I found this issue was reproduced even with a lower task.max-worker-threads configuration.
In my case, this issue was just another side effect of the thread-per-driver scheduler, reported in the issue below.
The experimental config experimental.thread-per-driver-scheduler-enabled has been enabled by default since v438.
When I disabled it, this issue went away! I guess this will be helpful in your case too :)
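For reference, a minimal sketch of disabling the experimental scheduler; the property name is the one quoted in this thread, and it would typically be set in etc/config.properties on the coordinator and workers:

# etc/config.properties - turn off the experimental thread-per-driver scheduler
experimental.thread-per-driver-scheduler-enabled=false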
Can't get trino.execution.executor:name=TaskExecutor.RunningSplits from http://worker:9300/v1/jmx/mbean
I found that when experimental.thread-per-driver-scheduler-enabled=true, the trino.execution.executor:name=TaskExecutor.RunningSplits metric could not be retrieved. You need to set experimental.thread-per-driver-scheduler-enabled to false.
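A quick way to check whether the MBean shows up again after making the change; this is only a sketch based on the endpoint quoted above, and it assumes the worker's HTTP port is 9300 and that the endpoint is reachable without authentication:

# list the MBeans exposed by the worker and look for the TaskExecutor entry
curl -s http://worker:9300/v1/jmx/mbean | grep -o 'trino.execution.executor:name=TaskExecutor'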
When setting task.max-worker-threads to cpus*2 and executing queries, CPU usage nearly reaches 100%, leading to resource starvation for the thread responsible for communication between the worker and the coordinator, which results in a 503 error. This issue persists even when using hardCpuLimit in the resource-groups config. What are the best practices for resolving this? Alternatively, can the problem be solved by modifying other configurations?
Here is the error message: