trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

cluster becomes slower and slower #6405

Open ImTangYun opened 3 years ago

ImTangYun commented 3 years ago

From the web UI we see that many queries are running, but worker parallelism is almost 0 and CPU usage is very low.

[screenshot: slow]

After we restart the cluster it performs quite well, but it slows down again quickly, after about 1 to 2 hours. Do you know why? Does Presto need to be restarted frequently at Facebook?

Some important info: our queries are quite big; many of them scan 5+ TB of physical data. We see that the cluster slows down more slowly when the queries are small.

[screenshot: fast]

The key configs:

We have clusters with about 41 worker nodes; the per-node hardware is:

96-core CPU

512 GB memory

1 SSD

10 Gigabit Ethernet

Presto version: 332, using the Hive connector with data stored on HDFS.

jvm config:

    -server
    -Xms450G
    -Xmx450G
    -Xss8M
    -XX:+UseG1GC
    -XX:G1HeapWastePercent=5
    -XX:+ParallelRefProcEnabled
    -XX:ParallelGCThreads=48
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:OnOutOfMemoryError=kill -9 %p
    -DHADOOP_USER_NAME=hdfs
    -Dpresto-temporarily-allow-java8=true
    -XX:PerMethodRecompilationCutoff=10000
    -XX:PerBytecodeRecompilationCutoff=10000
    -XX:ReservedCodeCacheSize=2G
    -XX:+UseCodeCacheFlushing
    -XX:NativeMemoryTracking=detail
    -XX:+PrintCompilation
    -XX:+CITime
    -XX:+PrintCodeCache
    -Djdk.nio.maxCachedBufferSize=4000000
    -Djdk.attach.allowAttachSelf=true
    -XX:G1HeapRegionSize=32M

presto config.properties:

    query.max-memory=2500GB
    query.max-history=3000
    experimental.reserved-pool-disabled=true
    http-server.log.max-size=100MB
    http-server.http.port=8286
    log.max-size=100MB
    node-scheduler.include-coordinator=false
    node-scheduler.max-splits-increment-for-caching=300
    query.low-memory-killer.delay=1m
    http-server.accept-queue-size=16000
    distributed-sort=true
    exchange.http-client.idle-timeout=1m
    log.max-history=10
    optimizer.enable-intermediate-aggregations=true
    query.low-memory-killer.policy=total-reservation-on-blocked-nodes
    task.concurrency=32
    http-server.http.selector-threads=32
    query.max-total-memory=8000GB
    task.max-worker-threads=100
    join-distribution-type=AUTOMATIC
    node-scheduler.use-cacheable-white-list=true
    query.client.timeout=5m
    query.max-memory-per-node=200GB
    optimizer.default-filter-factor-enabled=true
    exchange.compression-enabled=true
    task.max-leaf-splits-per-node=50
    node-scheduler.max-splits-per-node=100
    http-server.threads.max=500
    query.max-total-memory-per-node=256GB
    join-max-broadcast-table-size=2GB
    http-server.http.acceptor-threads=32
    writer-min-size=128MB
    http-server.threads.min=50
    discovery.uri=http://master:8000
    optimizer.join-reordering-strategy=AUTOMATIC
    http-server.log.max-history=10
    memory.heap-headroom-per-node=48GB
    optimizer.optimize-mixed-distinct-aggregations=true
    optimizer.use-mark-distinct=true
    query.max-length=600000
    coordinator=false

Many worker threads are stuck at the following stack (stack1):

java.lang.Thread.State: RUNNABLE
    at jdk.internal.misc.Unsafe.defineAnonymousClass0(java.base@11.0.8/Native Method)
    at jdk.internal.misc.Unsafe.defineAnonymousClass(java.base@11.0.8/Unsafe.java:1225)
    at java.lang.invoke.InvokerBytecodeGenerator.loadAndInitializeInvokerClass(java.base@11.0.8/InvokerBytecodeGenerator.java:295)
    at java.lang.invoke.InvokerBytecodeGenerator.loadMethod(java.base@11.0.8/InvokerBytecodeGenerator.java:287)
    at java.lang.invoke.InvokerBytecodeGenerator.generateCustomizedCode(java.base@11.0.8/InvokerBytecodeGenerator.java:693)
    at java.lang.invoke.LambdaForm.compileToBytecode(java.base@11.0.8/LambdaForm.java:871)
    at java.lang.invoke.LambdaForm.customize(java.base@11.0.8/LambdaForm.java:506)
    at java.lang.invoke.MethodHandle.customize(java.base@11.0.8/MethodHandle.java:1675)
    at java.lang.invoke.Invokers.maybeCustomize(java.base@11.0.8/Invokers.java:582)
    at java.lang.invoke.Invokers.checkCustomized(java.base@11.0.8/Invokers.java:573)
    at java.lang.invoke.Invokers$Holder.invoke_MT(java.base@11.0.8/Invokers$Holder)
    at io.prestosql.operator.project.GeneratedPageProjection.project(GeneratedPageProjection.java:76)
    at io.prestosql.operator.project.PageProcessor$ProjectSelectedPositions.processBatch(PageProcessor.java:330)
    at io.prestosql.operator.project.PageProcessor$ProjectSelectedPositions.process(PageProcessor.java:205)
    at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
    at io.prestosql.operator.WorkProcessorUtils.lambda$flatten$6(WorkProcessorUtils.java:277)
    at io.prestosql.operator.WorkProcessorUtils$$Lambda$3039/0x00007ebdf82cb840.process(Unknown Source)
    at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:319)
    at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
    at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
    at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
    at io.prestosql.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
    at io.prestosql.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
    at io.prestosql.operator.WorkProcessorUtils$$Lambda$3090/0x00007ebdf835d8b0.process(Unknown Source)
    at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
    at io.prestosql.operator.WorkProcessorUtils.lambda$flatten$6(WorkProcessorUtils.java:277)
    at io.prestosql.operator.WorkProcessorUtils$$Lambda$3039/0x00007ebdf82cb840.process(Unknown Source)
    at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:319)
    at io.prestosql.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
    at io.prestosql.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
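For a rough sense of how widespread this is on a single worker, the stuck threads can be counted from a thread dump. This is only a sketch using standard JDK tooling; <worker_pid> and the output path are placeholders, not values from this report:

    # take a thread dump of the worker JVM and count threads busy generating LambdaForm bytecode
    jstack <worker_pid> > /tmp/worker-stack.txt
    grep -c 'InvokerBytecodeGenerator.generateCustomizedCode' /tmp/worker-stack.txt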

Flame graph

[flame graph screenshots: p1, p2, p3, p4]

tooptoop4 commented 3 years ago

Similar to https://github.com/prestodb/presto/issues/11952#issuecomment-734006863. Restarting workers every 5 days seems to solve it for me.

ImTangYun commented 3 years ago

> Similar to prestodb/presto#11952 (comment). Restarting workers every 5 days seems to solve it for me.

Our clusters become slow after only 1~2 hours, and restarting every hour is not a good option. Do you have any ideas about the root cause of the slowdown?

tooptoop4 commented 3 years ago

@ImTangYun can you take some diagnostic dumps (jmap/jstack) 10 minutes after restart vs 4 hours after restart and compare them? My guess is some memory leak.
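A minimal sketch of how such before/after dumps could be captured with the standard JDK tools (the pid and file paths are placeholders, not from this thread):

    # ~10 minutes after restart
    jstack <worker_pid> > /tmp/jstack-fresh.txt
    jmap -histo:live <worker_pid> > /tmp/histo-fresh.txt

    # ~4 hours later, once the worker has slowed down
    jstack <worker_pid> > /tmp/jstack-slow.txt
    jmap -histo:live <worker_pid> > /tmp/histo-slow.txt

    # compare heap histograms and thread states
    diff /tmp/histo-fresh.txt /tmp/histo-slow.txt | head -50

Note that jmap -histo:live triggers a full GC, so it is best run when a brief pause is acceptable.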

ImTangYun commented 3 years ago

> @ImTangYun can you take some diagnostic dumps (jmap/jstack) 10 minutes after restart vs 4 hours after restart and compare them? My guess is some memory leak.

Sounds like a good way to find the problem. I'll try it in the next few days, thanks.

shuai-xu commented 3 years ago

Does anybody know what io.prestosql.operator.project.GeneratedPageProjection.project(GeneratedPageProjection.java) is doing? Why does the stack always show threads hanging there when a worker becomes slow?

tooptoop4 commented 3 years ago

@yingsu00

findepi commented 3 years ago

> Does anybody know what io.prestosql.operator.project.GeneratedPageProjection.project(GeneratedPageProjection.java) is doing? Why does the stack always show threads hanging there when a worker becomes slow?

(also reported as https://github.com/prestosql/presto/issues/6435)

zhanglistar commented 3 years ago

Any updates? @findepi

sopel39 commented 3 years ago

@ImTangYun

  • Could you set the JVM properties PerMethodRecompilationCutoff=10000 and PerBytecodeRecompilationCutoff=10000 and report whether it helps with the regression?

zhanglistar commented 3 years ago

> @ImTangYun
>
> • Could you set the JVM properties PerMethodRecompilationCutoff=10000 and PerBytecodeRecompilationCutoff=10000 and report whether it helps with the regression?

Already set, see the config above. @sopel39
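For anyone double-checking, a small sketch (not from this thread; <worker_pid> is a placeholder) to confirm the cutoffs are actually in effect on a running worker:

    # list the JVM flags of the running worker and filter for the recompilation cutoffs
    jcmd <worker_pid> VM.flags | tr ' ' '\n' | grep RecompilationCutoff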

gjhkael commented 3 years ago

Our cluster has the same problem; we restart the cluster every two weeks. @ImTangYun have you solved the problem?

ImTangYun commented 3 years ago

> Our cluster has the same problem; we restart the cluster every two weeks. @ImTangYun have you solved the problem?

No, we restart the cluster every 2 hours 😂

sopel39 commented 2 years ago

@ImTangYun Is it possible to isolate this issue to a particular query, or does it degrade after you run a mix of queries? Did you try the newest Trino version?

wangli-td commented 2 years ago

Hi guys, are there any updates on this issue? It still seems to happen sometimes: some workers still become slow even with the JVM parameters below already set. -XX:PerMethodRecompilationCutoff=10000 -XX:PerBytecodeRecompilationCutoff=10000

tooptoop4 commented 2 years ago

Has anyone tested this after the https://github.com/trinodb/trino/pull/13064 fix?

zhumengzhu commented 11 months ago

> Hi guys, are there any updates on this issue? It still seems to happen sometimes: some workers still become slow even with the JVM parameters below already set. -XX:PerMethodRecompilationCutoff=10000 -XX:PerBytecodeRecompilationCutoff=10000

It seems this is an issue caused by JDK-8243615. You can see more details here.

The default Cutoff parameters are:

java -XX:+PrintFlagsFinal -version | grep Cutoff                                                                                                                                                        
     intx LiveNodeCountInliningCutoff              = 40000                                  {C2 product} {default}
     intx PerBytecodeRecompilationCutoff           = 200                                       {product} {default}
     intx PerMethodRecompilationCutoff             = 400                                       {product} {default}

So, IMO, tuning these parameters just delays the slowdown but does not solve it. Maybe the only way is to fix it in the JDK.
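As a hedged way of watching for this state in production (a sketch; <worker_pid> and the 10-second interval are placeholders, not from the thread), cumulative JIT activity can be sampled with jstat and compared between a fresh and a degraded worker:

    # print the HotSpot compiler counters every 10 seconds;
    # Compiled/Invalid counts that keep climbing hours after startup, combined with
    # threads stuck in LambdaForm.compileToBytecode, would match the behavior above
    jstat -compiler <worker_pid> 10s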