trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Enabling JMX crashes trino #24302

Closed sarthak-autodesk closed 6 days ago

sarthak-autodesk commented 6 days ago

I am running Trino 426 on Kubernetes using the Trino Helm chart (https://trinodb.github.io/charts/charts/trino/).

When I enable JMX using:

jmx:
  enabled: true

Trino starts crashing for me with the following error:

2024-11-28T17:00:42.327581841Z Exception in thread "main" java.lang.ClassCircularityError: java/lang/WeakPairMap$Pair$Weak
2024-11-28T17:00:42.327629412Z  at java.base/java.lang.WeakPairMap$Pair.weak(WeakPairMap.java:201)
2024-11-28T17:00:42.327659862Z  at java.base/java.lang.WeakPairMap.putIfAbsent(WeakPairMap.java:123)
2024-11-28T17:00:42.327672932Z  at java.base/java.lang.Module.implAddReads(Module.java:488)
2024-11-28T17:00:42.327680732Z  at java.base/java.lang.Module.implAddReads(Module.java:449)
2024-11-28T17:00:42.327813234Z  at java.base/java.lang.System$2.addReads(System.java:2335)
2024-11-28T17:00:42.327823604Z  at java.base/jdk.internal.module.Modules.addReads(Modules.java:90)
2024-11-28T17:00:42.327830174Z  at java.base/java.lang.reflect.Proxy$ProxyBuilder.lambda$getDynamicModule$1(Proxy.java:918)
2024-11-28T17:00:42.327886925Z  at java.base/jdk.internal.loader.AbstractClassLoaderValue$Memoizer.get(AbstractClassLoaderValue.java:329)
2024-11-28T17:00:42.327899025Z  at java.base/jdk.internal.loader.AbstractClassLoaderValue.computeIfAbsent(AbstractClassLoaderValue.java:205)
2024-11-28T17:00:42.327964595Z  at java.base/java.lang.reflect.Proxy$ProxyBuilder.getDynamicModule(Proxy.java:908)
2024-11-28T17:00:42.327973275Z  at java.base/java.lang.reflect.Proxy$ProxyBuilder.mapToModule(Proxy.java:846)
2024-11-28T17:00:42.328026236Z  at java.base/java.lang.reflect.Proxy$ProxyBuilder.<init>(Proxy.java:651)
2024-11-28T17:00:42.328057746Z  at java.base/java.lang.reflect.Proxy$ProxyBuilder.<init>(Proxy.java:656)
2024-11-28T17:00:42.328125477Z  at java.base/java.lang.reflect.Proxy.lambda$getProxyConstructor$0(Proxy.java:429)
2024-11-28T17:00:42.328134757Z  at java.base/jdk.internal.loader.AbstractClassLoaderValue$Memoizer.get(AbstractClassLoaderValue.java:329)
2024-11-28T17:00:42.328152477Z  at java.base/jdk.internal.loader.AbstractClassLoaderValue.computeIfAbsent(AbstractClassLoaderValue.java:205)
2024-11-28T17:00:42.328174517Z  at java.base/java.lang.reflect.Proxy.getProxyConstructor(Proxy.java:427)
2024-11-28T17:00:42.328277029Z  at java.base/java.lang.reflect.Proxy.newProxyInstance(Proxy.java:1037)
2024-11-28T17:00:42.328285799Z  at java.base/sun.reflect.annotation.AnnotationParser$1.run(AnnotationParser.java:302)
2024-11-28T17:00:42.328307009Z  at java.base/sun.reflect.annotation.AnnotationParser$1.run(AnnotationParser.java:300)
2024-11-28T17:00:42.328331509Z  at java.base/java.security.AccessController.doPrivileged(AccessController.java:318)
2024-11-28T17:00:42.328344899Z  at java.base/sun.reflect.annotation.AnnotationParser.annotationForMap(AnnotationParser.java:300)
2024-11-28T17:00:42.328442020Z  at java.base/sun.reflect.annotation.AnnotationParser.parseAnnotation2(AnnotationParser.java:289)
2024-11-28T17:00:42.328450520Z  at java.base/sun.reflect.annotation.AnnotationParser.parseAnnotations2(AnnotationParser.java:121)
2024-11-28T17:00:42.328527931Z  at java.base/sun.reflect.annotation.AnnotationParser.parseSelectAnnotations(AnnotationParser.java:102)
2024-11-28T17:00:42.328536061Z  at java.base/sun.reflect.annotation.AnnotationType.<init>(AnnotationType.java:146)
2024-11-28T17:00:42.328610722Z  at java.base/sun.reflect.annotation.AnnotationType.getInstance(AnnotationType.java:85)
2024-11-28T17:00:42.328619422Z  at java.base/sun.reflect.annotation.AnnotationParser.parseAnnotation2(AnnotationParser.java:262)
2024-11-28T17:00:42.328691223Z  at java.base/sun.reflect.annotation.AnnotationParser.parseAnnotations2(AnnotationParser.java:121)
2024-11-28T17:00:42.328699873Z  at java.base/sun.reflect.annotation.AnnotationParser.parseAnnotations(AnnotationParser.java:73)
2024-11-28T17:00:42.328795464Z  at java.base/java.lang.Class.createAnnotationData(Class.java:4068)
2024-11-28T17:00:42.328803774Z  at java.base/java.lang.Class.annotationData(Class.java:4057)
2024-11-28T17:00:42.328809344Z  at java.base/java.lang.Class.getAnnotation(Class.java:3940)
2024-11-28T17:00:42.328856134Z  at org.openjdk.jol.vm.ContendedSupport.getJdkInternalContended(ContendedSupport.java:90)
2024-11-28T17:00:42.328890765Z  at org.openjdk.jol.vm.ContendedSupport.isContended(ContendedSupport.java:61)
2024-11-28T17:00:42.328938786Z  at org.openjdk.jol.info.ClassData.parse(ClassData.java:111)
2024-11-28T17:00:42.329070677Z  at org.openjdk.jol.info.ClassData.parseClass(ClassData.java:63)
2024-11-28T17:00:42.329078637Z  at org.openjdk.jol.info.ClassLayout.parseClass(ClassLayout.java:72)
2024-11-28T17:00:42.329243858Z  at org.openjdk.jol.info.ClassLayout.parseClass(ClassLayout.java:56)
2024-11-28T17:00:42.329250868Z  at io.airlift.slice.SizeOf.instanceSize(SizeOf.java:308)
2024-11-28T17:00:42.329303620Z  at io.airlift.slice.SizeOf.<clinit>(SizeOf.java:64)
2024-11-28T17:00:42.329310180Z  at io.airlift.slice.Slice.<clinit>(Slice.java:52)
2024-11-28T17:00:42.329481671Z  at io.airlift.slice.Slices.<clinit>(Slices.java:31)
2024-11-28T17:00:42.329489131Z  at io.trino.server.TrinoSystemRequirements.verifySlice(TrinoSystemRequirements.java:146)
2024-11-28T17:00:42.329517421Z  at io.trino.server.TrinoSystemRequirements.verifyJvmRequirements(TrinoSystemRequirements.java:52)
2024-11-28T17:00:42.329534022Z  at io.trino.server.Server.doStart(Server.java:96)
2024-11-28T17:00:42.329551082Z  at io.trino.server.Server.lambda$start$0(Server.java:91)
2024-11-28T17:00:42.329652084Z  at io.trino.$gen.Trino_426____20241128_170042_1.run(Unknown Source)
2024-11-28T17:00:42.329659834Z  at io.trino.server.Server.start(Server.java:91)
2024-11-28T17:00:42.329736074Z  at io.trino.server.TrinoServer.main(TrinoServer.java:38)

This is my values.yaml for the Helm chart:

nameOverride: name
securityContext: null
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: redacted
imagePullSecrets: []
image:
  repository: redacted
  pullPolicy: Always
  tag: base-presto-latest
server:
  config:
    path: /trino_config
  autoscaling:
    enabled: true
    maxReplicas: 1
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage: 80
  workers: 1
  node:
    environment: "development"
additionalConfigProperties:
  - sql.default-schema=default
  - sql.default-catalog=hive
  - enable-dynamic-filtering=true
  - iterative-optimizer-timeout=5.00m
  - join-distribution-type=AUTOMATIC
  - max-spill-per-node=100GB
  - optimizer.default-filter-factor-enabled=true
  - optimizer.dictionary-aggregation=true
  - optimizer.join-reordering-strategy=AUTOMATIC
  - optimizer.optimize-metadata-queries=true
  - optimizer.optimize-mixed-distinct-aggregations=true
  - query.max-execution-time=30m
  - query.max-memory=100TB
  - query.max-memory-per-node=125GB
  - query-max-spill-per-node=100GB
  - query.max-stage-count=400
  - query.min-expire-age=60m
  - spill-compression-enabled=true
  - spill-enabled=true
  - http-server.authentication.type=PASSWORD
  - http-server.process-forwarded=true
  - internal-communication.shared-secret="rwUm9cWVDkec3vm2efSlasmjoeiIobre0vfcYmd7y5sBrE8SRhVM9Qp5ID4J8p1JM79ROKtykM5YB7c6nmWMo4AlpQbVxLHHoVrRTGa6H6RopE3mU0ED6R7TMgUBmwNgtZugwG+psiA69QBOWROXxVIjIfdxeFX3ZXgCUeRNbWSZkM//eGCM4o2cwQSXLKdvFUQXnZ8hUL0q9Kufqs7OEm5SUAMTNoHRA7feDkqDPgeFq4Zx5JSiR+yYpYcf2ZW6u5SBuMwf4pSSRZcE629bUxHE8LaBES8J3ctnV3FFIXY9aK9Y6h/wLs0PqmGY3DJCtfd+lh+7JxlG1sqDoKg06/nGMSmx2GgfmSWx1VQ366eVFxYiTLQqEM4LovQD9cL8AMOdxxW5K8CTLu5qp2sgouNboiM/7cgOHAAQ0lKJepkO2KTI+63Xqns0szG/8kUVHgYGaRwBKbcLui+b6TJdWMi6IPfq1hH71onBlRGTX8Th8vnb85wUPC0QrD0RPnYrQeLEzdFJ6wH2uUFcsvrXX+YeqWlL6SvGM2PxiMsUSPrjzIR1JCfsfYQ1Ab1AUKdh7ur23glRnj1T0b+3eBrMIXw682FHUIXSisZI1b07kCtc5N1pawtKklRH8gbnoUdy8n8l3waN3mMdSKHWoV9H3bnRChGj15JkunLlFk7c+gk=" # necessary for authentication
  - spiller-spill-path="/mnt/spill_dir"

worker:
  annotations:
    release_time: "7"
  resources:
    limits:
      cpu: "29"
      memory: "240Gi"
    requests:
      cpu: "29"
      memory: "240Gi"
  jvm:
    maxHeapSize: "192G"
coordinator:
  annotations:
    release_time: "7"
  resources:
    limits:
      cpu: "29"
      memory: "240Gi"
    requests:
      cpu: "29"
      memory: "240Gi"
  jvm:
    maxHeapSize: "192G"
jmx:
  enabled: true

nineinchnick commented 6 days ago

The stack trace says you're running Trino 426, but in your values.yaml you have tag: base-presto-latest - is that some custom build? Can you try a newer version?

sarthak-autodesk commented 6 days ago

Yes, I am running a custom Docker image. I think I have narrowed the problem down to my JVM config, and it has nothing to do with Trino itself. The following config works fine, but as soon as I uncomment -javaagent:/usr/lib/trino/plugin/privacera-ranger/ranger-trino-plugin-agent.jar, Trino starts crashing.

-server
-agentpath:/usr/lib/trino/bin/libjvmkill.so
-XX:InitialRAMPercentage=80
-XX:MaxRAMPercentage=80
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-XX:-OmitStackTraceInFastThrow
-XX:ReservedCodeCacheSize=256M
-XX:PerMethodRecompilationCutoff=10000
-XX:PerBytecodeRecompilationCutoff=10000
-Djdk.attach.allowAttachSelf=true
-Djdk.nio.maxCachedBufferSize=2000000
# Improve AES performance for S3, etc. on ARM64 (JDK-8271567)
-XX:+UnlockDiagnosticVMOptions
-XX:+UseAESCTRIntrinsics
# Disable Preventive GC for performance reasons (JDK-8293861)
-XX:-G1UsePreventiveGC
#-javaagent:/usr/lib/trino/plugin/privacera-ranger/ranger-trino-plugin-agent.jar
#-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9080
-Dcom.sun.management.jmxremote.rmi.port=9081
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname=127.0.0.1
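
The bisection described above (commenting one JVM option in or out and restarting) can be scripted. The following is only a minimal sketch, not part of Trino or the Helm chart: the helper names `list_agent_options` and `set_option_enabled` are hypothetical, and it assumes the jvm.config format shown above, with one option per line and `#` marking a disabled line.

```python
# Hypothetical helpers for bisecting a jvm.config; not part of Trino.
AGENT_PREFIXES = ("-javaagent:", "-agentpath:", "-agentlib:")


def list_agent_options(jvm_config: str) -> list[str]:
    """Return the active (uncommented) agent options in a jvm.config body."""
    agents = []
    for raw in jvm_config.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out options
        if line.startswith(AGENT_PREFIXES):
            agents.append(line)
    return agents


def set_option_enabled(jvm_config: str, option: str, enabled: bool) -> str:
    """Comment or uncomment every line whose option text equals `option`."""
    out = []
    for raw in jvm_config.splitlines():
        bare = raw.strip().lstrip("#").strip()
        if bare == option:
            out.append(option if enabled else "#" + option)
        else:
            out.append(raw)  # leave all other lines untouched
    return "\n".join(out)


# Shortened sample based on the config in this comment.
SAMPLE = """\
-server
-agentpath:/usr/lib/trino/bin/libjvmkill.so
#-javaagent:/usr/lib/trino/plugin/privacera-ranger/ranger-trino-plugin-agent.jar
-Dcom.sun.management.jmxremote.port=9080
"""
AGENT = "-javaagent:/usr/lib/trino/plugin/privacera-ranger/ranger-trino-plugin-agent.jar"

if __name__ == "__main__":
    print("active agents:", list_agent_options(SAMPLE))
    enabled = set_option_enabled(SAMPLE, AGENT, True)
    print("after enabling:", list_agent_options(enabled))
```

Listing the active agents before each restart makes it easier to confirm which agent was loaded when the crash occurred.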

mosabua commented 6 days ago

I suggest upgrading to Trino 466 and using the official Ranger plugin available in that release. Alternatively, talk to Privacera and get them to fix the issue.