prometheus / jmx_exporter

A process for exposing JMX Beans via HTTP for Prometheus consumption
Apache License 2.0
3.03k stars 1.2k forks source link

Exception in thread "Keep-Alive-Timer" java.lang.OutOfMemoryError: Java heap space #378

Closed Drewster727 closed 5 years ago

Drewster727 commented 5 years ago

Hey guys--

Looking to use the jmx_exporter to pull kafka metrics. However, I'm having issues starting kafka after I setup my config, and I don't quite see a smoking gun.

jmx_prometheus_javaagent-0.11.0.jar kafka v0.10.1

Error in /var/log/kafka/kafka.err upon starting:

Exception in thread "Keep-Alive-Timer" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Keep-Alive-Timer" java.lang.OutOfMemoryError: Java heap space
log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.DailyRollingFileAppender.
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.
log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.DailyRollingFileAppender.
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "ExpirationReaper-1001"
Exception in thread "ExpirationReaper-1001" java.lang.OutOfMemoryError: Java heap space

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main-SendThread(server_name_redacted.domain.com:2181)"
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:386)
        at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:401)
Caused by: java.lang.IllegalArgumentException: Collector already registered that provides name: jmx_exporter_build_info
        at io.prometheus.jmx.shaded.io.prometheus.client.CollectorRegistry.register(CollectorRegistry.java:54)
        at io.prometheus.jmx.shaded.io.prometheus.client.Collector.register(Collector.java:139)
        at io.prometheus.jmx.shaded.io.prometheus.client.Collector.register(Collector.java:132)
        at io.prometheus.jmx.shaded.io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:51)
        ... 6 more

We're using ambari/zookeeper to manage/broker our kafka nodes. Here is what I'm setting in the kafka-env template in order to set the KAFKA_OPTS:

export KAFKA_OPTS="$KAFKA_OPTS -javaagent:/opt/scripts/prometheus/jmx_prometheus_javaagent-0.11.0.jar=8081:/opt/scripts/prometheus/kafkaMon.yaml"

From the ambari kafka service start logs (my export KAFKA_OPTS is in kafka-env.sh):

source /usr/hdp/current/kafka-broker/config/kafka-env.sh ; /usr/hdp/current/kafka-broker/bin/kafka start

My kafkaMon.yaml (config file) -- pulled from example_configs/kafka_0_8_2.yaml:

lowercaseOutputName: true
rules:
- pattern : kafka.cluster<type=(.+), name=(.+), topic=(.+), partition=(.+)><>Value
  name: kafka_cluster_$1_$2
  labels:
    topic: "$3"
    partition: "$4"
- pattern : kafka.log<type=Log, name=(.+), topic=(.+), partition=(.+)><>Value
  name: kafka_log_$1
  labels:
    topic: "$2"
    partition: "$3"
- pattern : kafka.controller<type=(.+), name=(.+)><>(Count|Value)
  name: kafka_controller_$1_$2
- pattern : kafka.network<type=(.+), name=(.+)><>Value
  name: kafka_network_$1_$2
- pattern : kafka.network<type=(.+), name=(.+)PerSec, request=(.+)><>Count
  name: kafka_network_$1_$2_total
  labels:
    request: "$3"
- pattern : kafka.network<type=(.+), name=(\w+), networkProcessor=(.+)><>Count
  name: kafka_network_$1_$2
  labels:
    request: "$3"
  type: COUNTER
- pattern : kafka.network<type=(.+), name=(\w+), request=(\w+)><>Count
  name: kafka_network_$1_$2
  labels:
    request: "$3"
- pattern : kafka.network<type=(.+), name=(\w+)><>Count
  name: kafka_network_$1_$2
- pattern : kafka.server<type=(.+), name=(.+)PerSec\w*, topic=(.+)><>Count
  name: kafka_server_$1_$2_total
  labels:
    topic: "$3"
- pattern : kafka.server<type=(.+), name=(.+)PerSec\w*><>Count
  name: kafka_server_$1_$2_total
  type: COUNTER

- pattern : kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    clientId: "$3"
    topic: "$4"
    partition: "$5"
- pattern : kafka.server<type=(.+), name=(.+), topic=(.+), partition=(.*)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    topic: "$3"
    partition: "$4"
- pattern : kafka.server<type=(.+), name=(.+), topic=(.+)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    topic: "$3"
  type: COUNTER

- pattern : kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    clientId: "$3"
    broker: "$4:$5"
- pattern : kafka.server<type=(.+), name=(.+), clientId=(.+)><>(Count|Value)
  name: kafka_server_$1_$2
  labels:
    clientId: "$3"
- pattern : kafka.server<type=(.+), name=(.+)><>(Count|Value)
  name: kafka_server_$1_$2

- pattern : kafka.(\w+)<type=(.+), name=(.+)PerSec\w*><>Count
  name: kafka_$1_$2_$3_total
- pattern : kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, topic=(.+)><>Count
  name: kafka_$1_$2_$3_total
  labels:
    topic: "$4"
  type: COUNTER
- pattern : kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, topic=(.+), partition=(.+)><>Count
  name: kafka_$1_$2_$3_total
  labels:
    topic: "$4"
    partition: "$5"
  type: COUNTER
- pattern : kafka.(\w+)<type=(.+), name=(.+)><>(Count|Value)
  name: kafka_$1_$2_$3_$4
  type: COUNTER
- pattern : kafka.(\w+)<type=(.+), name=(.+), (\w+)=(.+)><>(Count|Value)
  name: kafka_$1_$2_$3_$6
  labels:
    "$4": "$5"

The kafka services fail to start. Path-ing and permissions looks fine. From the logs, it looks as though the javaagent is actually starting/working, but kafka services ultimately fail to start up.

Can anyone tell me what I can do to diagnose or resolve?

Thanks, Drew

Drewster727 commented 5 years ago

Update: Digging around I found KAFKA_HEAD_OPTS being set via kafka-run-class.sh:

# Memory options
if [ -z "$KAFKA_HEAP_OPTS" ]; then
  KAFKA_HEAP_OPTS="-Xmx256M"
fi

I updated my environment vars to:

export KAFKA_OPTS="$KAFKA_OPTS -javaagent:/opt/scripts/prometheus/jmx_prometheus_javaagent-0.11.0.jar=8081:/opt/scripts/prometheus/kafkaMon.yaml"

export KAFKA_HEAD_OPTS="-Xms512m -Xmx1g"

That does resolve the OOM errors, but still getting:

Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:386)
        at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:401)
Caused by: java.lang.IllegalArgumentException: Collector already registered that provides name: jmx_exporter_build_info
        at io.prometheus.jmx.shaded.io.prometheus.client.CollectorRegistry.register(CollectorRegistry.java:54)
        at io.prometheus.jmx.shaded.io.prometheus.client.Collector.register(Collector.java:139)
        at io.prometheus.jmx.shaded.io.prometheus.client.Collector.register(Collector.java:132)
        at io.prometheus.jmx.shaded.io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:51)
        ... 6 more

I found a separate post indicating the collector has already been registered. Why would it be stating this? I'm only executing/starting it once, and only one javaagent.

Drewster727 commented 5 years ago

Update #2: This fixed it for me:

export KAFKA_OPTS="-javaagent:/opt/scripts/prometheus/jmx_prometheus_javaagent-0.11.0.jar=8081:/opt/scripts/prometheus/kafkaMon.yaml"
export KAFKA_HEAD_OPTS="-Xms512m -Xmx1g"

note that I removed the $KAFKA_OPTS from the beginning... it was causing the javaagent to get registered twice.

Hope this helps someone!

-Drew

geopamplona commented 5 years ago

i have the same error and same solution (iam using 0.12.1 version), but i have problems when access zookeeper console or other tools that interact with kafka broker.. imagen do you solve this problem definitely? should i use KAFKA_JMX_OPTS?