strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

Kafka Connect metrics not working #7171

Closed thiagorizzo closed 2 years ago

thiagorizzo commented 2 years ago

Describe the bug

Kafka Connect metrics do not work.

Error Log:

Exception in thread "main" java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:513)
        at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:525)
Caused by: java.io.FileNotFoundException: /opt/kafka/custom-config/metrics-config.yml (No such file or directory)
        at java.base/java.io.FileInputStream.open0(Native Method)
        at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
        at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
        at java.base/java.io.FileReader.<init>(FileReader.java:75)
        at io.prometheus.jmx.shaded.io.prometheus.jmx.JmxCollector.<init>(JmxCollector.java:78)
        at io.prometheus.jmx.shaded.io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:29)
        ... 6 more
*** java.lang.instrument ASSERTION FAILED ***: "result" with message agent load/premain call failed at ./src/java.instrument/share/native/libinstrument/JPLISAgent.c line: 422
FATAL ERROR in native method: processing of -javaagent failed, processJavaStart failed

To Reproduce

Steps to reproduce the behavior:

  1. Go to '...'
  2. Create Custom Resource '....'
  3. Run command '....'
  4. See error

Expected behavior

Kafka up and running with metrics enabled

Environment (please complete the following information):

YAML files and logs

KafkaConnect:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: data-connect-cluster
  annotations:
    strimzi.io/use-connector-resources: "true"
spec:
  version: 3.2.0
  replicas: 3
  image: myimage
  bootstrapServers: "bootstrapservers"
  authentication:
    type: plain
    username: admin
    passwordSecret:
      secretName: kafka-connect-secrets
      password: user-password  
  config:
    group.id: data-connect-cluster
    offset.storage.topic: connect-cluster-offsets
    config.storage.topic: connect-cluster-configs
    status.storage.topic: connect-cluster-status
    config.storage.replication.factor: 3
    offset.storage.replication.factor: 3
    status.storage.replication.factor: 3 
  template:
    pod:
      metadata:
        annotations:
          prometheus.io/scrape: 'true'
          prometheus.io/port: '9404'    
  metricsConfig:
    type: jmxPrometheusExporter
    valueFrom:
      configMapKeyRef:
        name: jmx-exporter-metrics-config
        key: metrics-config.yml

ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: jmx-exporter-metrics-config
  labels:
    app: strimzi
data:
  metrics-config.yml: |
    # Inspired by kafka-connect rules
    # https://github.com/prometheus/jmx_exporter/blob/master/example_configs/kafka-connect.yml
    # See https://github.com/prometheus/jmx_exporter for more info about JMX Prometheus Exporter metrics
    lowercaseOutputName: true
    lowercaseOutputLabelNames: true
    rules:
    #kafka.connect:type=app-info,client-id="{clientid}"
    #kafka.consumer:type=app-info,client-id="{clientid}"
    #kafka.producer:type=app-info,client-id="{clientid}"
    - pattern: 'kafka.(.+)<type=app-info, client-id=(.+)><>start-time-ms'
      name: kafka_$1_start_time_seconds
      labels:
        clientId: "$2"
      help: "Kafka $1 JMX metric start time seconds"
      type: GAUGE
      valueFactor: 0.001
    - pattern: 'kafka.(.+)<type=app-info, client-id=(.+)><>(commit-id|version): (.+)'
      name: kafka_$1_$3_info
      value: 1
      labels:
        clientId: "$2"
        $3: "$4"
      help: "Kafka $1 JMX metric info version and commit-id"
      type: GAUGE
    #kafka.producer:type=producer-topic-metrics,client-id="{clientid}",topic="{topic}"", partition="{partition}"
    #kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientid}",topic="{topic}"", partition="{partition}"
    - pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.+), topic=(.+), partition=(.+)><>(.+-total|compression-rate|.+-avg|.+-replica|.+-lag|.+-lead)
      name: kafka_$2_$6
      labels:
        clientId: "$3"
        topic: "$4"
        partition: "$5"
      help: "Kafka $1 JMX metric type $2"
      type: GAUGE
    #kafka.producer:type=producer-topic-metrics,client-id="{clientid}",topic="{topic}"
    #kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientid}",topic="{topic}"", partition="{partition}"
    - pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.+), topic=(.+)><>(.+-total|compression-rate|.+-avg)
      name: kafka_$2_$5
      labels:
        clientId: "$3"
        topic: "$4"
      help: "Kafka $1 JMX metric type $2"
      type: GAUGE
    #kafka.connect:type=connect-node-metrics,client-id="{clientid}",node-id="{nodeid}"
    #kafka.consumer:type=consumer-node-metrics,client-id=consumer-1,node-id="{nodeid}"
    - pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.+), node-id=(.+)><>(.+-total|.+-avg)
      name: kafka_$2_$5
      labels:
        clientId: "$3"
        nodeId: "$4"
      help: "Kafka $1 JMX metric type $2"
      type: UNTYPED
    #kafka.connect:type=kafka-metrics-count,client-id="{clientid}"
    #kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientid}"
    #kafka.consumer:type=consumer-coordinator-metrics,client-id="{clientid}"
    #kafka.consumer:type=consumer-metrics,client-id="{clientid}"
    - pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.*)><>(.+-total|.+-avg|.+-bytes|.+-count|.+-ratio|.+-age|.+-flight|.+-threads|.+-connectors|.+-tasks|.+-ago)
      name: kafka_$2_$4
      labels:
        clientId: "$3"
      help: "Kafka $1 JMX metric type $2"
      type: GAUGE
    #kafka.connect:type=connector-metrics,connector="{connector}"
    - pattern: 'kafka.(.+)<type=connector-metrics, connector=(.+)><>(connector-class|connector-type|connector-version|status): (.+)'
      name: kafka_connect_connector_$3
      value: 1
      labels:
        connector: "$2"
        $3: "$4"
      help: "Kafka Connect $3 JMX metric type connector"
      type: GAUGE
    #kafka.connect:type=connector-task-metrics,connector="{connector}",task="{task}<> status"
    - pattern: 'kafka.connect<type=connector-task-metrics, connector=(.+), task=(.+)><>status: ([a-z-]+)'
      name: kafka_connect_connector_task_status
      value: 1
      labels:
        connector: "$1"
        task: "$2"
        status: "$3"
      help: "Kafka Connect JMX Connector task status"
      type: GAUGE
    #kafka.connect:type=task-error-metrics,connector="{connector}",task="{task}"
    #kafka.connect:type=source-task-metrics,connector="{connector}",task="{task}"
    #kafka.connect:type=sink-task-metrics,connector="{connector}",task="{task}"
    #kafka.connect:type=connector-task-metrics,connector="{connector}",task="{task}"
    - pattern: kafka.connect<type=(.+)-metrics, connector=(.+), task=(.+)><>(.+-total|.+-count|.+-ms|.+-ratio|.+-seq-no|.+-rate|.+-max|.+-avg|.+-failures|.+-requests|.+-timestamp|.+-logged|.+-errors|.+-retries|.+-skipped)
      name: kafka_connect_$1_$4
      labels:
        connector: "$2"
        task: "$3"
      help: "Kafka Connect JMX metric type $1"
      type: GAUGE
    #kafka.connect:type=connector-metrics,connector="{connector}"
    #kafka.connect:type=connect-worker-metrics,connector="{connector}"
    - pattern: kafka.connect<type=connect-worker-metrics, connector=(.+)><>([a-z-]+)
      name: kafka_connect_worker_$2
      labels:
        connector: "$1"
      help: "Kafka Connect JMX metric $1"
      type: GAUGE
    #kafka.connect:type=connect-worker-metrics
    - pattern: kafka.connect<type=connect-worker-metrics><>([a-z-]+)
      name: kafka_connect_worker_$1
      help: "Kafka Connect JMX metric worker"
      type: GAUGE
    #kafka.connect:type=connect-worker-rebalance-metrics
    - pattern: kafka.connect<type=connect-worker-rebalance-metrics><>([a-z-]+)
      name: kafka_connect_worker_rebalance_$1
      help: "Kafka Connect JMX metric rebalance information"
      type: GAUGE

scholzj commented 2 years ago

Can you please share the full log from the Kafka Connect pods?

thiagorizzo commented 2 years ago

@scholzj This is the full log:

Preparing truststore
Preparing truststore is complete
Starting Kafka Connect with configuration:
# Bootstrap servers
bootstrap.servers=...
# REST Listeners
rest.port=8083
rest.advertised.host.name=10.162.128.210
rest.advertised.port=8083
# Plugins
plugin.path=/opt/kafka/plugins
# Provided configuration
offset.storage.topic=connect-cluster-offsets
value.converter=org.apache.kafka.connect.json.JsonConverter
config.storage.topic=connect-cluster-configs
key.converter=org.apache.kafka.connect.json.JsonConverter
group.id=data-connect-cluster
status.storage.topic=connect-cluster-status
config.storage.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3

security.protocol=SASL_PLAINTEXT
producer.security.protocol=SASL_PLAINTEXT
consumer.security.protocol=SASL_PLAINTEXT
admin.security.protocol=SASL_PLAINTEXT

sasl.mechanism=PLAIN
sasl.jaas.config=[hidden]

producer.sasl.mechanism=PLAIN
producer.sasl.jaas.config=[hidden]

consumer.sasl.mechanism=PLAIN
consumer.sasl.jaas.config=[hidden]

admin.sasl.mechanism=PLAIN
admin.sasl.jaas.config=[hidden]

# Additional configuration
client.rack=

Exception in thread "main" java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:513)
        at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:525)
Caused by: java.io.FileNotFoundException: /opt/kafka/custom-config/metrics-config.yml (No such file or directory)
        at java.base/java.io.FileInputStream.open0(Native Method)
        at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
        at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
        at java.base/java.io.FileReader.<init>(FileReader.java:75)
        at io.prometheus.jmx.shaded.io.prometheus.jmx.JmxCollector.<init>(JmxCollector.java:78)
        at io.prometheus.jmx.shaded.io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:29)
        ... 6 more
*** java.lang.instrument ASSERTION FAILED ***: "result" with message agent load/premain call failed at ./src/java.instrument/share/native/libinstrument/JPLISAgent.c line: 422
FATAL ERROR in native method: processing of -javaagent failed, processJavaStart failed

scholzj commented 2 years ago

Hmm, I thought it would show a bit more. Which container image do you use for the pod?

scholzj commented 2 years ago

I ask because Strimzi 0.30 does not use metrics-config.yml at all. It uses metrics-config.json. So I suspect your image is based on some much older Strimzi version and not on 0.30.0.

thiagorizzo commented 2 years ago

These are my env variables for the operator:

STRIMZI_DEFAULT_CRUISE_CONTROL_IMAGE:quay.io/strimzi/kafka:0.30.0-kafka-3.2.0
STRIMZI_DEFAULT_JMXTRANS_IMAGE:quay.io/strimzi/jmxtrans:0.30.0
STRIMZI_DEFAULT_KAFKA_BRIDGE_IMAGE:quay.io/strimzi/kafka-bridge:0.21.6
STRIMZI_DEFAULT_KAFKA_EXPORTER_IMAGE:quay.io/strimzi/kafka:0.30.0-kafka-3.2.0
STRIMZI_DEFAULT_KAFKA_INIT_IMAGE:quay.io/strimzi/operator:0.30.0
STRIMZI_DEFAULT_KANIKO_EXECUTOR_IMAGE:quay.io/strimzi/kaniko-executor:0.30.0
STRIMZI_DEFAULT_MAVEN_BUILDER:quay.io/strimzi/maven-builder:0.30.0
STRIMZI_DEFAULT_TLS_SIDECAR_ENTITY_OPERATOR_IMAGE:quay.io/strimzi/kafka:0.30.0-kafka-3.2.0
STRIMZI_DEFAULT_TOPIC_OPERATOR_IMAGE:quay.io/strimzi/operator:0.30.0
STRIMZI_DEFAULT_USER_OPERATOR_IMAGE:quay.io/strimzi/operator:0.30.0
STRIMZI_FEATURE_GATES:
STRIMZI_FULL_RECONCILIATION_INTERVAL_MS:120000
STRIMZI_KAFKA_CONNECT_IMAGES:3.1.0=quay.io/strimzi/kafka:0.30.0-kafka-3.1.0 3.1.1=quay.io/strimzi/kafka:0.30.0-kafka-3.1.1 3.2.0=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0
STRIMZI_KAFKA_IMAGES:3.1.0=quay.io/strimzi/kafka:0.30.0-kafka-3.1.0 3.1.1=quay.io/strimzi/kafka:0.30.0-kafka-3.1.1 3.2.0=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0
STRIMZI_KAFKA_MIRROR_MAKER_2_IMAGES:3.1.0=quay.io/strimzi/kafka:0.30.0-kafka-3.1.0 3.1.1=quay.io/strimzi/kafka:0.30.0-kafka-3.1.1 3.2.0=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0
STRIMZI_KAFKA_MIRROR_MAKER_IMAGES:3.1.0=quay.io/strimzi/kafka:0.30.0-kafka-3.1.0 3.1.1=quay.io/strimzi/kafka:0.30.0-kafka-3.1.1 3.2.0=quay.io/strimzi/kafka:0.30.0-kafka-3.2.0
STRIMZI_NAMESPACE:*
STRIMZI_OPERATION_TIMEOUT_MS:300000
STRIMZI_OPERATOR_NAMESPACE:fieldRef(v1:metadata.namespace)

scholzj commented 2 years ago

Well, I don't think that is the case. If you use the settings above and the KafkaConnect CR you shared above, which does not have any .spec.image setting, then it will not give you the same error, because it does not use the YAML file:

$ docker run -ti quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 cat kafka_connect_run.sh | grep metrics-config
    KAFKA_OPTS="${KAFKA_OPTS} -javaagent:$(ls "$KAFKA_HOME"/libs/jmx_prometheus_javaagent*.jar)=9404:$KAFKA_HOME/custom-config/metrics-config.json"

As I said above, it is using custom-config/metrics-config.json and not custom-config/metrics-config.yml, which is the file named in your error message. So you must be using another container image.

thiagorizzo commented 2 years ago

@scholzj you are right, it's "strimzi/kafka:0.20.1-kafka-2.6.0"

scholzj commented 2 years ago

Right, so you need to update to the image for the correct Strimzi and Kafka versions and you should be fine.
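
For illustration only, a minimal sketch of the relevant part of the KafkaConnect resource after such an update. It assumes the operator defaults listed earlier in this thread, where STRIMZI_KAFKA_CONNECT_IMAGES maps 3.2.0 to quay.io/strimzi/kafka:0.30.0-kafka-3.2.0. The custom image name in the commented-out line is hypothetical, and the fields not shown here stay as in the original resource.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: data-connect-cluster
spec:
  version: 3.2.0
  # Option 1: leave .spec.image unset so the operator picks the image that
  # STRIMZI_KAFKA_CONNECT_IMAGES maps to this version
  # (quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 in the settings above).
  #
  # Option 2: keep a custom image, but rebuild it on top of
  # quay.io/strimzi/kafka:0.30.0-kafka-3.2.0 instead of strimzi/kafka:0.20.1-kafka-2.6.0.
  # The image name below is a hypothetical placeholder.
  # image: my-registry/my-connect:0.30.0-kafka-3.2.0
  metricsConfig:
    type: jmxPrometheusExporter
    valueFrom:
      configMapKeyRef:
        name: jmx-exporter-metrics-config
        key: metrics-config.yml
  # bootstrapServers, authentication, config and template are unchanged and omitted here.
```

Either way, the startup script inside the image and the files the operator mounts come from the same Strimzi release, so the -javaagent option points at a file that actually exists.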

scholzj commented 2 years ago

Kafka has good backwards compatibility. So I think Connect from Kafka 3.2.0 should work fine with Kafka 2.4.1. But if you want to stick with Kafka 2.4.1 for Connect, you should not use Strimzi 0.30, which supports only Kafka 3.1 and 3.2. You need to use some older Strimzi version which supports such an old Kafka version.

thiagorizzo commented 2 years ago

@scholzj great! gonna try it, thank you very much :D

thiagorizzo commented 2 years ago

@scholzj it worked, I just updated the images. Thanks

scholzj commented 2 years ago

Great. So I guess we can close this?

thiagorizzo commented 2 years ago

Sure, thanks again @scholzj