strimzi / strimzi-kafka-bridge

An HTTP bridge for Apache Kafka®
Apache License 2.0

Latency alerts #916

Open mkl262 opened 3 months ago

mkl262 commented 3 months ago

Hello,

I use the Bridge only as a producer in my system and connect it to AWS MSK as the backend. I occasionally get alerts for high consumer fetch latency (based on the strimzi_bridge_kafka_producer_request_latency_avg metric). What does this alert mean, and how can I resolve it?

bridge config:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaBridge
metadata:
  name: bridge
  namespace: kafka
spec:
  replicas: 2
  bootstrapServers: "b-1.<clusterID>.kafka.<region>.amazonaws.com:9092,b-2.<clusterID>.kafka.<region>.amazonaws.com:9092,b-3.<clusterID>.kafka.<region>.amazonaws.com:9092"
  http:
    port: 8080
  enableMetrics: true
  template:
    pod:
      metadata:
        annotations:
          prometheus.io/scrape: 'true'
          prometheus.io/port: '8080'

Grafana screenshot:

[Image: Screenshot 2024-07-06 at 0 38 07]

ppatierno commented 2 months ago

As the metric name says, it's the latency of the communication between the producer and the broker. I guess it may be something related to your network.

mkl262 commented 2 months ago

As I wrote, I run the Strimzi Bridge in an EKS cluster and connect it to an MSK cluster, so it's hard for me to believe that there is a network latency issue, as I would have noticed it in my other services. Are there any tests that can help me narrow down the cause of this?

scholzj commented 2 months ago

> As I wrote, I run the Strimzi Bridge in an EKS cluster and connect it to an MSK cluster, so it's hard for me to believe that there is a network latency issue, as I would have noticed it in my other services. Are there any tests that can help me narrow down the cause of this?

To be honest, I'm not sure I follow the point of this issue. At the beginning, you suggested that you have some latency issues. Now you seem to insist that you do not have any latency issues because you run on AWS. Leaving aside the logic that using AWS means no latency issues, which I think is highly questionable: if you do not have any latency issues, then everything is great, isn't it? Just adjust the alert threshold to what you would consider "too high" latency, or remove the alert completely.
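
For illustration only, adjusting the alert might look roughly like the sketch below, assuming the alert is defined as a Prometheus alerting rule. The metric name is the one from this issue; the group name, alert name, 500 threshold, `for` duration, and severity label are all placeholder assumptions, not recommendations. The underlying Kafka producer metric (request-latency-avg) is reported in milliseconds, so the threshold here would mean 500 ms.

```yaml
groups:
  - name: bridge-latency            # hypothetical group name
    rules:
      - alert: BridgeProducerRequestLatencyHigh   # hypothetical alert name
        # Fire only if the highest average producer request latency (ms)
        # across all scraped series stays above the assumed threshold.
        expr: max(strimzi_bridge_kafka_producer_request_latency_avg) > 500
        # Require the condition to hold for a while to ignore short spikes.
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Bridge producer request latency above 500 ms for 10 minutes"
```

Raising the threshold or lengthening the `for` window are the two knobs that decide how often an occasional spike turns into an alert.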

ppatierno commented 5 days ago

@mkl262 is there anything else you want to add to this issue or can we close it?