redpanda-data / observability

Kafka handler latency metrics #26

Closed by jrkinley 1 year ago

jrkinley commented 1 year ago

Updated the producer and consumer latency graphs to use the recommended Kafka handler latency metrics, which are available in v23.1.19+ and v23.2.12+:

redpanda_kafka_handler_latency_seconds_bucket{handler="produce"}
redpanda_kafka_handler_latency_seconds_bucket{handler="fetch"}
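
For reference, a percentile panel over these series could be queried roughly as follows. This is a sketch, not necessarily the exact query used in the dashboard: the 0.99 quantile, the 5m rate window, and the aggregation across all brokers with sum by (le) are illustrative assumptions.

```promql
# p99 produce latency across all brokers (quantile and window are illustrative)
histogram_quantile(
  0.99,
  sum by (le) (
    rate(redpanda_kafka_handler_latency_seconds_bucket{handler="produce"}[5m])
  )
)

# p99 fetch latency, same shape with the fetch handler selected
histogram_quantile(
  0.99,
  sum by (le) (
    rate(redpanda_kafka_handler_latency_seconds_bucket{handler="fetch"}[5m])
  )
)
```

Summing by le before taking the quantile merges the per-broker histograms into one cluster-wide distribution; adding another label to the by clause would give one line per value of that label instead.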

The internal RPC latency graph has also been updated to include internal metrics only. It omits the Kafka latency metrics, since those are captured by the handler metrics above:

redpanda_rpc_request_latency_seconds_bucket{redpanda_server="internal"}
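
An internal-only RPC latency panel could use the same pattern with this selector. Again a sketch under assumptions: the quantile and rate window are illustrative, not taken from the dashboard.

```promql
# p99 internal RPC latency; the redpanda_server="internal" matcher excludes Kafka API traffic
histogram_quantile(
  0.99,
  sum by (le) (
    rate(redpanda_rpc_request_latency_seconds_bucket{redpanda_server="internal"}[5m])
  )
)
```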

jrkinley commented 1 year ago

redpanda_kafka_handler_latency_seconds

This metric measures the latency from receiving a request to sending the response. The handler label differentiates between the handlers:

handler="produce" (for producer latency)
handler="fetch" (for consumer latency)

For consumer latency, keep in mind how fetch requests work. The client specifies a maximum wait time on the request (fetch.wait.max.ms in librdkafka). If no data is available, the request waits on the broker until data arrives or that time expires, which can lead to high latencies in this metric. A common value is 500 ms, which is the default in librdkafka. This effect is usually seen in scenarios with low per-partition throughput and many partitions or clients.

Colloquially referred to as “per-handler metrics”.

Available as of 23.2+, 23.1.18+.