redpanda-data / observability

Apache License 2.0

producer/consumer metrics using redpanda_kafka_request_bytes_total are wrong #40

Open hcoyote opened 1 month ago

hcoyote commented 1 month ago

This metric miscalculates usage while a learner event is in progress (decommission, node add, etc.). We should instead use the metrics below to determine on-the-wire traffic for the cluster, for both produce-side and consume-side throughput.

The dashboard queries should be updated to:

Producer traffic:

sum(rate(redpanda_rpc_received_bytes{redpanda_server="kafka", redpanda_id="$redpanda_id"}[5m])) by (cluster)

Consumer traffic:

sum(rate(redpanda_rpc_sent_bytes{redpanda_server="kafka", redpanda_id="$redpanda_id"}[5m])) by (cluster)

Adjust the labels accordingly to fit the observability repo dashboards.

bpraseed commented 1 month ago

@hcoyote - does this fix the sharechat issue, where they see replication traffic on the consumer side?

pmw-rp commented 2 weeks ago

Neither redpanda_rpc_received_bytes nor redpanda_rpc_sent_bytes includes topic-level detail. In contrast, redpanda_kafka_request_bytes_total does provide topic-level detail (using the redpanda_topic label).

Whether or not that matters is down to the use case. In this example, I wouldn't say we can use one in place of the other.
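
To make the difference concrete, here is a rough sketch of a per-topic produce query that only the request-bytes metric can express. It assumes that metric also carries a redpanda_request label with the values "produce" and "consume", which is not stated in this thread. Check the label names against your own /public_metrics output before relying on it.

```promql
# Per-topic produce throughput (bytes/sec), grouped by topic.
# Assumption: redpanda_request="produce" selects produce-side bytes.
sum(rate(redpanda_kafka_request_bytes_total{redpanda_request="produce"}[5m])) by (redpanda_topic)
```

No equivalent grouping is possible with redpanda_rpc_received_bytes or redpanda_rpc_sent_bytes, since neither exposes a redpanda_topic label. Dashboards that need a per-topic view would have to keep the request-bytes metric alongside the RPC-level ones.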