redpanda-data / observability

Apache License 2.0

Use `deriv` instead of `rate` for `cpu_busy_seconds_total` #41

Closed: daisukebe closed this 1 month ago

daisukebe commented 1 month ago

Using `rate` produces an unexpected result; `deriv` works well.

ref. https://redpandadata.slack.com/archives/C03H26FHJQL/p1724904615666739

hcoyote commented 1 month ago

Using `deriv` is probably required here because `cpu_busy_seconds` is incorrectly defined as a gauge in RP core, and our last attempt to remedy that had an unexpected impact on our Datadog integration.

Functionally, from what we've seen, the only time this metric goes down is during broker restarts.
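
For readers unfamiliar with the two functions, here is a rough Python sketch of how a counter-style rate and a gauge-style derivative differ on a series that drops at a restart. It is a simplification and not Prometheus' implementation: `simple_rate` omits the window-boundary extrapolation that `rate` performs, and the function names and sample values are hypothetical, chosen only for illustration.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, metric value)


def simple_rate(samples: Sequence[Sample]) -> float:
    """Counter-style per-second rate: any decrease is treated as a reset to zero.

    Simplified on purpose: no window-boundary extrapolation.
    """
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset, count the new value as growth from zero.
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


def simple_deriv(samples: Sequence[Sample]) -> float:
    """Gauge-style per-second slope from a least-squares linear fit."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    return cov / var


# Hypothetical data: a core that is 50% busy, scraped every 15 seconds.
steady = [(0, 100.0), (15, 107.5), (30, 115.0)]
# Same series, followed by a broker restart that drops the value.
restart = steady + [(45, 3.0), (60, 10.5)]

print(simple_rate(steady), simple_deriv(steady))
print(simple_rate(restart), simple_deriv(restart))
```

On the steady series both functions return 0.5. Across the simulated restart, `simple_rate` stays positive (about 0.425) because the drop is handled as a reset, while `simple_deriv` turns negative (about -1.89) because the fitted line slopes downward through the drop.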

Let's hold off approving this for a bit until I look at some things. My first pass through the original chunk says the units aren't right on this chart. Pretty sure I ran into this with another user.

daisukebe commented 1 month ago

I see. Thank you @hcoyote