hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Topology metrics view has graph window that's dynamic but metrics along bottom are always averaged over 15m #10662

Open lkysow opened 3 years ago

lkysow commented 3 years ago

Old UI or New UI: New UI (Consul 1.10.1)

Describe the problem you're having: [screenshot]

The topology UI has a window of ~5m~ (edit: after further investigation–see below–it's actually dynamic) for the graph but the metrics along the bottom are averaged over 15m. This can be somewhat confusing. In my screenshot above, I've had a solid 1 RPS for the last 5 minutes but the RPS metric on the bottom is 0.29 RPS since it's only been 1 RPS for the last 5m, not the full 15m.
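To make the mismatch concrete, here is a minimal sketch of the arithmetic, assuming the bottom stat is a plain average over its window with traffic-free seconds counted as zero. The helper `averageOverWindow` is hypothetical and is not taken from the Consul UI code.

```typescript
// Hypothetical helper (not Consul's implementation): average rate over a
// fixed window, treating seconds with no traffic as zero.
function averageOverWindow(
  activeRps: number,
  activeSeconds: number,
  windowSeconds: number
): number {
  if (windowSeconds <= 0) {
    throw new RangeError("windowSeconds must be positive");
  }
  // Clamp the active period so it never exceeds the window itself.
  const active = Math.min(Math.max(activeSeconds, 0), windowSeconds);
  return (activeRps * active) / windowSeconds;
}

const fiveMinutes = 5 * 60;
const fifteenMinutes = 15 * 60;

// The same 5 minutes of steady 1 RPS traffic, averaged over two windows.
const overGraphWindow = averageOverWindow(1, fiveMinutes, fiveMinutes);
const overStatsWindow = averageOverWindow(1, fiveMinutes, fifteenMinutes);

console.log(overGraphWindow.toFixed(2)); // average over the 5m graph window
console.log(overStatsWindow.toFixed(2)); // average over the 15m stats window
```

Under that assumption, exactly 5 minutes of 1 RPS averages to about a third of a request per second over 15 minutes, which is in the same range as the 0.29 RPS shown.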

Describe the solution you'd like: I think it would make sense for the time ranges to match. I'm not sure which is best between 5m or 15m; I think it matters more that it's consistent.

kaxcode commented 3 years ago

@lkysow The graph lines and the stats are from the last 15 mins. Can you share where you are getting 5 minutes from?

cc: @banks

lkysow commented 3 years ago

Hmm, maybe what's happening is that the window is dynamic based on when the metrics started to be available? Here you can see on a newly installed cluster that the left side starts at 11:21:06 and goes to 11:21:36.

[Screenshots: CleanShot 2021-07-26 at 11 21 42@2x, CleanShot 2021-07-26 at 11 21 35@2x]

lkysow commented 3 years ago

And then it probably maxes out at 15m once 15 minutes of information are available.

lkysow commented 3 years ago

I've confirmed that's what's happening! After waiting 15m, the window size maxes out at 15m.

banks commented 3 years ago

Ah, this could just be a display issue: we assume Prometheus will return a data point (even if it's zero) for every increment in the requested time window, and we plot whatever we get blindly. In this case, though, Prometheus is only returning non-zero samples, or is only returning the zero samples it actually recorded and none for timestamps in the range where no sample existed at all.

@kaxcode we can possibly still fix that in the UI component that draws the graph if we know what the window should be at that point. Otherwise we may need to document that providers are expected to return a data point for every increment in the requested interval, even if it's zero, and then fix the Prometheus one to always do that?
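As a rough illustration of the UI-side option, the sketch below pads a sparse series so that every step in the requested window has a data point, using zero where the provider returned nothing. The `Sample` shape, the `zeroFill` function, and the 60-second step are assumptions made for this example; they are not the actual Consul UI component or metrics provider interface.

```typescript
// Hypothetical sample shape: time is in Unix seconds.
interface Sample {
  time: number;
  value: number;
}

// Return one sample per step across [start, end], filling gaps with zero.
function zeroFill(
  samples: Sample[],
  start: number,
  end: number,
  step: number
): Sample[] {
  if (step <= 0) {
    throw new RangeError("step must be positive");
  }
  // Index the returned samples by the step slot they fall closest to.
  const bySlot: Record<number, number> = {};
  for (const s of samples) {
    const slot = start + Math.round((s.time - start) / step) * step;
    bySlot[slot] = s.value;
  }
  const filled: Sample[] = [];
  for (let t = start; t <= end; t += step) {
    const v = bySlot[t];
    filled.push({ time: t, value: v === undefined ? 0 : v });
  }
  return filled;
}

// Example: a 15m window at 60s steps where data only exists for the
// last 5 minutes, as on a newly installed cluster.
const sparse: Sample[] = [];
for (let t = 600; t <= 900; t += 60) {
  sparse.push({ time: t, value: 1 });
}
const padded = zeroFill(sparse, 0, 900, 60);
console.log(padded.length, padded[0], padded[padded.length - 1]);
```

With the series padded this way, the graph would always span the full requested window, so it would cover the same 15 minutes as the stats along the bottom. The same padding could instead live in the provider if the contract is documented that way.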