Is your feature request related to a problem? Please describe.
It's fairly straightforward to use existing dashboards and metrics to alert on resource usage once it exceeds some threshold. However, it's sometimes also useful to see changes in resource usage that wouldn't necessarily trigger an alert but would give earlier warning of unexpected increases or decreases, so that problems can be addressed before they trigger alerts.
Tracking CPU/Memory/Disk/Network usage and comparing it against situation-dependent time periods would improve cell observability and allow for a proactive rather than reactive approach to solving issues.
Describe the solution you'd like
I would like to add dashboard(s) to the Mimir dashboard mixin that provide time-based comparisons of resource usage. For example, I would like to see the CPU and memory usage for a given Mimir component compared to 1 week prior. The dashboard would allow the user to see these comparisons for CPU, memory, disk usage (capacity), and network. Predefined comparison periods would include daily, weekly, day of week, and monthly, but the dashboard should allow arbitrary time periods as well.
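To make the comparison idea concrete, here is a rough sketch of how the current and time-shifted queries for a panel could be generated using the PromQL `offset` modifier. This is only an illustration, not how the mixin would necessarily implement it: the helper name `cpu_usage_query`, the `COMPARISON_OFFSETS` mapping, the `5m` rate window, and the choice of the cAdvisor metric `container_cpu_usage_seconds_total` with `namespace`/`container` labels are all assumptions.

```python
from typing import Dict, Optional

# Hypothetical mapping from a predefined comparison period to a PromQL duration.
# PromQL has no month unit, so "monthly" is approximated as 30 days (assumption).
COMPARISON_OFFSETS: Dict[str, str] = {
    "daily": "1d",
    "weekly": "1w",
    "monthly": "30d",
}


def cpu_usage_query(namespace: str, container: str, offset: Optional[str] = None) -> str:
    """Build a PromQL expression for the CPU usage of one component.

    When `offset` is given, the same expression is evaluated that far in the
    past. The `offset` modifier has to follow the selector directly, so it is
    placed inside rate() rather than appended to the whole expression.
    """
    selector = (
        "container_cpu_usage_seconds_total"
        f'{{namespace="{namespace}",container="{container}"}}'
    )
    modifier = f" offset {offset}" if offset else ""
    return f"sum(rate({selector}[5m]{modifier}))"


if __name__ == "__main__":
    # Current usage next to the same series one week earlier.
    print(cpu_usage_query("mimir", "ingester"))
    print(cpu_usage_query("mimir", "ingester", COMPARISON_OFFSETS["weekly"]))
```

An arbitrary comparison period would simply pass any valid PromQL duration (for example `"3d"`) as `offset`, which in a dashboard could come from a template variable.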
Describe alternatives you've considered
Alternatives are to create custom dashboards, recording rules, alerts, etc. However, this is a common enough scenario that it would benefit the whole Mimir community, so adding such a dashboard to the mixin makes sense.
Additional context
Some typical use cases where this would be beneficial:
Component scaling
Store-gateway: As more series are ingested, the usage of store-gateway data disks for storing index headers may keep growing if the rate of samples ingested exceeds the rate at which blocks are deleted by retention policies. The same is true of memory usage in the store-gateways. In this scenario, store-gateways will need to scale either vertically or horizontally as limits are approached. A dashboard that tracks week-over-week or month-over-month increases can help with capacity planning well before components reach their limits.
Ingester: similar to the store-gateway, but with CPU usage as the critical resource.
Compactor: Increases in either CPU usage or disk usage that are beyond expectations but below limits could give an indication of when to plan scaling.
Anomaly detection: Looking at week-over-week, month-over-month, and similar changes allows the user to establish a "baseline" expectation for their cluster. If resource usage increases or decreases beyond this expectation, but not enough to trigger a pre-existing alert, the user can respond to the issue proactively (e.g. by scaling or troubleshooting).
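The baseline comparison in these use cases boils down to a relative change between the current value and the value one comparison period earlier. A minimal sketch of that arithmetic follows; the function names and the 20% default tolerance are purely illustrative assumptions, not proposed thresholds.

```python
def relative_change(current: float, baseline: float) -> float:
    """Return the fractional change of `current` relative to `baseline`.

    For example, 0.25 means usage is 25% higher than in the comparison period.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a relative change")
    return (current - baseline) / baseline


def outside_baseline(current: float, baseline: float, tolerance: float = 0.2) -> bool:
    """Report whether usage moved by more than `tolerance` in either direction.

    The 20% default is an arbitrary illustration; a real value would depend on
    the cluster and the resource being tracked.
    """
    return abs(relative_change(current, baseline)) > tolerance


if __name__ == "__main__":
    # Example: disk usage of 650 GiB now versus 500 GiB one week ago.
    print(relative_change(650.0, 500.0))
    print(outside_baseline(650.0, 500.0))
```

Checking the absolute value flags unexpected decreases as well as increases, which matches the "increase or decrease" wording above.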
I plan to spend some time working on this, so I'm creating this issue for visibility and tracking.