Closed by bill3tt 3 years ago
Suggested labels: good first issue, component: mixins
As this requires more metrics than Thanos actually exposes (i.e. node / kube-api metrics), would it not make more sense to leave this to a separate dashboard?
I personally have dashboards for this, plus a global view (screenshots not preserved).
These dashboards are (correct me if I'm wrong) also included in the Prometheus operator. If we included these stats, we would create a dependency on those metrics and also (I think) on certain rules required for them.
Yes, these are included in the kubernetes-mixin, which is in turn included in prometheus-operator in a generalised fashion. This means that you have to choose the appropriate dashboard for these stats, then filter for the namespace in question to see these metrics for everything under that namespace.
Some tools have similar metrics on their operational overview dashboards to make troubleshooting performance issues easier; Loki comes to mind, if I am right.
IMO these CPU & memory limits are specific to Kubernetes environments. The dashboards from the Kubernetes mixin are sufficient for this use case, and I feel it is not a good idea to add them to the Thanos mixin.
I think the mixin is fine; the only thing is that we probably should not reuse the same dashboards we have now.
The reason is that there is a difference between monitoring dashboards and deep-dive debugging dashboards, which can have 100 graphs.
@wiardvanrij & @yeya24 you both raise an important point I hadn't considered. For the core Thanos project to remain environment neutral, we can't introduce environment-specific dependencies into the mixin; we can only use metrics that are exported from the core project itself.
I still think there is scope to include relevant CPU & memory information from the go_* family of metrics, but I'm not sure exactly what quite yet.
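As a rough sketch of what an environment-neutral set of resource queries might look like, the snippet below builds PromQL expressions from Go runtime metrics only. The helper name `go_runtime_queries`, the panel keys, and the `job` label value are assumptions for illustration; nothing here is taken from the Thanos mixin itself. The metric names are the ones commonly exported by the Go Prometheus client's runtime collector.

```python
from typing import Dict


def go_runtime_queries(job: str, window: str = "5m") -> Dict[str, str]:
    """Build PromQL expressions that rely only on go_* runtime metrics.

    Hypothetical helper: the function name, dictionary keys and the use of
    a `job` label as the selector are illustrative assumptions.
    """
    selector = f'{{job="{job}"}}'
    return {
        # Heap memory currently in use by the process.
        "heap_in_use_bytes": f"go_memstats_heap_inuse_bytes{selector}",
        # Allocation rate, a rough proxy for memory pressure.
        "alloc_rate_bytes_per_s": (
            f"rate(go_memstats_alloc_bytes_total{selector}[{window}])"
        ),
        # Number of live goroutines.
        "goroutines": f"go_goroutines{selector}",
        # Seconds spent in GC pauses per second of wall time.
        "gc_seconds_per_s": (
            f"rate(go_gc_duration_seconds_sum{selector}[{window}])"
        ),
    }


if __name__ == "__main__":
    for name, expr in go_runtime_queries("thanos-store").items():
        print(f"{name}: {expr}")
```

Note that the go_* family covers memory and goroutines but has no direct CPU usage metric; CPU would likely need something like `process_cpu_seconds_total`, which sits outside go_* and is not exported on every platform.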
@bwplotka I think you meant to post that comment in https://github.com/thanos-io/thanos/issues/4401 :) (which I hadn't yet created during the meeting yesterday).
Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.
Closing for now as promised, let us know if you need this to be reopened! 🤗
Is your proposal related to a problem?
While debugging our internal monitoring stack with @bwplotka using the Thanos mixin dashboards, we struggled to tell whether components were exceeding their allowed thresholds, because these are not present in the dashboards by default.
Describe the solution you'd like
Add 'standard' expected metrics to the component dashboards:
Describe alternatives you've considered
None.
Additional context
There will almost certainly be some really good examples of these out in the wild. Two starting points: