nmstate / kubernetes-nmstate

Declarative node network configuration driven through Kubernetes API.
GNU General Public License v2.0
172 stars 86 forks source link

handler: Expose nmstatectl stats as k8s metrics #1221

Closed qinqon closed 3 months ago

qinqon commented 7 months ago

Is this a BUG FIX or a FEATURE ?: /kind enhancement

What this PR does / why we need it: Now that nmstatectl is able to calculate some useful stats from network configuration [1], we can bubble them up and expose them as k8s metrics so k-nmstate users can digg on them using prometheus, graphana or the like.

This change add a new "Features" under nnce Status with the output of nmstatectl st and also create a new deployment nmstate-metrics that will gather the NNCEs features and reflecta that at a cluster wide gaugue prometheus metric.

This is an example of nmstate feature stat

kubernetes_nmstate_features_applied{name="dhcpv4-custom-hostname"} 1

Depends on nmstate 2.2.20, looks like it's build but still not present at centos 9 stream

[1] https://github.com/nmstate/nmstate/pull/2420

TODO:

Release note:

Expoxe statistics generated from `nmstatectl stats`
sradco commented 7 months ago

@machadovilaca @avlitman please see if you can help in reviewing this PR.

qinqon commented 7 months ago

@avlitman @machadovilaca @sradco can you take another look ?

kubevirt-bot commented 7 months ago

@machadovilaca: changing LGTM is restricted to collaborators

In response to [this](https://github.com/nmstate/kubernetes-nmstate/pull/1221#pullrequestreview-1793008131): > Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
sradco commented 7 months ago

@qinqon I propose you to also add docs generator for the operator. I think this is really useful.

We have the automation, so that when the user adds a PR with a new metric, the test runs and checks if the metric is already documented. If not the user is asked to run make generate and this automatically updated the PR with the change to the metrics.md file with the new metric, description and type.

See an example to the metrics.md file here https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/docs/metrics.md and the docs generator is here https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/tools/metricsdocs/metricsdocs.go (Note: we plan to move it to /monitoring/tools/ )

qinqon commented 6 months ago

@qinqon I propose you to also add docs generator for the operator. I think this is really useful.

We have the automation, so that when the user adds a PR with a new metric, the test runs and checks if the metric is already documented. If not the user is asked to run make generate and this automatically updated the PR with the change to the metrics.md file with the new metric, description and type.

See an example to the metrics.md file here https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/docs/metrics.md and the docs generator is here https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/tools/metricsdocs/metricsdocs.go (Note: we plan to move it to /monitoring/tools/ )

@sradco introducing it, add a lot of golang dependencies to the project, I am not sure about it, maybe we can do this at follow up.

qinqon commented 4 months ago

@sradco @machadovilaca I have convert the Counter to Guague and decrease if topology/feature is no longer in use, can you take another look to see if everything is ok from a monitoring perspective ?

qinqon commented 3 months ago

We are going to steak with features for now since it has a limited bounds, we will investigate options for topology

sradco commented 3 months ago

@qinqon Can you please add an example to the PR description of the end metric with labels?

qinqon commented 3 months ago

/retest

qinqon commented 3 months ago

/retest

Trying to pull registry.access.redhat.com/ubi9/ubi-minimal:latest...
Error: creating build container: copying system image from manifest list: determining manifest MIME type for docker://registry.access.redhat.com/ubi9/ubi-minimal:latest: reading manifest sha256:119ac25920c8bb50c8b5fd75dcbca369bf7d1f702b82f3d39663307890f0bf26 in registry.access.redhat.com/ubi9/ubi-minimal: received unexpected HTTP status: 502 Bad Gateway
make[1]: *** [Makefile:168: push-operator] Error 125
make[1]: Lea
qinqon commented 3 months ago

@sradco can you take another look ? I think I have cover all the comments.

qinqon commented 3 months ago

/hold

nmstate from "base" branch is failing a metrics tests at the "future" lane.

qinqon commented 3 months ago

/retest

locally with NMSTATE_PIN=future everything is fine.

qinqon commented 3 months ago

/retest

qinqon commented 3 months ago

/hold cancel

Now is ready

qinqon commented 3 months ago

/retest

mkowalski commented 3 months ago

/lgtm

qinqon commented 3 months ago

/approve

qinqon commented 3 months ago

/retest

kubevirt-bot commented 3 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qinqon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/nmstate/kubernetes-nmstate/blob/main/OWNERS)~~ [qinqon] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
qinqon commented 3 months ago

/retest