kubernetes-sigs / cluster-api-provider-nested

Cluster API Provider for Nested Clusters
Apache License 2.0

✨ Metrics for vn-agent #251

Closed srirammageswaran8 closed 2 years ago

srirammageswaran8 commented 2 years ago

User Story

As an operator I would like to export metrics for vn-agent.

Detailed Description

vn-agent does not export metrics at this point. A metrics server should be enabled by default on the address :9100, and the address should be configurable with the CLI option --metrics-addr. The metrics server can be disabled by setting the CLI option --enable-metrics to false. Metrics are enabled by default.
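To make the proposed flag behavior concrete, here is a minimal sketch using only the Go standard library. It is not the vn-agent implementation: the function names (`parseMetricsFlags`, `newMetricsHandler`), the `/metrics` path, and the `vn_agent_proxied_requests_total` counter are all hypothetical, and a real implementation would more likely use a Prometheus client library. Only the flag names and the `:9100` default come from the description above. The `main` function exercises the handler in-process instead of binding a port.

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
	"sync/atomic"
)

// metricsConfig holds the two options proposed in this issue.
type metricsConfig struct {
	Addr    string // value of --metrics-addr
	Enabled bool   // value of --enable-metrics
}

// parseMetricsFlags parses --metrics-addr and --enable-metrics from args,
// applying the proposed defaults (":9100", enabled).
func parseMetricsFlags(args []string) (metricsConfig, error) {
	var cfg metricsConfig
	fs := flag.NewFlagSet("vn-agent", flag.ContinueOnError)
	fs.StringVar(&cfg.Addr, "metrics-addr", ":9100", "address the metrics server binds to")
	fs.BoolVar(&cfg.Enabled, "enable-metrics", true, "serve metrics when true")
	err := fs.Parse(args)
	return cfg, err
}

// proxiedRequests is a hypothetical counter; the issue does not name any metric.
var proxiedRequests int64

// newMetricsHandler returns a handler serving the counter in Prometheus text
// format at /metrics (an assumed path), or nil when metrics are disabled.
func newMetricsHandler(cfg metricsConfig) http.Handler {
	if !cfg.Enabled {
		return nil
	}
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprintln(w, "# TYPE vn_agent_proxied_requests_total counter")
		fmt.Fprintf(w, "vn_agent_proxied_requests_total %d\n", atomic.LoadInt64(&proxiedRequests))
	})
	return mux
}

func main() {
	cfg, err := parseMetricsFlags(os.Args[1:])
	if err != nil {
		os.Exit(2) // the flag package has already printed the error
	}
	h := newMetricsHandler(cfg)
	if h == nil {
		fmt.Println("metrics disabled")
		return
	}
	atomic.AddInt64(&proxiedRequests, 1) // pretend one request was proxied

	// Exercise the handler in-process. A real agent would instead call
	// http.ListenAndServe(cfg.Addr, h) in its own goroutine.
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	fmt.Print(rec.Body.String())
}
```

One detail worth noting if the standard `flag` package is used: boolean flags must be disabled with `--enable-metrics=false`. The space-separated form `--enable-metrics false` leaves the flag set to true and treats `false` as a positional argument.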

vn-agent should expose the below metrics

/kind feature

christopherhein commented 2 years ago

/retitle ✨ Metrics for vn-agent

christopherhein commented 2 years ago

/assign @srirammageswaran8

christopherhein commented 2 years ago

Thanks @srirammageswaran8, this makes sense to me. It will be good to be able to understand the health of each vn-agent and how they are performing.

cc @Fei-Guo & @charleszheng44: any concerns or thoughts?

Fei-Guo commented 2 years ago

Since the vn-agent simply proxies the kubelet APIs, I don't see a strong need for having pprof enabled. It does make sense to add tenant-related metrics in vn-agent.

srirammageswaran8 commented 2 years ago

Updated the description. @Fei-Guo, I agree with you on pprof.

atosatto commented 2 years ago

> The metrics server can be disabled by setting the CLI option --enable-metrics to false. Metrics are enabled by default.

I don't think this is necessary. Perhaps users that don't want to collect metrics can just bind to localhost.

christopherhein commented 2 years ago

The issue with this is that in a traditional deployment the vn-agent runs as a DaemonSet using host networking, so binding to localhost would still make the endpoint accessible to clustered workloads.

atosatto commented 2 years ago

> The issue with this is that in a traditional deployment the vn-agent runs as a DaemonSet using host networking, so binding to localhost would still make the endpoint accessible to clustered workloads.

LGTM then, thanks for clarifying this!