kcp-dev / contrib-tmc

An experimental add-on re-adding some Kubernetes compute APIs and implementing transparent multi-cluster scheduling
Apache License 2.0

feature: Adding metrics to syncer #141

Open vishnuchalla opened 1 year ago

vishnuchalla commented 1 year ago

Feature Description

As a follow-up to https://github.com/kcp-dev/kcp/issues/2781, I am interested in picking up the task of adding metrics to the syncer.

Proposed Solution

Following a discussion with @davidfestal, the plan is to explore options similar to controller-runtime for integrating with the syncer to publish metrics. Below are some of the resources to explore.

Alternative Solutions

No response

Want to contribute?

Additional Context

No response

ncdc commented 1 year ago

Do you have specific metrics you are thinking of adding?

vishnuchalla commented 1 year ago

I am not sure at the moment, but I am looking to add metrics similar to the ones exposed by the /metrics endpoint, focusing primarily on latencies for the syncer.

mjudeikis commented 1 year ago

Triage: Propose 1-2 metrics for this before writing any code :)

MikeSpreitzer commented 1 year ago

I have three metrics about the syncer that I would like to see.

  1. I would like to have a metric that we can look at to observe throughput. I assume that in each direction the syncer transports desired/reported state in chunks that each contain all it has to read/write about a single object. If so then I think it would be good to have a histogram vector of observations of chunk size. The vector would be indexed by APIGroup, APIVersion, and Resource or Kind (or, better yet, both!). The rate of change of the counts gives us message throughput, the rate of change of sum gives us data throughput, the sum gives us cumulative data volume transferred, and the buckets give us distribution of sizes in each vector element.
  2. I would also like to be able to observe trouble that the syncer has with requests to the apiserver to create/update/delete objects. A request can fail in various ways, including both (a) getting an HTTP response code that reports a problem and (b) suffering a disconnection that leaves the client unsure what happened at the server.
  3. I would also like to be able to observe latency. Recall the managedFields (usually hidden by kubectl get -o yaml), which report on the latest write to each section of the object. How long does it take from the write at the origin server to the successful corresponding write to the other copy by the syncer?
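
To make the first item concrete, here is a minimal, dependency-free sketch of the bookkeeping such a histogram vector would do. It is not syncer code: the `labels`, `histogramVec`, `Observe` and `Snapshot` names and the bucket bounds are all hypothetical, and real code would presumably use a `HistogramVec` from the Prometheus Go client rather than this stand-in.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// labels identifies one element of the vector (hypothetical label set).
type labels struct {
	Group, Version, Resource string
}

// histogram holds what a Prometheus-style histogram exposes: an observation
// count, a running sum, and cumulative bucket counts.
type histogram struct {
	count   uint64
	sum     float64
	buckets []uint64 // buckets[i] counts observations <= bounds[i]
}

// histogramVec is a tiny stand-in for a labelled histogram vector.
type histogramVec struct {
	mu     sync.Mutex
	bounds []float64 // sorted bucket upper bounds, in bytes
	elems  map[labels]*histogram
}

func newHistogramVec(bounds []float64) *histogramVec {
	b := append([]float64(nil), bounds...)
	sort.Float64s(b)
	return &histogramVec{bounds: b, elems: map[labels]*histogram{}}
}

// Observe records the size in bytes of one transported chunk.
func (v *histogramVec) Observe(l labels, sizeBytes float64) {
	v.mu.Lock()
	defer v.mu.Unlock()
	h, ok := v.elems[l]
	if !ok {
		h = &histogram{buckets: make([]uint64, len(v.bounds))}
		v.elems[l] = h
	}
	h.count++          // rate of change gives message throughput
	h.sum += sizeBytes // rate of change gives data throughput
	for i, ub := range v.bounds {
		if sizeBytes <= ub {
			h.buckets[i]++ // cumulative buckets give the size distribution
		}
	}
}

// Snapshot returns a copy of the current state for one label set.
func (v *histogramVec) Snapshot(l labels) (uint64, float64, []uint64) {
	v.mu.Lock()
	defer v.mu.Unlock()
	h, ok := v.elems[l]
	if !ok {
		return 0, 0, nil
	}
	return h.count, h.sum, append([]uint64(nil), h.buckets...)
}

func main() {
	vec := newHistogramVec([]float64{1024, 4096, 16384})
	deployments := labels{Group: "apps", Version: "v1", Resource: "deployments"}
	for _, size := range []float64{512, 2048, 20000} {
		vec.Observe(deployments, size)
	}
	count, sum, buckets := vec.Snapshot(deployments)
	fmt.Println(count, sum, buckets)
}
```

The same shape would also fit the latency metric in item 3, with seconds in place of bytes; failures from item 2 would more naturally be a counter vector labelled by verb and outcome.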

MikeSpreitzer commented 1 year ago

For latencies, remember that https://github.com/kubernetes/kubernetes/pull/110058 is not yet in kcp's fork of Kubernetes.

vishnuchalla commented 1 year ago

@davidfestal, @s-urbaniak, @MikeSpreitzer and @csams - I am fairly new to the syncer and the metrics-related code. I have been going through the code base but couldn't make much sense of how metrics are published. I looked at some earlier code and at the Prometheus metrics package that is used in parts of the current kcp code, but I am still not sure where to start or how to get things going.

Could you suggest some resources or a list of action items to start with (just to get a good grasp of the codebase), so that I can better understand how to go about adding metrics to the syncer and how to test/verify that they are actually being published?

Thanks in advance, Vishnu Challa

s-urbaniak commented 1 year ago

@vishnuchalla I don't know the internal details of the syncer, but a good start wrt Kubernetes is https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/instrumentation.md, and more generally the Prometheus best practices at https://prometheus.io/docs/practices/instrumentation/.

Generally, I would:

  1. Ensure the syncer has a /metrics endpoint available. Your referenced front-proxy PR is a good starting point, although I would not encourage using the legacy registry; instead, as outlined in https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1206-metrics-overhaul, use dependency injection of the Prometheus registry.
  2. Add new metrics to the syncer
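
As a rough illustration of the two steps above, the sketch below passes a registry into a component through its constructor instead of reaching for a global one, and serves it on /metrics. It uses only the Go standard library so that it runs as-is; `registry`, `syncer`, `newSyncer`, `syncOnce`, `scrape` and the metric name `syncer_sync_total` are hypothetical stand-ins, not kcp or Prometheus APIs. With the Prometheus Go client, the registry would presumably come from `prometheus.NewRegistry()` and the handler from `promhttp.HandlerFor`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"sync"
)

// registry is a hypothetical stand-in for a metrics registry.
type registry struct {
	mu       sync.Mutex
	counters map[string]float64
}

func newRegistry() *registry {
	return &registry{counters: map[string]float64{}}
}

// Inc increments the named counter by one.
func (r *registry) Inc(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.counters[name]++
}

// ServeHTTP writes every counter as a "name value" line, sorted by name.
func (r *registry) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	r.mu.Lock()
	defer r.mu.Unlock()
	names := make([]string, 0, len(r.counters))
	for n := range r.counters {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Fprintf(w, "%s %g\n", n, r.counters[n])
	}
}

// syncer is a hypothetical component. It receives its registry from the
// caller (dependency injection) rather than using a process-wide global.
type syncer struct {
	reg *registry
}

func newSyncer(reg *registry) *syncer {
	return &syncer{reg: reg}
}

// syncOnce stands in for one unit of work and records it in the registry.
func (s *syncer) syncOnce() {
	s.reg.Inc("syncer_sync_total")
}

// scrape mounts the registry on /metrics and returns the response body,
// using an in-memory recorder so no network listener is needed.
func scrape(reg *registry) string {
	mux := http.NewServeMux()
	mux.Handle("/metrics", reg)
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/metrics", nil)
	mux.ServeHTTP(rec, req)
	return rec.Body.String()
}

func main() {
	reg := newRegistry()
	s := newSyncer(reg)
	s.syncOnce()
	s.syncOnce()
	fmt.Print(scrape(reg))
}
```

Because the registry is injected, a test can create a fresh one per case and check exactly what a component registered and recorded, which is one way to verify that metrics are actually being published.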

vishnuchalla commented 1 year ago

Thanks for the suggestions. Will take a look.

embik commented 1 year ago

/transfer-issue contrib-tmc