kubernetes-sigs / controller-runtime

Repo for the controller-runtime subproject of kubebuilder (sig-apimachinery)
Apache License 2.0
2.43k stars · 1.12k forks

[Question] Integration with tracing #1876

Closed STRRL closed 1 week ago

STRRL commented 2 years ago

Hi! I found that the "API Server Tracing" feature has been available in alpha since Kubernetes v1.22, and this blog mentioned that a simple patch could enable tracing in controller-runtime as well.

I think tracing integration would be a powerful way to enhance the observability of controller-runtime and of the many operators built on it.

Is this feature on the roadmap? I am very interested in building it.

STRRL commented 2 years ago

related PR: https://github.com/kubernetes-sigs/controller-runtime/pull/1211

FillZpp commented 2 years ago

Thanks @STRRL. I'm not sure: the patch example in the blog adds an otelhttp handler on top of the existing webhook server. Is that all we have to do?

STRRL commented 2 years ago

Is that all we have to do?

No.

IMO, besides the webhook server, there are several other components that need tracing integration as well.

The first three could likely be handled by otelhttp together with proper context propagation. The fourth would need upstream changes in logr, but we could still ship a customized logr.LogSink implementation as a preview.

FillZpp commented 2 years ago

I do understand that tracing the webhook server may help users find out the time cost of a request to the apiserver. But I don't understand what we should trace in the controller or reconciler, since they all work asynchronously. Are you going to trace each object from list/watch to reconcile?

STRRL commented 2 years ago

For almost all controllers/operators based on controller-runtime, the Reconciler is the most important part: it contains their core business logic. I see no reason to leave it out of tracing.

But I don't understand what should we trace for controller or reconciler, for they all work asynchronously. Are you going to trace each object from list/watch to reconcile?

I haven't thought through how a tracing context/span would propagate through the apiserver and etcd; it might work, or it might not. I'm also not sure whether "finding out which previous reconciliation a given reconciliation relates to" is practical even in theory: because the current status is the aggregation of all previous updates, the propagation of different tracing contexts/spans is bound to overlap. That should be clarified when we actually design the tracing integration.

On the other hand, tracing the operations within just a single reconciliation is also very useful.

Are you going to trace each object from list/watch to reconcile?

Based on the discussion above: do I want to trace every single reconciliation? I'm not certain, but I lean toward yes.

STRRL commented 2 years ago

I've been struggling to profile the performance of the Chaos Mesh controller-manager in recent days, which has made me focus much more on tracing for Kubernetes operators.

I will start working on this issue in the next few weeks.

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 year ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/controller-runtime/issues/1876#issuecomment-1357004490):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
mjnovice commented 5 months ago

Can this be re-opened?

sbueringer commented 5 months ago

/reopen
/remove-lifecycle rotten

k8s-ci-robot commented 5 months ago

@sbueringer: Reopened this issue.

In response to [this](https://github.com/kubernetes-sigs/controller-runtime/issues/1876#issuecomment-2043413810):

> /reopen
> /remove-lifecycle rotten

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 week ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 week ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/controller-runtime/issues/1876#issuecomment-2332534456):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.