kyma-project / telemetry-manager

Manager for the Kyma telemetry module
https://kyma-project.io/#/telemetry-manager/user/README
Apache License 2.0

Reconciliation takes too long to execute #1097

Open skhalash opened 5 months ago

skhalash commented 5 months ago

Description

Fixing the managed Kyma dashboards exposed an issue with the CR reconciliation duration across all three pipeline types and the Telemetry CR. The median reconciliation duration for the pipelines is approximately 1 second, with the 99th percentile reaching around 4 seconds for long-running pipelines that were deployed months ago. Ideally, after an initial deployment each reconciliation should be a no-op since there have been no changes. The Telemetry CR fares slightly better, but its reconciliation duration is still within the same order of magnitude.

What can cause the problem?

  1. The client cache configuration contains a list of concrete GVKs to be cached. However, this list has not been maintained for a while, so it does not contain all GVKs deployed by the different operator controllers (e.g. Fluent Bit, OTel Collector, and Self-Monitor resources). We could instead use the DefaultNamespaces cache option and automatically cache everything in the kyma-system namespace.
  2. There is a hypothesis that CreateOrUpdate utils have never actually worked and always perform an API call instead of checking a diff and returning early.
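
To make the second hypothesis concrete, here is a minimal sketch of what "checking a diff and returning early" could look like. It uses only the Go standard library; `ConfigMap`, `fakeAPI`, and `createOrUpdate` are hypothetical stand-ins invented for illustration, not the actual telemetry-manager utils or the controller-runtime API.

```go
package main

import (
	"fmt"
	"reflect"
)

// ConfigMap is a hypothetical stand-in for a Kubernetes resource.
type ConfigMap struct {
	Name string
	Data map[string]string
}

// fakeAPI is a hypothetical in-memory API server that counts write calls.
type fakeAPI struct {
	objects map[string]ConfigMap
	writes  int
}

func newFakeAPI() *fakeAPI {
	return &fakeAPI{objects: map[string]ConfigMap{}}
}

// createOrUpdate writes the desired object only if it is missing or differs
// from the stored one. It reports whether a write call was made.
func createOrUpdate(api *fakeAPI, desired ConfigMap) bool {
	existing, found := api.objects[desired.Name]
	if found && reflect.DeepEqual(existing, desired) {
		return false // no diff: return early without a write
	}
	api.objects[desired.Name] = desired
	api.writes++
	return true
}

func main() {
	api := newFakeAPI()
	desired := ConfigMap{Name: "fluent-bit", Data: map[string]string{"k": "v"}}

	// Reconcile the same desired state three times.
	for i := 0; i < 3; i++ {
		wrote := createOrUpdate(api, desired)
		fmt.Printf("reconcile %d: wrote=%t\n", i+1, wrote)
	}
	fmt.Println("total writes:", api.writes)
}
```

In this sketch only the first reconciliation writes; the next two return early. If the real utils issue a write on every reconciliation, a counter like `writes` (or the API server request metrics) would show one call per reconcile instead.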

Expected result

A no-op reconciliation should not take that long

Actual result

A no-op reconciliation takes seconds

Steps to reproduce

Troubleshooting

Release Notes

skhalash commented 5 months ago

Other people have run into the same problem with comparing resources: https://github.com/kubernetes-sigs/kubebuilder/issues/592
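
One common reason such comparisons fail (offered here as an assumption, not as a summary of the linked issue) is that the API server fills in default values, so the live object never equals the desired one field by field. A possible workaround, sketched below with the Go standard library only, is to compare a hash of the desired spec against a hash stored in an annotation. `Spec`, `Object`, `hashKey`, `specHash`, and `needsUpdate` are all hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Spec is a hypothetical resource spec. Policy stands for a field that the
// server fills in with a default when the client leaves it empty.
type Spec struct {
	Image  string `json:"image"`
	Policy string `json:"policy,omitempty"`
}

// Object is a hypothetical stored resource.
type Object struct {
	Annotations map[string]string
	Spec        Spec
}

// hashKey is a hypothetical annotation key holding the desired-spec hash.
const hashKey = "example.invalid/desired-hash"

// specHash returns a hex-encoded SHA-256 of the JSON-encoded spec.
func specHash(s Spec) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err) // not expected for this plain struct
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// needsUpdate compares the hash of the desired spec with the hash recorded
// on the live object, ignoring any fields the server defaulted afterwards.
func needsUpdate(live Object, desired Spec) bool {
	return live.Annotations[hashKey] != specHash(desired)
}

func main() {
	desired := Spec{Image: "collector:1.0"}

	// The live object as a server might return it: Policy was defaulted.
	live := Object{
		Annotations: map[string]string{hashKey: specHash(desired)},
		Spec:        Spec{Image: "collector:1.0", Policy: "Always"},
	}

	fmt.Println("naive equality:", live.Spec == desired)
	fmt.Println("needs update:", needsUpdate(live, desired))
	fmt.Println("needs update after change:",
		needsUpdate(live, Spec{Image: "collector:1.1"}))
}
```

The trade-off in this sketch is that manual edits to the live object go unnoticed as long as the annotation is untouched, so it only suits resources the operator fully owns.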

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale due to the lack of recent activity. It will soon be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 2 months ago

This issue has been automatically closed due to the lack of recent activity. /lifecycle rotten

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale due to the lack of recent activity. It will soon be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 6 days ago

This issue has been automatically closed due to the lack of recent activity. /lifecycle rotten