rancher / fleet

Deploy workloads from Git to large fleets of Kubernetes clusters
https://fleet.rancher.io/
Apache License 2.0

Drift correction not working #2436

Closed lindhe closed 1 week ago

lindhe commented 5 months ago

Is there an existing issue for this?

Current Behavior

If an object is changed, Fleet detects the diff but does nothing to converge to a healthy state.

Expected Behavior

When spec.correctDrift.enabled=true, I expect Fleet to try to re-apply the desired state as soon as it detects a diff.
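The expected behavior can be illustrated with a minimal detect-and-revert sketch. This is not Fleet's implementation. Objects are modeled as plain nested maps, and the helpers `diff` and `merge` are hypothetical. `diff` returns only the desired fields whose live values are missing or different, in a nested shape similar to the `modified {...}` messages in the agent logs further down. `merge` then writes those fields back into the live object.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// diff returns the fields of desired whose values are missing or different
// in live, preserving the nested structure, or nil if nothing has drifted.
// Fields that exist only in live are ignored in this sketch.
func diff(desired, live map[string]any) map[string]any {
	out := map[string]any{}
	for k, dv := range desired {
		lv, ok := live[k]
		dm, dIsMap := dv.(map[string]any)
		lm, lIsMap := lv.(map[string]any)
		switch {
		case dIsMap && lIsMap:
			// Both sides are objects: recurse and keep only the drifted part.
			if sub := diff(dm, lm); sub != nil {
				out[k] = sub
			}
		case !ok || !reflect.DeepEqual(dv, lv):
			// Field was deleted or changed in the live object.
			out[k] = dv
		}
	}
	if len(out) == 0 {
		return nil
	}
	return out
}

// merge writes patch into live in place, recursing into nested maps.
// Sub-maps taken from patch are shared with it, not deep-copied.
func merge(live, patch map[string]any) {
	for k, pv := range patch {
		pm, pIsMap := pv.(map[string]any)
		lm, lIsMap := live[k].(map[string]any)
		if pIsMap && lIsMap {
			merge(lm, pm)
		} else {
			live[k] = pv
		}
	}
}

func main() {
	desired := map[string]any{
		"metadata": map[string]any{"labels": map[string]any{"app": "demo"}},
	}
	// The "app" label was deleted from the live object, e.g. via kubectl edit.
	live := map[string]any{
		"metadata": map[string]any{"labels": map[string]any{}},
	}

	if patch := diff(desired, live); patch != nil {
		b, err := json.Marshal(patch)
		if err != nil {
			panic(err)
		}
		fmt.Printf("modified %s\n", b) // drift detected
		merge(live, patch)             // drift corrected
	}
	fmt.Println("converged:", diff(desired, live) == nil)
}
```

Running it prints `modified {"metadata":{"labels":{"app":"demo"}}}` followed by `converged: true`. The behavior reported in this issue corresponds to the first step happening (the diff is detected) without the second ever taking effect.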

Steps To Reproduce

  1. Have Rancher v2.8.1 installed.
  2. In Rancher, click "Continuous Delivery" and "Git Repos" and select the "fleet-local" workspace.
  3. Add a GitRepo that applies some resource. Make sure to check "Enable Self-Healing" to set spec.correctDrift.enabled=true in the bundle.
  4. Wait for the GitRepo to sync and become healthy, with the new resource created and in state "Ready".
  5. Edit the resource using kubectl edit, e.g. delete a label.
  6. Observe the new "Modified" state for the resource:

    [Screenshot 2024-05-16 171752]

Environment

- Architecture: amd64
- Fleet Version: The one that's bundled with Rancher v2.8.1. 
- Cluster:
  - Provider: RKE2
  - Options: 3 nodes upstream cluster
  - Kubernetes Version: 1.27.9

Logs

No response

Anything else?

It looks like https://github.com/rancher/fleet/pull/1594 tried to implement drift correction, but it's clearly not working.

manno commented 4 months ago

Probably related to https://github.com/rancher/fleet/issues/2551

jhoblitt commented 2 months ago

I'm seeing the same behavior with Rancher 2.8.5 / Fleet 0.9.8.


jhoblitt commented 2 months ago

I've reproduced the problem with Rancher 2.9.1 / Fleet 0.10.1 as well.


The fleet-agent logs on the cluster show the same messages repeated over and over, e.g.:

{"level":"info","ts":"2024-09-04T22:15:40Z","logger":"bundledeployment.RemoveExternalChanges","msg":"Drift correction: rollback","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb"},"namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb","name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","reconcileID":"c6f34409-8c39-4bfa-bb86-ff79d3028f46"}
{"level":"info","ts":"2024-09-04T22:15:41Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb"},"namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb","name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","reconcileID":"ae91a7c9-9d4e-4ed4-988a-eeb61c5d015d","deploymentID":"s-f4e58f4e8d63737718ef1c935b3bcd8054daea4ce155c1be7f814b4738481:704080e856144689404f7488b30ba700c635fea680a58494dd909c76b226262d","appliedDeploymentID":"s-f4e58f4e8d63737718ef1c935b3bcd8054daea4ce155c1be7f814b4738481:704080e856144689404f7488b30ba700c635fea680a58494dd909c76b226262d","release":"rook-ceph/rook-ceph-conf:1033","appliedDeploymentID":"s-f4e58f4e8d63737718ef1c935b3bcd8054daea4ce155c1be7f814b4738481:704080e856144689404f7488b30ba700c635fea680a58494dd909c76b226262d"}
{"level":"info","ts":"2024-09-04T22:15:42Z","logger":"bundledeployment.UpdateStatus","msg":"Status not ready","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb"},"namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb","name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","reconcileID":"ae91a7c9-9d4e-4ed4-988a-eeb61c5d015d","error":"cephnfs.ceph.rook.io rook-ceph/auxtel modified {\"spec\":{\"server\":{\"resources\":{\"limits\":{\"cpu\":\"3\"}}}}}"}
{"level":"info","ts":"2024-09-04T22:15:42Z","logger":"bundledeployment.RemoveExternalChanges","msg":"Drift correction: rollback","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb"},"namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb","name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","reconcileID":"ae91a7c9-9d4e-4ed4-988a-eeb61c5d015d"}
{"level":"info","ts":"2024-09-04T22:15:43Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"ruka-fleet-s-dev-c-ruka-rook-ceph-cluster","namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb"},"namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb","name":"ruka-fleet-s-dev-c-ruka-rook-ceph-cluster","reconcileID":"c1678532-73f5-4929-9985-e50db5603133","deploymentID":"s-1c47fdd307de7cd53771b1bcf05e2d5bf014de495953153ac73876102439a:096621fc89c2b75c31148d0b150ad30027e7035383a85288a66c67d0980fb055","appliedDeploymentID":"s-1c47fdd307de7cd53771b1bcf05e2d5bf014de495953153ac73876102439a:096621fc89c2b75c31148d0b150ad30027e7035383a85288a66c67d0980fb055","release":"rook-ceph/rook-ceph-cluster:2","appliedDeploymentID":"s-1c47fdd307de7cd53771b1bcf05e2d5bf014de495953153ac73876102439a:096621fc89c2b75c31148d0b150ad30027e7035383a85288a66c67d0980fb055"}
{"level":"info","ts":"2024-09-04T22:15:43Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb"},"namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb","name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","reconcileID":"1407fd06-6c0a-4e3a-98a8-baf4bff370da","deploymentID":"s-f4e58f4e8d63737718ef1c935b3bcd8054daea4ce155c1be7f814b4738481:704080e856144689404f7488b30ba700c635fea680a58494dd909c76b226262d","appliedDeploymentID":"s-f4e58f4e8d63737718ef1c935b3bcd8054daea4ce155c1be7f814b4738481:704080e856144689404f7488b30ba700c635fea680a58494dd909c76b226262d","release":"rook-ceph/rook-ceph-conf:1034","appliedDeploymentID":"s-f4e58f4e8d63737718ef1c935b3bcd8054daea4ce155c1be7f814b4738481:704080e856144689404f7488b30ba700c635fea680a58494dd909c76b226262d"}
{"level":"info","ts":"2024-09-04T22:15:44Z","logger":"bundledeployment.UpdateStatus","msg":"Status not ready","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb"},"namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb","name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","reconcileID":"1407fd06-6c0a-4e3a-98a8-baf4bff370da","error":"cephnfs.ceph.rook.io rook-ceph/auxtel modified {\"spec\":{\"server\":{\"resources\":{\"limits\":{\"cpu\":\"3\"}}}}}"}
{"level":"info","ts":"2024-09-04T22:15:44Z","logger":"bundledeployment.RemoveExternalChanges","msg":"Drift correction: rollback","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb"},"namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb","name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","reconcileID":"1407fd06-6c0a-4e3a-98a8-baf4bff370da"}
{"level":"info","ts":"2024-09-04T22:15:45Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"ruka-fleet-s-dev-c-ruka-rook-ceph-cluster","namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb"},"namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb","name":"ruka-fleet-s-dev-c-ruka-rook-ceph-cluster","reconcileID":"494aa1f3-8e09-4790-ac4a-7acc6bc34b7f","deploymentID":"s-1c47fdd307de7cd53771b1bcf05e2d5bf014de495953153ac73876102439a:096621fc89c2b75c31148d0b150ad30027e7035383a85288a66c67d0980fb055","appliedDeploymentID":"s-1c47fdd307de7cd53771b1bcf05e2d5bf014de495953153ac73876102439a:096621fc89c2b75c31148d0b150ad30027e7035383a85288a66c67d0980fb055","release":"rook-ceph/rook-ceph-cluster:2","appliedDeploymentID":"s-1c47fdd307de7cd53771b1bcf05e2d5bf014de495953153ac73876102439a:096621fc89c2b75c31148d0b150ad30027e7035383a85288a66c67d0980fb055"}
{"level":"info","ts":"2024-09-04T22:15:45Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb"},"namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb","name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","reconcileID":"3f0b30ce-a868-48e0-95aa-1ce063f78d48","deploymentID":"s-f4e58f4e8d63737718ef1c935b3bcd8054daea4ce155c1be7f814b4738481:704080e856144689404f7488b30ba700c635fea680a58494dd909c76b226262d","appliedDeploymentID":"s-f4e58f4e8d63737718ef1c935b3bcd8054daea4ce155c1be7f814b4738481:704080e856144689404f7488b30ba700c635fea680a58494dd909c76b226262d","release":"rook-ceph/rook-ceph-conf:1035","appliedDeploymentID":"s-f4e58f4e8d63737718ef1c935b3bcd8054daea4ce155c1be7f814b4738481:704080e856144689404f7488b30ba700c635fea680a58494dd909c76b226262d"}
{"level":"info","ts":"2024-09-04T22:15:46Z","logger":"bundledeployment.UpdateStatus","msg":"Status not ready","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb"},"namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb","name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","reconcileID":"3f0b30ce-a868-48e0-95aa-1ce063f78d48","error":"cephnfs.ceph.rook.io rook-ceph/auxtel modified {\"spec\":{\"server\":{\"resources\":{\"limits\":{\"cpu\":\"3\"}}}}}"}
{"level":"info","ts":"2024-09-04T22:15:46Z","logger":"bundledeployment.RemoveExternalChanges","msg":"Drift correction: rollback","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb"},"namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb","name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","reconcileID":"3f0b30ce-a868-48e0-95aa-1ce063f78d48"}
{"level":"info","ts":"2024-09-04T22:15:47Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb"},"namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb","name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","reconcileID":"9cd5479d-4754-4c7b-909f-3acde128815b","deploymentID":"s-f4e58f4e8d63737718ef1c935b3bcd8054daea4ce155c1be7f814b4738481:704080e856144689404f7488b30ba700c635fea680a58494dd909c76b226262d","appliedDeploymentID":"s-f4e58f4e8d63737718ef1c935b3bcd8054daea4ce155c1be7f814b4738481:704080e856144689404f7488b30ba700c635fea680a58494dd909c76b226262d","release":"rook-ceph/rook-ceph-conf:1036","appliedDeploymentID":"s-f4e58f4e8d63737718ef1c935b3bcd8054daea4ce155c1be7f814b4738481:704080e856144689404f7488b30ba700c635fea680a58494dd909c76b226262d"}
{"level":"info","ts":"2024-09-04T22:15:48Z","logger":"bundledeployment.UpdateStatus","msg":"Status not ready","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb"},"namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb","name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","reconcileID":"9cd5479d-4754-4c7b-909f-3acde128815b","error":"cephnfs.ceph.rook.io rook-ceph/auxtel modified {\"spec\":{\"server\":{\"resources\":{\"limits\":{\"cpu\":\"3\"}}}}}"}
{"level":"info","ts":"2024-09-04T22:15:48Z","logger":"bundledeployment.RemoveExternalChanges","msg":"Drift correction: rollback","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb"},"namespace":"cluster-it-5628-ruka-plus-plus-ruka-67e02fc747cb","name":"ruka-fleet-s-dev-c-ruka-rook-ceph-conf","reconcileID":"9cd5479d-4754-4c7b-909f-3acde128815b"}
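
To check whether a BundleDeployment is stuck in this rollback, deploy, "Status not ready" cycle rather than converging, the JSON log lines can be tallied per BundleDeployment. The following is a small Go sketch, assuming the agent logs have been saved with one JSON object per line. It relies only on the `msg`, `name` and `error` fields visible in the excerpt above; the `entry` and `summary` types are hypothetical names for illustration.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"sort"
)

// entry holds the subset of fleet-agent log fields used here; other
// fields (reconcileID, deploymentID, ...) are ignored by encoding/json.
type entry struct {
	Msg   string `json:"msg"`
	Name  string `json:"name"`
	Error string `json:"error"`
}

// summary counts "Drift correction: rollback" messages and keeps the most
// recent "Status not ready" error, both keyed by BundleDeployment name.
type summary struct {
	rollbacks map[string]int
	lastErr   map[string]string
}

func newSummary() *summary {
	return &summary{rollbacks: map[string]int{}, lastErr: map[string]string{}}
}

// add processes one log line; lines that are not valid JSON are skipped.
func (s *summary) add(line []byte) {
	var e entry
	if err := json.Unmarshal(line, &e); err != nil {
		return
	}
	switch e.Msg {
	case "Drift correction: rollback":
		s.rollbacks[e.Name]++
	case "Status not ready":
		s.lastErr[e.Name] = e.Error
	}
}

func main() {
	s := newSummary()
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // agent log lines can be long
	for sc.Scan() {
		s.add(sc.Bytes())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Print in a stable order.
	names := make([]string, 0, len(s.rollbacks))
	for name := range s.rollbacks {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: %d rollbacks, last error: %q\n", name, s.rollbacks[name], s.lastErr[name])
	}
}
```

For example, `go run main.go < agent.log`, where `agent.log` is a hypothetical file holding the saved fleet-agent output. A BundleDeployment whose rollback count keeps growing while the same "modified" error repeats has not converged.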

weyfonk commented 4 weeks ago

I have tried, and failed, to reproduce this against current main by deleting a label on a ConfigMap. This needs further investigation. Could you share an example of a workload (GitRepo), or a known manifest or chart, that triggers this failure?

In any case, #2917 should reduce the noise compared to the logs shared above.

manno commented 1 week ago

Cleaning up the backlog.