rancher / fleet

Deploy workloads from Git to large fleets of Kubernetes clusters
https://fleet.rancher.io/
Apache License 2.0

[v0.10] Backport of Updating Ports with correctDrift enabled using multiple-paths repo triggers an error #2834

Open weyfonk opened 1 month ago

weyfonk commented 1 month ago

This is a backport of #2609 to v0.10.

weyfonk commented 1 month ago

(copied from #2609)

Additional QA

Problem

When failing to correct drift on a resource (e.g. a modified ports array on a service), Fleet would leave the GitRepo in a Modified state, with no error reported in the corresponding bundle deployment's status.

Solution

Testing

See reproduction steps above, in the issue description.

Engineering Testing

Manual Testing

  1. Created a GitRepo with drift correction enabled (but not set to force mode) pointing to rancher/fleet-test-data's multiple-paths
  2. Edited the created service ports
  3. Checked status of the GitRepo and bundle deployment
  4. Updated the GitRepo's drift correction to force mode (force set to true)
  5. Saw the GitRepo and bundle deployment status error disappear, once the service had been recreated.
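
The two GitRepo configurations used in steps 1 and 4 can be sketched as follows. This is a minimal illustration, not the exact manifest used in testing: the GitRepo name, the namespace, and the `paths` value are assumptions, and the `correctDrift` fields (`enabled`, `force`) are written as I understand the Fleet GitRepo spec.

```python
import json


def gitrepo_manifest(name: str, force: bool) -> dict:
    """Build a GitRepo manifest with drift correction enabled.

    `force` toggles force mode; everything else stays identical between
    step 1 (force=False) and step 4 (force=True).
    """
    return {
        "apiVersion": "fleet.cattle.io/v1alpha1",
        "kind": "GitRepo",
        # Name and namespace are hypothetical placeholders.
        "metadata": {"name": name, "namespace": "fleet-default"},
        "spec": {
            "repo": "https://github.com/rancher/fleet-test-data",
            # Assumed path; the steps only say "multiple-paths".
            "paths": ["multiple-paths"],
            "correctDrift": {"enabled": True, "force": force},
        },
    }


step1 = gitrepo_manifest("test-drift", force=False)
step4 = gitrepo_manifest("test-drift", force=True)

print(json.dumps(step1["spec"]["correctDrift"]))
print(json.dumps(step4["spec"]["correctDrift"]))
```

Only the `force` flag differs between the two manifests, which is the change that led to the service being recreated in step 5.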

Automated Testing

QA Testing Considerations

Regressions Considerations

N/A

mmartin24 commented 1 month ago

I am still observing this in Rancher v2.9-2d10b66bb2e1e5fc7568591ba41648002cf29b20-head with fleet:104.1.0+up0.10.4-rc.1, following the reproduction steps from the original ticket.

@weyfonk, am I perhaps missing something, or has this fix not yet been propagated to the above-mentioned Fleet version?

Fleet reconciliation error log

```json
{
  "level": "error",
  "ts": "2024-10-04T14:28:04Z",
  "logger": "bundledeployment",
  "msg": "Failed to deploy bundle",
  "controller": "bundledeployment",
  "controllerGroup": "fleet.cattle.io",
  "controllerKind": "BundleDeployment",
  "BundleDeployment": { "name": "ds-cluster-correct-79-multiple-paths-service", "namespace": "cluster-fleet-default-imported-0-f1158df57e01" },
  "namespace": "cluster-fleet-default-imported-0-f1158df57e01",
  "name": "ds-cluster-correct-79-multiple-paths-service",
  "reconcileID": "0b51e68c-b581-44cb-8a9d-4a00bc09c8ae",
  "status": {
    "conditions": [
      { "type": "Installed", "status": "True", "lastUpdateTime": "2024-10-04T14:09:56Z" },
      { "type": "Deployed", "status": "False", "lastUpdateTime": "2024-10-04T14:11:18Z", "reason": "Error", "message": "cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\"" },
      { "type": "Ready", "status": "True", "lastUpdateTime": "2024-10-04T14:10:04Z" },
      { "type": "Monitored", "status": "True", "lastUpdateTime": "2024-10-04T14:09:56Z" }
    ],
    "appliedDeploymentID": "s-e900fb60b86d8593e95a733a0c0d1794f2d71a00910f794d19bcd4d57deca:aa73273923fd2b194b95dc51be330a7b1be92dafa689e0afb400abda8b37d8c0",
    "release": "test-fleet-mp-service/ds-cluster-correct-79-multiple-paths-service:1",
    "ready": true,
    "nonModified": true,
    "display": {
      "deployed": "Error: cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\"",
      "monitored": "True",
      "state": "ErrApplied"
    },
    "syncGeneration": 0
  },
  "error": "cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\"",
  "errorVerbose": "cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\"\nhelm.sh/helm/v3/pkg/kube.(*Client).Update\n\t/home/runner/go/pkg/mod/github.com/rancher/helm/v3@v3.15.2-fleet.0/pkg/kube/client.go:438\nhelm.sh/helm/v3/pkg/action.(*Install).performInstall\n\t/home/runner/go/pkg/mod/github.com/rancher/helm/v3@v3.15.2-fleet.0/pkg/action/install.go:456\nhelm.sh/helm/v3/pkg/action.(*Install).performInstallCtx.func1\n\t/home/runner/go/pkg/mod/github.com/rancher/helm/v3@v3.15.2-fleet.0/pkg/action/install.go:421\nruntime.goexit\n\t/home/runner/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.7.linux-amd64/src/runtime/asm_amd64.s:1695",
  "stacktrace": "github.com/rancher/fleet/internal/cmd/agent/controller.(*BundleDeploymentReconciler).Reconcile\n\t/home/runner/work/fleet/fleet/internal/cmd/agent/controller/bundledeployment_controller.go:131\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:222"
}
{
  "level": "error",
  "ts": "2024-10-04T14:28:04Z",
  "msg": "Reconciler error",
  "controller": "bundledeployment",
  "controllerGroup": "fleet.cattle.io",
  "controllerKind": "BundleDeployment",
  "BundleDeployment": { "name": "ds-cluster-correct-79-multiple-paths-service", "namespace": "cluster-fleet-default-imported-0-f1158df57e01" },
  "namespace": "cluster-fleet-default-imported-0-f1158df57e01",
  "name": "ds-cluster-correct-79-multiple-paths-service",
  "reconcileID": "0b51e68c-b581-44cb-8a9d-4a00bc09c8ae",
  "error": "failed deploying bundle: cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\"",
  "errorCauses": [
    { "error": "failed deploying bundle: cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\"" }
  ],
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:222"
}
```

weyfonk commented 4 weeks ago

Thanks @mmartin24 for raising this.

I am still able to reproduce issues with updating service ports on Rancher v2.9.3-alpha4 with Fleet v0.10.4-rc.1. After a few seconds, although the bundle deployment containing the service appears as modified, the corresponding bundle sees its status updated to Ready, as if the bundle deployment were ready too. This in turn is reflected in the GitRepo owning that bundle. This happens because the resources fields are cleared from the bundle deployment's status. Why that happens is still unclear, although this code is a prime suspect.
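
The status mismatch is also visible in the reconciliation error log earlier in this thread. As a rough sketch, the snippet below pulls the failing conditions out of a trimmed copy of that `status` object (timestamps and IDs omitted); the helper name `failed_conditions` is hypothetical and only serves the illustration.

```python
# Trimmed copy of the "status" object from the error log in this thread.
status = {
    "conditions": [
        {"type": "Installed", "status": "True"},
        {
            "type": "Deployed",
            "status": "False",
            "reason": "Error",
            "message": 'cannot patch "mp-app-service" with kind Service: '
                       'Service "mp-app-service" is invalid: '
                       'spec.ports[1].name: Duplicate value: "required-name2"',
        },
        {"type": "Ready", "status": "True"},
        {"type": "Monitored", "status": "True"},
    ],
    "ready": True,
    "nonModified": True,
    "display": {"state": "ErrApplied"},
}


def failed_conditions(status: dict) -> list:
    """Return (type, message) pairs for every condition whose status is "False"."""
    return [
        (cond["type"], cond.get("message", ""))
        for cond in status.get("conditions", [])
        if cond.get("status") == "False"
    ]


failures = failed_conditions(status)
for cond_type, message in failures:
    print(f"{cond_type}: {message}")

# Top-level flags reported alongside the failing condition.
print("ready:", status["ready"], "nonModified:", status["nonModified"])
```

In that log, `Deployed` is the only failing condition, yet `ready` and `nonModified` are both `true` in the same status object.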

weyfonk commented 1 week ago

Confirmed: this line calls a DryRun on a Wrangler apply.Apply, which returns an empty set of objects. In turn, that set is used to populate resources in the bundle deployment status, which explains why those resources don't appear in the status from that point onwards.
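
To make the mechanism concrete, here is a toy model of it. This is not Fleet's or Wrangler's code: `update_status` and `summarize` are hypothetical stand-ins, and the rule that a status with no listed resources reads as Ready is an assumption made for the illustration.

```python
def summarize(resources: list) -> str:
    """Hypothetical aggregation: 'Modified' if any listed resource has drifted."""
    return "Modified" if any(r.get("modified") for r in resources) else "Ready"


def update_status(status: dict, applied_objects: list) -> dict:
    """Overwrite the status resources with whatever the apply step returned."""
    return {**status, "resources": list(applied_objects)}


# Status before the dry run: the drifted service is still listed.
before = {
    "resources": [
        {"kind": "Service", "name": "mp-app-service", "modified": True},
    ]
}

# The dry run returns an empty set of objects (as described above).
dry_run_result = []

after = update_status(before, dry_run_result)

print(summarize(before["resources"]), "->", summarize(after["resources"]))
```

Under these assumptions, once the empty result replaces the resource list, nothing is left to flag as modified, which matches the observed jump to Ready.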

Fixing this would require either:

mmartin24 commented 5 days ago

Tested in v2.9-1d1065cd5bf09c23834720420e1153712fa43439-head with fleet:104.1.1+up0.10.5-rc.2 and it is still erroring. Same issue as in https://github.com/rancher/fleet/issues/2609#issuecomment-2454409377.

Setting back to backlog