To check the points below, I used the script given by @manno to create 100+ `clusterregistrations` in the cluster.
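The script itself is not attached to this issue, so the following is only a minimal sketch of what such a script could look like; the namespace and the spec fields (`clientID`, `clusterLabels`) are assumptions, not @manno's actual script.

```bash
#!/usr/bin/env bash
# Hypothetical sketch only, not the actual script from @manno:
# flood the cluster with ClusterRegistration objects to simulate the leak.
set -euo pipefail

NAMESPACE=fleet-default   # assumption: registrations live in fleet-default

for i in $(seq 1 100); do
  kubectl create -f - <<EOF
apiVersion: fleet.cattle.io/v1alpha1
kind: ClusterRegistration
metadata:
  generateName: test-registration-
  namespace: ${NAMESPACE}
spec:
  clientID: test-client-${i}
  clusterLabels:
    purpose: cleanup-test
EOF
done
```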
In order to reproduce the issue, the following steps were performed:

- Install Rancher 2.7.5
- Upgrade to Rancher 2.7.6-rc2
Observations

Before upgrade (Rancher: v2.7.5, Fleet: v0.7.0):

- Created a `GitRepo` in the cluster.
- `clusterregistrations` increased from 4 to 414 (counted as shown below).
- Deleted a cluster from fleet: `kubectl delete clusters.fleet.cattle.io -n fleet-default imported-cluster-2`
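For reference, the registration count above can be checked with a one-liner along these lines (the `fleet-default` namespace is taken from the delete command above):

```bash
kubectl get clusterregistrations.fleet.cattle.io -n fleet-default --no-headers | wc -l
```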
After observing this situation over several days, I upgraded to the latest Rancher RC and Fleet RC versions, in which the fix is available.
After Upgrade (Rancher: v2.7.6-rc2, Fleet: 0.7.1-rc.2):
- `clusterregistrations` went down to 4.
- The `clusters.fleet.cattle.io` cluster deleted earlier got re-added to fleet (verified as shown below).
- The `GitRepo` and `clusterspecs` are working as expected while upgrading to Rancher 2.7.6-rc2.
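A quick check that the deleted cluster is back, using the resource names from the repro steps above:

```bash
kubectl get clusters.fleet.cattle.io -n fleet-default imported-cluster-2
```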
After the `clusterSpec` update, I started the `fleet-controller` and I see that the `fleet-agent` is re-created on imported clusters with the updated spec configurations. Checked whether the old `clusterregistrations` get deleted or not by triggering an agent redeploy:
kubectl patch clusters.fleet.cattle.io -n fleet-local local --type=json -p '[{"op": "add", "path": "/spec/redeployAgentGeneration", "value": 2}]'
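After triggering the redeploy, the registrations can be inspected with something along these lines (the `fleet-local` namespace matches the patch command above; sorting by creation time is just one way to see which registration is the newest):

```bash
kubectl get clusterregistrations.fleet.cattle.io -n fleet-local \
  --sort-by=.metadata.creationTimestamp
kubectl get clusters.fleet.cattle.io -n fleet-local local
```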
The old `clusterregistrations` are deleted and the cluster is shown pointing to the new one.

P.S. In the above testing, P0 and regression tests were performed on the cluster after the upgrade.
Resources | Before upgrade | After Upgrade |
---|---|---|
ClusterRoleBindings | 512 | 110 |
ClusterRoles | 136 | 143 |
RoleBindings | 905 | 94 |
Roles | 482 | 80 |
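For reference, counts like the ones in the table can be gathered with plain kubectl; this is just one way to do it and not necessarily how the numbers above were produced:

```bash
kubectl get clusterrolebindings --no-headers | wc -l
kubectl get clusterroles --no-headers | wc -l
kubectl get rolebindings -A --no-headers | wc -l
kubectl get roles -A --no-headers | wc -l
```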
This is a backport issue for https://github.com/rancher/fleet/issues/1690, automatically created via rancherbot by @manno
Original issue description:
This is an extension to #1651. It should also fix https://github.com/rancher/fleet/issues/1674. It needs a backport to 0.7.x.
Implemented by: https://github.com/rancher/fleet/pull/1689
Fleet 0.7.0 creates multiple clusterregistration resources and does not clean them up. This adds a helm hook to run a cleanup script when upgrading Fleet.
We assume agents only use the latest clusterregistration and clean up the others. The script does not check whether a registration was granted. It also tries to delete the child resources. If the fleet-controller is running, its cleanup handler would also delete the orphaned resources. The script works over all namespaces.
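A minimal sketch of the cleanup idea described above, assuming registrations can simply be ordered by creation time per namespace; the actual hook script shipped in the chart additionally handles child resources and may group registrations per cluster:

```bash
#!/usr/bin/env bash
# Sketch only: keep the newest clusterregistration in each namespace and
# delete the rest. Child resources are left to the fleet-controller here.
set -euo pipefail

for ns in $(kubectl get clusterregistrations.fleet.cattle.io -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\n"}{end}' | sort -u); do
  # List registrations oldest first, drop the newest one, delete the others.
  kubectl get clusterregistrations.fleet.cattle.io -n "$ns" -o name \
    --sort-by=.metadata.creationTimestamp | head -n -1 \
    | xargs -r kubectl delete -n "$ns"
done
```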
The migration job can be disabled via helm values.
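For example, disabling it could look like the following; the chart reference and the exact value name are assumptions and should be checked against the chart's values.yaml:

```bash
# Hypothetical value name; verify against the fleet chart before using.
helm upgrade --install fleet fleet/fleet -n cattle-fleet-system \
  --set migrations.clusterRegistrationCleanup=false
```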
Testing
Engineering Testing
Manual Testing
Upgraded Fleet standalone multiple times and watched the job spawn. Checked with `helm template` whether the new values work; a sketch of such a check follows.
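The chart reference and value name below are assumptions, not the exact commands used:

```bash
# Render the chart and see whether the cleanup job is included or not.
helm template fleet fleet/fleet -n cattle-fleet-system \
  --set migrations.clusterRegistrationCleanup=false | grep -i -A3 'kind: Job'

# During an actual upgrade, watch the migration job spawn.
kubectl get jobs -n cattle-fleet-system -w
```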
QA Testing Considerations
The cleanup script might use a lot of resources and run for a long time if it is cleaning up a large number (20k+) of resources. It should be fine for smaller fleets (<20 clusters).
Regressions Considerations
Some fleets might have too many resources for an automatic cleanup to be effective?