rancher / fleet

Deploy workloads from Git to large fleets of Kubernetes clusters
https://fleet.rancher.io/
Apache License 2.0

Fix flaky drift correction tests #2858

Open weyfonk opened 2 months ago

weyfonk commented 2 months ago

Drift correction end-to-end tests are notoriously flaky. They currently get in the way of smoothly running CI workflows.

Here is an example of such a failure. Most failures involve this test case.

weyfonk commented 2 months ago

First hypothesis: the Fleet agent's garbage collection, which acts on Helm releases missing a bundle deployment, might interfere with drift correction, which installs releases, upgrades them and rolls them back. This has been tested through #2861, by:

  1. Deploying Fleet with a longer garbage collection interval, which was expected to make forced drift correction less likely to fail
  2. Disabling garbage collection altogether

In both cases, drift correction still failed on the first or second run in CI (unlike locally, where it always passes), suggesting that garbage collection is not the culprit here.
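
For readers unfamiliar with the hypothesis, here is a toy model of the interference it describes. This is only a sketch: `Cluster`, `garbage_collect`, `correct_drift` and `scenario` are hypothetical names, not Fleet code, and the model assumes a window in which a release exists while its bundle deployment is not visible to the agent. It shows why garbage collection looked like a plausible suspect, not what actually happens in CI.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a downstream cluster; not Fleet's real data model."""
    releases: set = field(default_factory=set)            # Helm release names
    bundle_deployments: set = field(default_factory=set)  # names backed by a bundle deployment


def garbage_collect(cluster):
    """Delete every release that has no matching bundle deployment."""
    orphans = cluster.releases - cluster.bundle_deployments
    cluster.releases -= orphans
    return orphans


def correct_drift(cluster, name, gc_runs_mid_correction):
    """Hypothetical drift correction: roll back `name`, optionally racing with GC."""
    if gc_runs_mid_correction:
        garbage_collect(cluster)
    # A rollback can only succeed if the release still exists.
    return name in cluster.releases


def scenario(gc_runs_mid_correction):
    # Assumption: the bundle deployment is momentarily missing for the release.
    cluster = Cluster(releases={"drift-test"}, bundle_deployments=set())
    return correct_drift(cluster, "drift-test", gc_runs_mid_correction)


if __name__ == "__main__":
    print(scenario(gc_runs_mid_correction=False))  # True: rollback finds the release
    print(scenario(gc_runs_mid_correction=True))   # False: GC removed it first
```

In this model, lengthening the interval or disabling garbage collection removes the failing branch entirely, which is why both were tried.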