rancher / fleet

Deploy workloads from Git to large fleets of Kubernetes clusters
https://fleet.rancher.io/
Apache License 2.0

Problems with Gitrepos with wrong URLs #2520

Open mmartin24 opened 2 weeks ago

mmartin24 commented 2 weeks ago

Issue:

Error logs and performance degradation appear on Rancher v2.9-e0af6cccbff0210e1538004dfc67f10f40597c20-head with fleet:v0.10.0-rc.15 when creating a GitRepo whose URL contains spaces and then trying to fix it. There is more than one issue here, but I'll try to summarize them as follows:

This is easier to see in this video of Rancher v2.9-e0af6cccbff0210e1538004dfc67f10f40597c20-head | fleet:v0.10.0-rc.15:

https://github.com/rancher/fleet/assets/37271841/3968d434-5828-41e3-b8ee-c34017bd5b5f

Compare it with this one on v2.8.4 | fleet:103.1.5+up0.9.5, where things look smoother:

https://github.com/rancher/fleet/assets/37271841/03edfffb-366c-44b4-a6ca-d7c927f08411

Attaching the gitjob logs: gitjob_logs.json

Logs summary:

{"level":"error","ts":"2024-06-14T08:01:43Z","logger":"git-latest-commit-poll-watch","msg":"error fetching commit","gitrepo":{"kind":"GitRepo","apiVersion":"fleet.cattle.io/v1alpha1","metadata":{"name":"test29","namespace":"fleet-local","uid":"234426b7-3e2c-4e4d-9a
{"level":"error","ts":"2024-06-14T08:01:58Z","logger":"git-latest-commit-poll-watch","msg":"error fetching commit","gitrepo":{"kind":"GitRepo","apiVersion":"fleet.cattle.io/v1alpha1","metadata":{"name":"test29","namespace":"fleet-local","uid":"234426b7-3e2c-4e4d-9a
{"level":"info","ts":"2024-06-14T08:02:01Z","logger":"gitjob","msg":"job deletion triggered because of generation change","controller":"gitrepo","controllerGroup":"fleet.cattle.io","controllerKind":"GitRepo","GitRepo":{"name":"test29","namespace":"fleet-local"},"na
{"level":"error","ts":"2024-06-14T08:02:01Z","msg":"Reconciler error","controller":"gitrepo","controllerGroup":"fleet.cattle.io","controllerKind":"GitRepo","GitRepo":{"name":"test29","namespace":"fleet-local"},"namespace":"fleet-local","name":"test29","reconcileID"
{"level":"error","ts":"2024-06-14T08:03:09Z","logger":"git-latest-commit-poll-watch","msg":"error fetching commit","gitrepo":{"kind":"GitRepo","apiVersion":"fleet.cattle.io/v1alpha1","metadata":{"name":"test-29","namespace":"fleet-local","uid":"9b7677fb-c6a6-4904-a
{"level":"error","ts":"2024-06-14T08:03:24Z","logger":"git-latest-commit-poll-watch","msg":"error fetching commit","gitrepo":{"kind":"GitRepo","apiVersion":"fleet.cattle.io/v1alpha1","metadata":{"name":"test-29","namespace":"fleet-local","uid":"9b7677fb-c6a6-4904-a
{"level":"info","ts":"2024-06-14T08:03:39Z","logger":"git-latest-commit-poll-watch","msg":"new commit found","gitrepo":{"kind":"GitRepo","apiVersion":"fleet.cattle.io/v1alpha1","metadata":{"name":"test-29","namespace":"fleet-local","uid":"9b7677fb-c6a6-4904-aafe-64
{"level":"error","ts":"2024-06-14T08:04:09Z","logger":"git-latest-commit-poll-watch","msg":"error fetching commit","gitrepo":{"kind":"GitRepo","apiVersion":"fleet.cattle.io/v1alpha1","metadata":{"name":"test-29","namespace":"fleet-local","uid":"9b7677fb-c6a6-4904-a
{"level":"info","ts":"2024-06-14T08:04:21Z","logger":"gitjob","msg":"job deletion triggered because of generation change","controller":"gitrepo","controllerGroup":"fleet.cattle.io","controllerKind":"GitRepo","GitRepo":{"name":"test-29","namespace":"fleet-local"},"n

Reproduction steps:

The video shows the steps in detail, but in a nutshell:

  1. Deploy Rancher 2.9 with fleet:v0.10.0-rc.15. (You can use our CI to spin up a machine from here if desired.)
  2. In fleet-local, deploy a Git repo with a space at the end of the URL. I used:
    Repository URL: `https://github.com/rancher/fleet-test-data ` (note the trailing space)
    Branch: `master`
    Paths: `qa-test-apps/nginx-app`


  3. Observe Issue 1
  4. Now go back to the Git repo, remove the stray space at the end of the URL, and save.
  5. Observe Issue 2 (updating the repo takes a long time compared to 2.8.4).
  6. Go back and make the URL wrong again by adding a space. Save.
  7. Wait a few seconds.
  8. While monitoring jobs in a terminal, observe Issue 3 in the UI (the job

Test environment

mmartin24 commented 2 weeks ago

Opened this related issue so that the UI prevents a Git repo with an invalid URL (containing spaces or other bad characters) from being saved: https://github.com/rancher/dashboard/issues/11239

0xavi0 commented 1 week ago

Both issues are Fleet-related. Thanks for the hunt. :detective:

  1. Issue 1 occurs because Fleet does not update the GitRepo's conditions when the Git poller detects an error.
  2. Issue 2 occurs because Fleet does not reschedule the poller job when the Spec changes, so the user has to wait for the next polling cycle before the changes are picked up. (We only reschedule when the user changes the interval; in this case, the URL changed.)

https://github.com/rancher/fleet/pull/2542 should fix both

0xavi0 commented 1 week ago

The following video shows the fix running on Rancher 2.9-head with a dev version of Fleet:

https://github.com/rancher/fleet/assets/96239481/0257bdaa-154a-4815-8995-ec8a2cf2d0ad