rancher / fleet

Deploy workloads from Git to large fleets of Kubernetes clusters
https://fleet.rancher.io/
Apache License 2.0

Ignore missed resources #2051

Open: SnelsSM opened this issue 10 months ago

SnelsSM commented 10 months ago

Is your feature request related to a problem?

Some charts (Consul, for example) remove temporary resources such as Jobs after deployment. When that happens, the bundle's state becomes "Modified ... %some resource% missing", and there is no option to ignore the missing resource.

Solution you'd like

Perhaps comparePatches needs an option for this, something like:

spec:
  diff:
    comparePatches:
    - apiVersion: batch/v1
      kind: Job
      name: consul-consul-server-acl-init
      namespace: consul
      operations:
      - op: ignore

Perhaps a ready-made solution already exists? I looked for one but couldn't find it.

Alternatives you've considered

No response

Anything else?

No response

jhoblitt commented 9 months ago

I've run into this problem as well, with Jobs that have a TTL set. I tried ignoring all paths, but it doesn't work, e.g.:

    - apiVersion: batch/v1
      kind: Job
      namespace: rook-ceph
      jsonPointers:
        - /

weyfonk commented 8 months ago

I tried to reproduce this with Fleet 0.8 and 0.9 by installing the Consul chart (version 1.3.3), without success. In both cases, the bundle remained ready even after the chart deleted its Jobs.

Here is the fleet.yaml used for testing (no bundle diffs involved):

defaultNamespace: consul
helm:
  releaseName: test-consul
  chart: "consul"
  repo: "https://helm.releases.hashicorp.com"

  version: "1.3.3"

  values:
    global:
      name: consul

Do you have more details about the Fleet version and config used here?

jhoblitt commented 8 months ago

@weyfonk I've looked into this a bit more, and it seems to depend on whether the chart uses Helm hook annotations, e.g.:

  annotations:
    helm.sh/hook: post-install, post-upgrade
    helm.sh/hook-delete-policy: hook-succeeded, before-hook-creation

With these annotations, Fleet does not warn about missing resources. I've been using this as a workaround instead of setting a TTL. Setting the hook annotations also seems to work with a "plain" YAML bundle.

However, I still think there's a good use case for being able to configure the drift detection to ignore a missing resource.

It should be possible to reproduce this with a simple YAML bundle containing:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  ttlSecondsAfterFinished: 100
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4

weyfonk commented 8 months ago

Thanks @jhoblitt, that Job spec does reproduce the issue. We'll need to take a closer look at how our bundle diffs feature handles whole resources. I also tried a jsonPointers entry containing an empty string, as explained here, to point at the root of the Job, but to no avail.
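For reference, JSON Pointers as defined in RFC 6901 treat the empty string and "/" differently: "" addresses the whole document, while "/" addresses a member whose name is the empty string. The Go sketch below illustrates that distinction, assuming Fleet's jsonPointers follow RFC 6901; the `Resolve` function is hypothetical and not part of Fleet. Either way, a pointer addresses a location inside an object that exists. When the whole Job has been deleted, there may be no live document for the pointer to apply to, which could explain why neither form suppresses the missing-resource status.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Resolve evaluates an RFC 6901 JSON Pointer against a decoded JSON value.
// "" refers to the whole document; "/" refers to the root member named "".
// Simplified sketch: array indices are parsed with strconv.Atoi, so leading
// zeros are not rejected as the RFC requires.
func Resolve(doc any, pointer string) (any, error) {
	if pointer == "" {
		return doc, nil // the empty pointer is the whole document
	}
	if !strings.HasPrefix(pointer, "/") {
		return nil, fmt.Errorf("invalid pointer %q: must be empty or start with '/'", pointer)
	}
	cur := doc
	for _, tok := range strings.Split(pointer[1:], "/") {
		// Unescape per RFC 6901: "~1" -> "/" first, then "~0" -> "~".
		tok = strings.ReplaceAll(tok, "~1", "/")
		tok = strings.ReplaceAll(tok, "~0", "~")
		switch node := cur.(type) {
		case map[string]any:
			v, ok := node[tok]
			if !ok {
				return nil, fmt.Errorf("member %q not found", tok)
			}
			cur = v
		case []any:
			i, err := strconv.Atoi(tok)
			if err != nil || i < 0 || i >= len(node) {
				return nil, fmt.Errorf("bad array index %q", tok)
			}
			cur = node[i]
		default:
			return nil, fmt.Errorf("cannot descend into %T with %q", cur, tok)
		}
	}
	return cur, nil
}

func main() {
	job := map[string]any{"kind": "Job", "metadata": map[string]any{"name": "pi"}}

	whole, _ := Resolve(job, "")
	fmt.Println(whole.(map[string]any)["kind"]) // the whole object: prints Job

	_, err := Resolve(job, "/")
	fmt.Println(err) // no member named "" exists

	name, _ := Resolve(job, "/metadata/name")
	fmt.Println(name)
}
```

Under this reading, the "/" entry tried earlier in the thread would target a nonexistent empty-named field rather than the whole Job.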

SnelsSM commented 7 months ago

Quoting @weyfonk: "Here is the fleet.yaml used for testing (no bundle diffs involved):"

The problem happens when global.acls.manageSystemACLs = true. Helm then creates two Jobs, %release name%-server-acl-init and %release name%-server-acl-init-cleanup, and %release name%-server-acl-init-cleanup subsequently removes %release name%-server-acl-init. The result:

Modified(1) [Bundle rke2-ops-local-consul-charts-consul]; job.batch consul/consul-consul-server-acl-init missing

manno commented 1 month ago

It should be possible to ignore resources completely by omitting them from the plan: https://github.com/rancher/fleet/blob/29471373d7ad6cd8b2f36e70cc7e25d8e7ebb8b5/internal/cmd/agent/deployer/desiredset/plan.go#L31-L46
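A minimal Go sketch of that idea, assuming a hypothetical ignore list with the same fields as the comparePatches proposal above (apiVersion, kind, namespace, name): matching objects are dropped from the desired set before it is diffed, so the diff can never report them as missing. `ObjectRef` and `OmitIgnored` are illustrative names only, not part of Fleet's API or of plan.go.

```go
package main

import "fmt"

// ObjectRef is a hypothetical identifier mirroring the fields used in the
// comparePatches proposal. It is comparable, so it can be a map key.
type ObjectRef struct {
	APIVersion, Kind, Namespace, Name string
}

// OmitIgnored returns desired with every object listed in ignore removed,
// so that a later diff never sees (or reports) those objects as missing.
func OmitIgnored(desired, ignore []ObjectRef) []ObjectRef {
	skip := make(map[ObjectRef]struct{}, len(ignore))
	for _, r := range ignore {
		skip[r] = struct{}{}
	}
	kept := make([]ObjectRef, 0, len(desired))
	for _, r := range desired {
		if _, ok := skip[r]; !ok {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	desired := []ObjectRef{
		{"batch/v1", "Job", "consul", "consul-consul-server-acl-init"},
		{"batch/v1", "Job", "consul", "consul-consul-server-acl-init-cleanup"},
	}
	// Hypothetical user-supplied ignore list.
	ignore := []ObjectRef{
		{"batch/v1", "Job", "consul", "consul-consul-server-acl-init"},
	}
	for _, r := range OmitIgnored(desired, ignore) {
		fmt.Println(r.Kind, r.Namespace+"/"+r.Name)
	}
}
```

Matching on exact identity keeps the sketch simple. A real implementation would likely also need to decide what happens to an ignored object that still exists in the cluster, for example whether it should be left alone rather than treated as no longer desired and cleaned up.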