rancher / fleet

Deploy workloads from Git to large fleets of Kubernetes clusters
https://fleet.rancher.io/
Apache License 2.0

Overlay-resource only deployed to 1 cluster due to clusterName selector #1403

Closed strowi closed 1 year ago

strowi commented 1 year ago

Is there an existing issue for this?

Current Behavior

Hi,

I have scoured the docs and tried debugging for a couple of hours now, but couldn't find anything explaining this issue. I have a Fleet repository containing bare YAML plus overlays. The overlays all contain a secret with the same file name and Kubernetes name/metadata, but different content.

Now it seems that the secret only gets deployed to a single cluster (whichever comes first, I guess).

PS: I see similar behaviour when using kustomize overlays.

Expected Behavior

I would expect that each secret gets deployed to the matching cluster.

Steps To Reproduce

I created a demo repo at https://gitlab.com/strowi/fleet-test/-/tree/master/test.

  1. Create 2+ clusters.
  2. Create an overlay for each cluster containing the same secret with the same name but different data (see the sketch below).
  3. Deploy the bundle.
  4. See that the secret only gets deployed to 1 cluster.
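
For reference, a minimal sketch of the fleet.yaml layout this boils down to (placeholder cluster and directory names, not the exact contents of the demo repo):

defaultNamespace: test
targetCustomizations:
  # one customization per cluster, each pulling in its own overlay directory
  - name: cluster-a
    clusterName: test-cluster-a        # matching by cluster name, as documented
    yaml:
      overlays:
        - cluster-a                    # e.g. overlays/cluster-a/secret.yaml
  - name: cluster-b
    clusterName: test-cluster-b
    yaml:
      overlays:
        - cluster-b                    # e.g. overlays/cluster-b/secret.yaml
# (a kustomize-based variant would use a per-target "kustomize: dir: ..." instead of yaml overlays)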

Environment

- Fleet Version: rancher-2.7.1
- Cluster:
  - Provider: k3s
  - Kubernetes Version: 1.24.10

Logs

No response

Anything else?

No response

strowi commented 1 year ago

Further tests show it seems to work when I use the following matchLabels instead of clusterName:

targetCustomizations:
- name: c0530
  clusterSelector:
    matchLabels:
      management.cattle.io/cluster-display-name: test-cluster-0530
  yaml:
    overlays:
      - cluster-0530
- name: c0531
  clusterSelector:
    matchLabels:
      management.cattle.io/cluster-display-name: test-cluster-0531
  yaml:
    overlays:
      - cluster-0531

Is there some limitation on the cluster name? (Ours are named 'abcd-efgh-4digits', which I didn't reflect in the first example.)

strowi commented 1 year ago

Got another example here, where the clusterName doesn't work as intended:

---
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: kafka
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"fleet.cattle.io/v1alpha1","kind":"GitRepo","metadata":{"annotations":{},"name":"kafka","namespace":"fleet-default"},"spec":{"branch":"main","clientSecretName":"gitlab-ci","paths":["/kafka"],"pollingInterval":"1m","repo":"https://gitlab.com/xyz/fleet.git","targets":[{"clusterName":"kafka"}]}}
  creationTimestamp: '2023-05-24T08:46:40Z'
  generation: 14
  labels:
    {}
  namespace: fleet-default
  resourceVersion: '156411260'
  uid: 6b9c7a72-a9a6-4256-ad21-bffd8c43c7b6
  fields:
    - kafka
    - https://gitlab.com/xyz/fleet.git
    - 6250977f9b897e39ab86bc1002f1cb6bb9e5f4f8
    - 0/0
    - null
spec:
  branch: main
  clientSecretName: gitlab-ci
  forceSyncGeneration: 8
  insecureSkipTLSVerify: false
  paths:
    - /kafka
#    - string
  paused: false
  pollingInterval: 1m
  repo: https://gitlab.com/xyz/fleet.git
  targets:
    - clusterName: kafka
__clone: true

replacing the clusterName kafka with the clusterID c-xyz seems to work.
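
For reference, the GitRepo targets block that does match looks like this (c-xyz standing in for the real cluster ID, as above):

targets:
  - clusterName: c-xyz   # name of the clusters.fleet.cattle.io resource, not the UI display name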

kkaempf commented 1 year ago

internal ref SURE-6499 - targetCustomizations by clusterName in fleet.yaml not working

strowi commented 1 year ago

Also, I just noticed the docs refer to targets clusterName, but using the Rancher UI and selecting a cluster results in

targets:
    - clusterName: c-rczdc

which is not the displayed name, but the cluster ID. This can be very confusing.

manno commented 1 year ago

replacing the clusterName kafka with the clusterID c-xyz seems to work.

It seems there is no bug then and we can close this?

Rancher has three different cluster resources:

- clusters.provisioning.cattle.io: namespaced, named after what you enter in the UI
- clusters.management.cattle.io: not namespaced, with a generated c-... name
- clusters.fleet.cattle.io: the resource Fleet actually targets

The management cluster is not namespaced and has a random name. Fleet will only match the fleet cluster name, which should be the same as the provisioning cluster name.

I find it strange that Rancher's dashboard/c/local/explorer/provisioning.cattle.io.cluster shows you the provisioning clusters, but when you click "View YAML", you are presented with the management cluster's YAML.

Anyhow, Fleet will only use the clusters from dashboard/c/local/explorer/fleet.cattle.io.cluster, which is what /dashboard/c/_/fleet/fleet.cattle.io.cluster shows.
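
For reference, a trimmed sketch of what such a fleet cluster resource looks like, using names that appear later in this thread (labels can vary per setup):

apiVersion: fleet.cattle.io/v1alpha1
kind: Cluster
metadata:
  name: c-rczdc                                          # what targets.clusterName has to match
  namespace: fleet-default
  labels:
    management.cattle.io/cluster-display-name: kafka     # what clusterSelector.matchLabels can match
# spec and status omitted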

strowi commented 1 year ago

Thx for checking!

I created a couple of clusters via the Rancher UI and named them "abcd", "efgh".

Now when I check on the management cluster, I get the following: some names are correct, some have a unique c-$hash?

~< kg clusters.fleet.cattle.io --all-namespaces    (rancher:cattle-system)
NAMESPACE       NAME      BUNDLES-READY   NODES-READY   SAMPLE-NODE   LAST-SEEN              STATUS
fleet-default   c-76kfg   0/1             1/1           xyz           2023-04-26T09:13:46Z
fleet-default   c-ftczs   14/14           3/3           xyz           2023-07-18T12:07:38Z
fleet-default   c-hg6jv   12/12           4/4           xyz           2023-07-18T12:00:10Z
fleet-default   c-rczdc   11/11           4/4           xyz           2023-07-18T12:05:45Z
fleet-default   abcd      12/12           1/1           xyz           2023-07-18T12:03:45Z

So this gives very inconsistent and confusing results when matching targets on clusterName.

Thinking about it, this sounds more like a Rancher problem than a Fleet one?

(On top of that, I probably got confused between clusters.management and clusters.fleet.)

manno commented 1 year ago

The Rancher UI is definitely confusing with the three types of clusters, but it does seem to do the right thing when I target a cluster.

And since there is no UI for customizations, I can add something to the Fleet docs to warn users about picking the right cluster name.

strowi commented 1 year ago

A note would be good, because the Fleet cluster name is nowhere visible in the Rancher UI (except when hovering over links).

So using "clusterName" in fleet in rancher is no useable because the name can be random. Guess i should open another issue in rancher/rancher then?

manno commented 1 year ago

Not sure, I think the fleet cluster name is actually visible in most pages in Rancher. I'm using 2.7.5, though.

strowi commented 1 year ago

I am on 2.7.5 too, and I don't see it. The only place where I might be able to guess it is the fleet bundle page. Otherwise I have to go into "View YAML".

For example, I have this in the "Cluster Management" view, and the Fleet integration also shows this as the name (screenshot omitted).

But the YAML shows this (metadata.name vs. spec.displayName):

apiVersion: management.cattle.io/v3
kind: Cluster
metadata:
  annotations:
    authz.management.cattle.io/creator-role-bindings: '{"created":["cluster-owner"],"required":["cluster-owner"]}'
    field.cattle.io/creatorId: user-srcst
    lifecycle.cattle.io/create.cluster-agent-controller-cleanup: 'true'
    lifecycle.cattle.io/create.cluster-provisioner-controller: 'true'
    lifecycle.cattle.io/create.cluster-scoped-gc: 'true'
    lifecycle.cattle.io/create.mgmt-cluster-rbac-remove: 'true'
    management.cattle.io/current-cluster-controllers-version: 1.27.3+k3s1
    provisioner.cattle.io/encrypt-migrated: 'true'
    provisioner.cattle.io/ke-driver-update: updated
  creationTimestamp: '2023-05-24T08:33:10Z'
  finalizers:
    - controller.cattle.io/cluster-agent-controller-cleanup
    - controller.cattle.io/cluster-scoped-gc
    - controller.cattle.io/cluster-provisioner-controller
    - controller.cattle.io/mgmt-cluster-rbac-remove
    - wrangler.cattle.io/mgmt-cluster-remove
  generateName: c-
  generation: 928
  labels:
    cattle.io/creator: norman
    provider.cattle.io: k3s
  managedFields: ...
  name: c-rczdc
  resourceVersion: '201040868'
  ...
spec:
  description: imported via terraform
  displayName: kafka

While Rancher (and its Fleet integration) displays spec.displayName, which is what I entered, Fleet uses the metadata.name of clusters.management.cattle.io, which is c-rczdc.

So the workflow to add a cluster and a new fleet bundle targeting that single cluster would be:

  1. Create cluster kafka in Rancher.
  2. Check the YAML of the kafka cluster in Rancher for metadata.name (instead of using the name I just entered).
  3. Use that metadata.name as the clusterName in the fleet bundle (see the sketch below).
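
In fleet.yaml terms, the customization this workflow leads to would look something like this (a sketch; c-rczdc is the generated name from the YAML above, and the overlay directory is hypothetical):

targetCustomizations:
  - name: kafka
    clusterName: c-rczdc     # metadata.name of the cluster resource, not the displayName "kafka"
    yaml:
      overlays:
        - kafka              # hypothetical overlay directory for this cluster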

No offense, but who would remember to check the YAML to verify that the name is actually what was just entered on creation?

If this is not a bug in Fleet, my guess would be that there is some mechanism (maybe a reason?) in Rancher that creates clusters.management.cattle.io with a random hash behind the scenes instead of kafka, which is what is displayed in the UI.

As for a solution: I don't really know; maybe Rancher should (have an option to) display both the cluster name and the cluster ID?

manno commented 1 year ago

Oh, but from your example, name: c-rczdc should not work as a selector. stagingk8s and kafka should work.

That's what I meant earlier: it's very confusing that Rancher shows you the apiVersion: management.cattle.io/v3 YAML for clusters. But then, these are "Rancher" clusters, as opposed to "Continuous Delivery" clusters.

strowi commented 1 year ago

Sorry if I didn't make it clearer; I guess at some point everyone is just left confused...

In my example above, kafka is the name I gave the new cluster when creating it via the Rancher UI. And that is displayed (everywhere) in "Rancher" and in the "Continuous Delivery" part.

But behind the scenes, Rancher created a clusters.management.cattle.io resource with name: c-.... And this is what is used for Fleet, also behind the scenes, while the Rancher UI displays the spec.displayName of kafka (except in "View YAML", which shows the Kubernetes resource).

In the UI everything seems to work fine, but if I try a targetCustomization or another selector using clusterName, I have to use the c-... name instead of the Rancher name of the cluster that is displayed everywhere.

Hence why I tried to clarify above what clusters.management.cattle.io contains when creating a new cluster kafka.

If you create clusters via the UI and create the fleet.yaml via code, naturally you would use clusterName: kafka, which is wrong. If you ignore the Rancher UI completely and have access to its Kubernetes API, you will see the correct c-... names.

And I just noticed: if you edit a GitRepo via Rancher's Continuous Delivery and change the target to kafka, it will use c-... in the YAML.

So I am not sure, but I would say Rancher is creating the confusion (definitely) and the bug (?) by displaying a different name than is being used behind the scenes for the Kubernetes resources and Fleet.

Hope I could make it clearer this time; otherwise I'm also happy to have a direct chat about it. ;)

manno commented 1 year ago

Thanks for describing the problem again :)

But behind the scenes, Rancher created a clusters.management.cattle.io resource with name: c-.... And this is what is used for Fleet, also behind the scenes, while the Rancher UI displays the spec.displayName of kafka (except in "View YAML", which shows the Kubernetes resource). In the UI everything seems to work fine, but if I try a targetCustomization or another selector using clusterName, I have to use the c-... name instead of the Rancher name of the cluster that is displayed everywhere.

Yes, Rancher creates a management cluster and syncs it with the fleet and provisioning cluster resources. However, Fleet doesn't even know the management cluster CRD. It should always use the clusters.fleet.cattle.io resource and can only match against that name.

If you ignore the Rancher UI completely and have access to its Kubernetes API, you will see the correct c-... names.

kubectl get clusters.fleet.cattle.io -A should show you the usable names.

And I just noticed: if you edit a GitRepo via Rancher's Continuous Delivery and change the target to kafka, it will use c-... in the YAML.

That would be a UI bug.

So I am not sure, but I would say Rancher is creating the confusion (definitely) and the bug (?) by displaying a different name than is being used behind the scenes for the Kubernetes resources and Fleet.

Yes, the three cluster resources are confusing, but, well, it's for historical reasons.

Shavindra commented 1 year ago

And I just noticed: if you edit a GitRepo via Rancher's Continuous Delivery and change the target to kafka, it will use c-... in the YAML.

As it stands, this is intended behaviour in the UI. Prominence is given to management.cattle.io/cluster-display-name for what we show. However, spec.targets.clusterName is the fleet cluster name / management cluster ID that is used to target a particular cluster.

https://github.com/rancher/dashboard/blob/f6b249127b47a61fcdc95cef1a500c8b8b60893f/shell/models/fleet.cattle.io.cluster.js#L87-L89

This can be raised as an issue at https://github.com/rancher/dashboard/issues.

CC: @kwwii

strowi commented 1 year ago

Thx for being untiring. ;)

I understand that on the Fleet side everything works as intended (after the initial irritation and hours of debugging the initial problem).

On the Rancher UI part, I just want to stress that it is very weird and not really user-friendly if the UI consistently displays the name 'kafka', but Fleet uses a different, non-visible "name" 'c-...' (visible only by clicking through the cluster -> View YAML). (As mentioned, this probably belongs more in rancher/rancher than here at this point.)

Otherwise, I now know to use:

clusterSelector:
    matchLabels:
      management.cattle.io/cluster-display-name: kafka
...

to not run into this problem. ;)

manno commented 1 year ago

@Shavindra Should I leave this open for now, as a reference for possible UI enhancements? I think there is nothing to do on the fleet side (for now).

Shavindra commented 1 year ago

Created https://github.com/rancher/dashboard/issues/9424 to track this

CC: @strowi

manno commented 1 year ago

Closing this until work is needed from fleet :)