rancher / fleet

Deploy workloads from Git to large fleets of Kubernetes clusters
https://fleet.rancher.io/
Apache License 2.0
1.5k stars 219 forks

Feature Request: Enhance Fleet Bundles with resource verification capabilities #2649

Open Kristian-ZH opened 1 month ago

Kristian-ZH commented 1 month ago

Is your feature request related to a problem?

Fleet's current deployment capabilities lack robust post-deployment verification across clusters. After resources are deployed via a Fleet Bundle, there is no mechanism to verify that they are functioning as intended on all targeted clusters. This gap can lead to operational blind spots, especially when managing large-scale deployments across diverse Kubernetes environments.

Example: Deploying the System Upgrade Controller's Plan resource via Fleet necessitates monitoring its status. This monitoring is crucial for effectively tracking the upgrade procedure across downstream clusters.

Solution you'd like

Introduce comprehensive status checking capabilities within Fleet Bundles to address these challenges:

Example:

kind: Bundle
apiVersion: fleet.cattle.io/v1alpha1
metadata:
  # Any name can be used here
  name: my-bundle
  # For single cluster use fleet-local, otherwise use the namespace of
  # your choosing
  namespace: fleet-local
spec:
  resources:
  # List of all resources that will be deployed
  - content: |
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: nginx-deployment
        labels:
          app: nginx
      spec:
        replicas: 3
        selector:
          matchLabels:
            app: nginx
        template:
          metadata:
            labels:
              app: nginx
          spec:
            containers:
              - name: nginx
                image: nginx:1.14.2
                ports:
                  - containerPort: 80
    name: nginx.yaml
    conditions:
      - type: Ready
        status: "True"
  targets:
  - clusterName: local

Finally, the Bundle resource should report the Ready status when all the conditions declared on its resources are met.
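
The evaluation described above could be sketched roughly as follows. This is only an illustration of the requested behavior, not Fleet code: the function names `conditions_met` and `bundle_ready` are hypothetical, and the sketch assumes each deployed resource exposes the usual Kubernetes `status.conditions` list of `type`/`status` entries.

```python
from typing import Any, Dict, List, Tuple

Condition = Dict[str, str]


def conditions_met(resource: Dict[str, Any], expected: List[Condition]) -> bool:
    """Return True when every expected condition is present in the
    resource's status.conditions with the expected status value."""
    status = resource.get("status") or {}
    # Index the live conditions by type, e.g. {"Available": "True"}.
    actual = {c.get("type"): c.get("status") for c in status.get("conditions") or []}
    return all(actual.get(e["type"]) == e["status"] for e in expected)


def bundle_ready(checks: List[Tuple[Dict[str, Any], List[Condition]]]) -> bool:
    """Hypothetical aggregate: the bundle is Ready only if every
    (live resource, expected conditions) pair passes."""
    return all(conditions_met(resource, expected) for resource, expected in checks)


# A live object as it might be read back from a downstream cluster (made-up data).
live = {"status": {"conditions": [{"type": "Available", "status": "True"}]}}
print(bundle_ready([(live, [{"type": "Available", "status": "True"}])]))
```

Note that the condition type has to match what the resource actually publishes: a Deployment, for instance, reports `Available` and `Progressing` rather than `Ready`, so the expected conditions would need to be chosen per resource kind.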

Alternatives you've considered

No response

Anything else?

No response

olblak commented 1 month ago

@Kristian-ZH Thanks for reporting this issue; it's a valid concern. As of the latest Fleet version (0.10.0), released yesterday, Fleet can export metrics to Prometheus. This will allow users to define alerts based on those metrics.

Kristian-ZH commented 1 month ago

This is a good enhancement, but does it cover the case where we want to monitor the status of resources deployed via Fleet? Since Fleet currently tracks only the deployment of those resources, I am not sure users will be able to monitor anything beyond whether a resource is deployed or not.