kubeslice / worker-operator

Kubeslice Worker Operator Opensource Repository: The KubeSlice Worker Operator is a Kubernetes operator that manages the lifecycle of KubeSlice worker clusters.
Apache License 2.0

feat(SliceGwReconciler): Add `PodDisruptionBudget` logic to `SliceGwReconciler` (#308) #334

Closed Bhargav-InfraCloud closed 3 months ago

Bhargav-InfraCloud commented 4 months ago

Description

A PodDisruptionBudget that matches the slice gateway pods is required, specifying a minimum availability of 1 pod in case of disruptions.

The SliceGwReconciler handles the lifecycle of this PodDisruptionBudget object.

Added RBAC permissions so that SliceGwReconciler can manage PodDisruptionBudget objects.

Fixes #308
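As a rough illustration of the object described above, the following is a minimal sketch and not the code from this PR. `PDBSpec` and `buildSliceGwPDB` are hypothetical, simplified stand-ins for the real `policy/v1` `PodDisruptionBudget` type and the reconciler logic. The `-pdb` name suffix, the two labels, and the `kubeslice-system` namespace are taken from the test output in this PR; the pod selector shown is an assumption.

```go
package main

import "fmt"

// PDBSpec is a hypothetical, simplified stand-in for the fields of a
// policy/v1 PodDisruptionBudget that matter for this discussion.
type PDBSpec struct {
	Name         string
	Namespace    string
	Labels       map[string]string
	MinAvailable int
	Selector     map[string]string
}

const (
	sliceLabel   = "kubeslice.io/slice"
	sliceGwLabel = "kubeslice.io/slice-gw"
)

// buildSliceGwPDB describes the budget for one slice gateway: at least
// one gateway pod must stay available during voluntary disruptions.
func buildSliceGwPDB(slice, sliceGw, namespace string) PDBSpec {
	return PDBSpec{
		Name:      sliceGw + "-pdb",
		Namespace: namespace,
		Labels: map[string]string{
			sliceLabel:   slice,
			sliceGwLabel: sliceGw,
		},
		MinAvailable: 1,
		// Assumption: the gateway pods carry the slice-gw label, so the
		// selector matches on it. The PR text does not show the selector.
		Selector: map[string]string{sliceGwLabel: sliceGw},
	}
}

func main() {
	pdb := buildSliceGwPDB(
		"water",
		"water-kind-kubeslice-worker-2-kind-kubeslice-worker-1",
		"kubeslice-system",
	)
	fmt.Println("name:", pdb.Name)
	fmt.Println("minAvailable:", pdb.MinAvailable)
	fmt.Println("labels:", pdb.Labels)
}
```

In a real reconciler the same fields would be set on the Kubernetes API type and the object would be created or updated through the controller's client, which is why the RBAC permissions mentioned above are needed.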

How Has This Been Tested?

Checklist:

Does this PR introduce a breaking change?

Steps to test

  1. Deploy the whole setup: the controller and 2 workers. Check that there are slice gateway pods similar to these in both worker clusters:
    water-kind-kubeslice-worker-2-kind-kubeslice-worker-1-0-0-6tqrk   3/3     Running   0          61m
    water-kind-kubeslice-worker-2-kind-kubeslice-worker-1-1-0-h4vsp   3/3     Running   0          61m
  2. With this change, there should also be a PodDisruptionBudget created in the same namespace:
    water-kind-kubeslice-worker-2-kind-kubeslice-worker-1-pdb   1               N/A               1                     62m

    with labels similar to:

    labels:
      kubeslice.io/slice: water
      kubeslice.io/slice-gw: water-kind-kubeslice-worker-2-kind-kubeslice-worker-1
  3. Try to disrupt a node, for example one of a worker cluster's worker nodes (in a multi-node setup), using the command:
    kubectl drain --ignore-daemonsets --delete-emptydir-data <worker-node-name-here>

    it should be able to evict one pod but fail to evict the other:

    pod/water-kind-kubeslice-worker-2-kind-kubeslice-worker-1-0-0-6tqrk evicted
    error when evicting pods/"water-kind-kubeslice-worker-2-kind-kubeslice-worker-1-1-0-h4vsp" -n "kubeslice-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
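The outcome in step 3 follows from the budget arithmetic: with two healthy gateway pods and a minimum availability of 1, only one voluntary disruption is allowed at a time. The sketch below models that decision in plain Go. `disruptionsAllowed` and `drain` are hypothetical helpers written for illustration, not Kubernetes or KubeSlice APIs, and the model ignores the replacement pod becoming healthy again, which is what would eventually let the retried eviction succeed.

```go
package main

import "fmt"

// disruptionsAllowed returns how many pods may be evicted right now
// under a minAvailable budget: healthy pods above the minimum.
func disruptionsAllowed(currentHealthy, minAvailable int) int {
	allowed := currentHealthy - minAvailable
	if allowed < 0 {
		return 0
	}
	return allowed
}

// drain walks the pods in order and evicts each one only if the budget
// still permits a disruption; otherwise the pod is reported as blocked.
func drain(pods []string, minAvailable int) (evicted, blocked []string) {
	healthy := len(pods)
	for _, p := range pods {
		if disruptionsAllowed(healthy, minAvailable) > 0 {
			evicted = append(evicted, p)
			healthy-- // the evicted pod no longer counts as healthy
		} else {
			blocked = append(blocked, p)
		}
	}
	return
}

func main() {
	pods := []string{
		"water-kind-kubeslice-worker-2-kind-kubeslice-worker-1-0-0-6tqrk",
		"water-kind-kubeslice-worker-2-kind-kubeslice-worker-1-1-0-h4vsp",
	}
	evicted, blocked := drain(pods, 1)
	fmt.Println("evicted:", evicted)
	fmt.Println("blocked:", blocked)
}
```

Under these assumptions the first pod is evicted and the second is blocked, matching the drain output shown in step 3.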

Bhargav-InfraCloud commented 4 months ago

Issue #335 impacts this PR, so I am holding it until that issue is resolved.

mridulgain commented 4 months ago

Thanks for bringing this to our attention, @Bhargav-InfraCloud. Issue #335 has been resolved.

Bhargav-InfraCloud commented 4 months ago

Thanks @mridulgain! Changes in this PR are now based on the latest master.

Bhargav-InfraCloud commented 4 months ago

The E2E run is failing because the image is missing from Docker Hub:

Unable to find image 'aveshadev/kubeslice-e2e:latest' locally
docker: Error response from daemon: pull access denied for aveshadev/kubeslice-e2e, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.

Was it moved somewhere else?

Bhargav-InfraCloud commented 4 months ago

@narmidm @bharath-avesha @gourishkb Can you please review this PR? Thanks!

mridulgain commented 4 months ago

@Bhargav-InfraCloud We have resolved the image issue and re-triggered the pipeline.

Bhargav-InfraCloud commented 4 months ago

@mridulgain Great! Thanks for the update. I've rebased with the latest master. Please trigger the E2E again.

NishantSingh10 commented 4 months ago

report link 'https://kubeslice.github.io/e2e-allure-reports/Kind-worker-operator-2024-03-08T08:57:25-master-437/index.html'

narmidm commented 3 months ago

started E2E pipeline - https://github.com/kubeslice/worker-operator/actions/runs/8244374883/job/22546499751?pr=334

NishantSingh10 commented 3 months ago

report link 'https://kubeslice.github.io/e2e-allure-reports/Kind-worker-operator-2024-03-12T07:43:01-master-439/index.html'

Bhargav-InfraCloud commented 3 months ago

Thanks, all! 😊