litmuschaos / litmus-go

Apache License 2.0
Node drain disruption not reverted after "kubectl drain" timeout #667

Closed Calvinaud closed 11 months ago

Calvinaud commented 1 year ago

BUG REPORT

What happened: The disruption is not reverted if there is an error at https://github.com/litmuschaos/litmus-go/blob/master/chaoslib/litmus/node-drain/lib/node-drain.go#L134. The main error we encounter with this command is a timeout when the drain takes too long.

What you expected to happen: I expect the disruption to be reverted after the experiment, even if the drain times out.

How to reproduce it (as minimally and precisely as possible): Use the following ChaosEngine to reproduce the error (TOTAL_CHAOS_DURATION is set very low just to make the error easy to see):

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  namespace: litmus
  name: node-drain-test
spec:
  appinfo:
    appns: default
    applabel: app=network-nginx
    appkind: deployment
  engineState: active
  chaosServiceAccount: litmus-admin
  jobCleanUpPolicy: retain
  experiments:
    - name: node-drain
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: '1'

Anything else we need to know?: Tested on both 2.14 and 3.0.0-beta8.