Open ksatchit opened 4 years ago
@ksatchit can the chaos engine resources & schedule resources be specified here? The YAML versions are sufficient.
A typical ChaosEngine today looks like the following. Based on actual usage info, the user generally changes only the .spec.appinfo section, while keeping the recommended values for the rest. More info is provided here: https://docs.litmuschaos.io/docs/chaosengine/ (multiple optional fields exist).
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: default
spec:
  appinfo:
    appns: 'default'
    applabel: 'app=nginx'
    appkind: 'deployment'
  annotationCheck: 'true'
  engineState: 'active'
  chaosServiceAccount: pod-delete-sa
  jobCleanUpPolicy: 'delete'
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            # set chaos duration (in sec) as desired
            - name: TOTAL_CHAOS_DURATION
              value: '30'
            # set chaos interval (in sec) as desired
            - name: CHAOS_INTERVAL
              value: '10'
            # pod failures without '--force' & default terminationGracePeriodSeconds
            - name: FORCE
              value: 'false'
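Since most users only touch .spec.appinfo, the engine above can be read as a template with a handful of inputs. Below is a minimal Python sketch of that idea, using plain dicts in place of YAML. The helper name `make_pod_delete_engine` is hypothetical, and placing the engine in the application's namespace is an assumption taken from the example, not a Litmus rule.

```python
def make_pod_delete_engine(name, appns, applabel, appkind="deployment"):
    """Return a ChaosEngine manifest (as a dict) mirroring the YAML above.

    Only the app info varies; everything else keeps the values shown in
    the example manifest.
    """
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        # Assumption: the engine lives in the same namespace as the app.
        "metadata": {"name": name, "namespace": appns},
        "spec": {
            "appinfo": {"appns": appns, "applabel": applabel, "appkind": appkind},
            "annotationCheck": "true",
            "engineState": "active",
            "chaosServiceAccount": "pod-delete-sa",
            "jobCleanUpPolicy": "delete",
            "experiments": [{
                "name": "pod-delete",
                "spec": {"components": {"env": [
                    {"name": "TOTAL_CHAOS_DURATION", "value": "30"},
                    {"name": "CHAOS_INTERVAL", "value": "10"},
                    {"name": "FORCE", "value": "false"},
                ]}},
            }],
        },
    }


engine = make_pod_delete_engine("nginx-chaos", "default", "app=nginx")
```

Serializing `engine` with any YAML library should give a manifest equivalent to the one above.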
The schedule is closed source today (and is essentially chaosengine++), but the plan is to convert it into a separate CR that holds that data, i.e., a schedule. We can take a shot at it here (i.e., in a separate issue).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deploy
  namespace: my-ns
  annotations:
    # annotation values must be strings, hence the quotes
    chaos.litmus.io/enabled: 'true'
    engine-generate.chaos.litmus.io/enabled: 'true'
    pod-delete.experiment.chaos.litmus.io/enabled: 'true'
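To illustrate how a generator might read these opt-in annotations, here is a small Python sketch. The semantics are an assumption: both `chaos.litmus.io/enabled` and `engine-generate.chaos.litmus.io/enabled` must be 'true' before any `<experiment>.experiment.chaos.litmus.io/enabled` annotation is honored. The function name is hypothetical.

```python
ENABLED = "chaos.litmus.io/enabled"
GENERATE = "engine-generate.chaos.litmus.io/enabled"
EXPERIMENT_SUFFIX = ".experiment.chaos.litmus.io/enabled"


def requested_experiments(deployment):
    """Return the sorted experiment names a deployment has opted into."""
    annotations = deployment.get("metadata", {}).get("annotations") or {}

    def is_on(key):
        # Annotation values are strings; accept 'true' in any letter case.
        return str(annotations.get(key, "")).lower() == "true"

    # Assumed rule: both top-level switches gate every experiment.
    if not (is_on(ENABLED) and is_on(GENERATE)):
        return []
    return sorted(
        key[: -len(EXPERIMENT_SUFFIX)]
        for key in annotations
        if key.endswith(EXPERIMENT_SUFFIX) and is_on(key)
    )


my_deploy = {
    "metadata": {
        "name": "my-deploy",
        "namespace": "my-ns",
        "annotations": {
            "chaos.litmus.io/enabled": "true",
            "engine-generate.chaos.litmus.io/enabled": "true",
            "pod-delete.experiment.chaos.litmus.io/enabled": "true",
        },
    }
}
experiments = requested_experiments(my_deploy)
```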
apiVersion: metac.openebs.io/v1alpha1
kind: GenericController
metadata:
  name: chaosengine-generator-for-deployment
  namespace: doperator
spec:
  watch:
    apiVersion: apps/v1
    resource: deployments
  attachments:
    - apiVersion: litmuschaos.io/v1alpha1
      resource: chaosengines
      advancedSelector:
        selectorTerms:
          - matchReferenceExpressions:
              - key: metadata.annotations.generator\.chaosengine\.litmus\.io/uid
                refKey: metadata.uid
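The reference expression above pairs a ChaosEngine with the Deployment whose uid it carries in an annotation. The Python sketch below illustrates that matching rule only; it is not metac's implementation, and the helper names `get_path` and `reference_matches` are hypothetical. Escaped dots (`\.`) are treated as part of the key rather than as path separators.

```python
import re


def get_path(obj, path):
    """Look up a dotted path in nested dicts; '\\.' is a literal dot."""
    # Split on dots that are not preceded by a backslash, then unescape.
    parts = [p.replace("\\.", ".") for p in re.split(r"(?<!\\)\.", path)]
    for part in parts:
        if not isinstance(obj, dict) or part not in obj:
            return None
        obj = obj[part]
    return obj


def reference_matches(watch, attachment, key, ref_key):
    """True when attachment[key] is set and equals watch[ref_key]."""
    value = get_path(attachment, key)
    return value is not None and value == get_path(watch, ref_key)


KEY = r"metadata.annotations.generator\.chaosengine\.litmus\.io/uid"
REF_KEY = "metadata.uid"

deployment = {"metadata": {"name": "my-deploy", "uid": "1234-abcd"}}
own_engine = {"metadata": {"annotations": {"generator.chaosengine.litmus.io/uid": "1234-abcd"}}}
other_engine = {"metadata": {"annotations": {"generator.chaosengine.litmus.io/uid": "9999-zzzz"}}}
```

Here `reference_matches(deployment, own_engine, KEY, REF_KEY)` is true, while `other_engine` and engines without the annotation are filtered out.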
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: <can-this-be-deployment-name?>
  namespace: <can-this-be-deployment-namespace>
  annotations:
    generator.chaosengine.litmus.io/uid: <deployment-under-test-uid>
spec:
  appinfo:
    appns: <will be derived from deployment>
    applabel: <will be derived from deployment's images e.g. 'app=nginx'>
    appkind: deployment
  annotationCheck: 'true'
  engineState: 'active'
  chaosServiceAccount: <how-to-derive? e.g. pod-delete-sa>
  jobCleanUpPolicy: 'delete'
  experiments:
    - name: pod-delete
      spec:
        components:
          env: # <how-to-set? e.g. should it refer to config CR to set below>
            # set chaos duration (in sec) as desired
            - name: TOTAL_CHAOS_DURATION
              value: '30'
            # set chaos interval (in sec) as desired
            - name: CHAOS_INTERVAL
              value: '10'
            # pod failures without '--force' & default terminationGracePeriodSeconds
            - name: FORCE
              value: 'false'
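One possible way to fill in the placeholders above is sketched below in Python. Every answer to the open questions is an assumption made for illustration: the engine reuses the deployment's name and namespace, `applabel` comes from the first (sorted) entry of `spec.selector.matchLabels`, and the service account and env values are passed in with the defaults from the earlier example. The function name is hypothetical.

```python
DEFAULT_ENV = [
    {"name": "TOTAL_CHAOS_DURATION", "value": "30"},
    {"name": "CHAOS_INTERVAL", "value": "10"},
    {"name": "FORCE", "value": "false"},
]


def engine_for_deployment(deployment, service_account="pod-delete-sa", env=None):
    """Derive a pod-delete ChaosEngine (as a dict) from a Deployment dict."""
    meta = deployment["metadata"]
    match_labels = deployment["spec"]["selector"]["matchLabels"]
    # Assumption: one selector label is enough to identify the app.
    key, value = sorted(match_labels.items())[0]
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {
            "name": meta["name"],
            "namespace": meta["namespace"],
            # Lets the GenericController match this engine to its deployment.
            "annotations": {"generator.chaosengine.litmus.io/uid": meta["uid"]},
        },
        "spec": {
            "appinfo": {
                "appns": meta["namespace"],
                "applabel": f"{key}={value}",
                "appkind": "deployment",
            },
            "annotationCheck": "true",
            "engineState": "active",
            "chaosServiceAccount": service_account,
            "jobCleanUpPolicy": "delete",
            "experiments": [{
                "name": "pod-delete",
                "spec": {"components": {"env": list(env or DEFAULT_ENV)}},
            }],
        },
    }


deployment = {
    "metadata": {"name": "my-deploy", "namespace": "my-ns", "uid": "1234-abcd"},
    "spec": {"selector": {"matchLabels": {"app": "nginx"}}},
}
engine = engine_for_deployment(deployment)
```

Passing `service_account` and `env` as parameters leaves room for them to come from a config CR or a template later, which the placeholders ask about.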
Requirement:
litmuschaos.io/pod-delete: true | false (or removal).
litmuschaos.io/pod-delete-schedule: <time units>, which might create a schedule custom resource (the scheduler controller then creating the engine).
Considerations:
The chaosengine spec today holds (a) app info, (b) experiment info, and (c) run properties. Of these, (c) and, to an extent, (b) are static information as far as a user/SRE is concerned and can be derived from templates the user can pull. The app info (a) should be pulled by the controller under discussion as part of generating the chaosengine attachment.
Schedules are subject to halt/resumption. Let's say that is handled by other controllers (the scheduler, as discussed above). In such cases the current engine/schedule generator should be aware of this and reconcile accordingly.
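The requirement and the halt/resumption consideration can be combined into one reconcile decision, sketched below in Python. This is only an illustration under stated assumptions: the schedule annotation is honored only when `litmuschaos.io/pod-delete` is 'true', the `<time units>` value is treated as an opaque string, and the observed resource is summarized by a hypothetical dict with `kind` and `halted` fields.

```python
POD_DELETE = "litmuschaos.io/pod-delete"
POD_DELETE_SCHEDULE = "litmuschaos.io/pod-delete-schedule"


def desired_resource(annotations):
    """Return 'schedule', 'engine', or None for a deployment's annotations."""
    if str(annotations.get(POD_DELETE, "")).lower() != "true":
        return None  # 'false' and removal are treated the same way
    if annotations.get(POD_DELETE_SCHEDULE):
        return "schedule"
    return "engine"


def reconcile(annotations, existing):
    """Decide the generator's next action.

    existing is None or a dict like {'kind': 'schedule', 'halted': True}
    describing what the generator created earlier (hypothetical shape).
    """
    want = desired_resource(annotations)
    if want is None:
        return "delete" if existing else "noop"
    if existing is None:
        return "create-" + want
    if existing["kind"] != want:
        return "replace-with-" + want
    # Same kind already exists. Halt/resume belongs to the scheduler, so
    # the generator leaves a halted resource untouched instead of resetting it.
    return "noop"


annotated = {POD_DELETE: "true", POD_DELETE_SCHEDULE: "30m"}
halted_schedule = {"kind": "schedule", "halted": True}
```

With these inputs, `reconcile(annotated, None)` asks for a new schedule, while `reconcile(annotated, halted_schedule)` is a no-op, so a halt applied by the scheduler survives the generator's next pass.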