This service lets Keptn trigger chaos tests on your applications using the LitmusChaos framework. Learn more about this integration in our 2-part blog series: part 1, part 2.
Keptn Version | litmus-service Docker Image |
---|---|
0.7.1 | keptnsandbox/litmus-service:0.1.0 |
0.7.2 | keptnsandbox/litmus-service:0.1.0 |
0.7.3 | keptnsandbox/litmus-service:0.1.1 |
0.8.0-0.8.3 | keptnsandbox/litmus-service:0.2.0 |
0.8.4-0.8.5 | keptnsandbox/litmus-service:0.2.1 |
0.19.0 | keptnsandbox/litmus-service:0.2.2 |
The Keptn litmus-service requires the following prerequisites to be set up on the Kubernetes cluster before it can run the chaos tests:
- The Litmus chaos operator
- The ChaosExperiment custom resources (CRs)
- The RBAC resources (serviceaccount, role, rolebinding) associated with the chaos test
Execute the following commands to set up these dependencies for a demo setup:
kubectl apply -f ./test-data/litmus/litmus-operator-v2.13.0.yaml
kubectl apply -f ./test-data/litmus/pod-delete-ChaosExperiment-CR.yaml
kubectl apply -f ./test-data/litmus/pod-delete-rbac.yaml
This service reacts to the following Keptn CloudEvents (see deploy/service.yaml):
- sh.keptn.event.test.triggered (used to be sh.keptn.events.deployment-finished) -> start litmus chaos tests
- sh.keptn.event.test.finished (used to be sh.keptn.events.tests-finished) -> clean up residual chaos resources
Notes:
- This repo provides the example (yaml specifications) of a pod-delete chaos test. You can choose to specify other experiments depending on your need, when building your own litmus service. Ensure that the correct ChaosEngine spec is provided in the experiment manifest along with the corresponding ChaosExperiment CR & RBAC manifests.
- This repo uses the sample helloservice app as the Application-Under-Test (AUT) to illustrate the impact of chaos. Hence, the experiment is populated with the respective attributes for app filtering purposes. Ensure you have the right data placed in the spec.appinfo when adopting this for your environments.
To deploy the current version of the litmus-service in your Keptn Kubernetes cluster, clone the repo and apply the deploy/service.yaml file:
kubectl apply -f deploy/service.yaml
This will install the litmus-service into the keptn namespace, which you can verify using:
kubectl -n keptn get deployment litmus-service -o wide
kubectl -n keptn get pods -l run=litmus-service
To make use of the Litmus service, a dedicated experiment.yaml file with the actual chaos experiment has to be added to Keptn (for the service under test).
You can do this via the Keptn CLI. Replace the values for project, stage, service and resource with your actual values, but note that the resourceUri has to be set to litmus/experiment.yaml.
keptn add-resource --project=litmus --stage=chaos --service=carts --resource=litmus/experiment.yaml --resourceUri=litmus/experiment.yaml
Please note that it is recommended to run the chaos experiment along with some load testing.
Now, when a send-test event is sent to Keptn, the chaos test is triggered along with the load tests. Once the load tests are finished, Keptn performs the evaluation and provides you with a result. This lets you verify whether your application is resilient, in the sense that your SLOs are still met while chaos is injected.
The service implements handlers for triggering the chaos tests in the "testing phase" of Keptn, which means that Keptn triggers the chaos tests right after deployment. The test is executed by a set of chaos pods (notably, the chaos-runner & experiment pods), and the test results are stored in a ChaosResult custom resource. The duration of the test & other tunables can be configured in the ChaosEngine resource. Refer to the Litmus docs for the supported tunables. Litmus ensures that the review app/deployment is restored to its initial state upon completion of the test.
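As an illustration of how a verdict recorded in a ChaosResult could feed Keptn's evaluation, here is a minimal Go sketch. The function name keptnResult and the rule that only a "Pass" verdict counts as success are assumptions made for this example; they are not taken from this repo's eventhandlers.go.

```go
package main

import "fmt"

// keptnResult converts a ChaosResult verdict into a Keptn-style result string.
// Assumption: only an explicit "Pass" verdict counts as success; anything else
// (for example "Fail" or "Awaited") is reported as a failure.
func keptnResult(verdict string) string {
	if verdict == "Pass" {
		return "pass"
	}
	return "fail"
}

func main() {
	// Print the mapping for a few sample verdicts.
	for _, v := range []string{"Pass", "Fail", "Awaited"} {
		fmt.Printf("verdict %q -> result %q\n", v, keptnResult(v))
	}
}
```

Treating every non-"Pass" verdict as a failure is the conservative choice here; a real handler might prefer to wait while the verdict is still pending.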
The Keptn litmus-service also conditionally generates & handles the test.finished event by cleaning up residual chaos resources (running or completed) in the cluster.
It is a standard practice to execute the chaos tests in parallel with other performance/load tests running on the AUT. The subsequent quality gate evaluations in such cases are more reflective of real world outcomes.
Note: The sample project provided in this repo (in the test-data folder) uses a JMeter load test against the AUT, carts, running in parallel with the pod-delete chaos test.
To remove the litmus-service, delete it using the deploy/service.yaml file:
kubectl delete -f deploy/service.yaml
Adapt and use the following command in case you want to upgrade or downgrade your installed version (specified by the $VERSION placeholder):
kubectl -n keptn set image deployment/litmus-service litmus-service=keptnsandbox/litmus-service:$VERSION --record
The service implements simple handlers for the sh.keptn.event.test.triggered & sh.keptn.event.test.finished events: the former triggers chaos by creating the ChaosEngine resource and fetching info from the ChaosResult resource, and the latter eventually deletes them. If you need additional functions/capabilities, update eventhandlers.go. For more info on how to go about this, see the Development section.
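The split between the two handlers described above can be sketched as a simple dispatch on the CloudEvent type. This is only an illustration: the chaosAction type, its values and the actionFor function are hypothetical names, and the real eventhandlers.go may be structured differently.

```go
package main

import "fmt"

// Event types as named in this README.
const (
	testTriggered = "sh.keptn.event.test.triggered"
	testFinished  = "sh.keptn.event.test.finished"
)

// chaosAction is a hypothetical label for what a handler would do.
type chaosAction string

const (
	startChaos   chaosAction = "create ChaosEngine, then read ChaosResult"
	cleanupChaos chaosAction = "delete ChaosEngine and ChaosResult"
	ignoreEvent  chaosAction = "ignore"
)

// actionFor maps an incoming CloudEvent type to the action described above.
// Unknown event types are ignored.
func actionFor(eventType string) chaosAction {
	switch eventType {
	case testTriggered:
		return startChaos
	case testFinished:
		return cleanupChaos
	default:
		return ignoreEvent
	}
}

func main() {
	events := []string{testTriggered, testFinished, "some.other.event"}
	for _, e := range events {
		fmt.Printf("%s -> %s\n", e, actionFor(e))
	}
}
```

Keeping the mapping in one function makes it easy to see where a handler for an additional event type would be added.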
Because the litmus-service runs in the keptn namespace & acts on resources/applications in other namespaces (as per the project/stage names), it uses cluster-wide RBAC. Tune the permissions associated with this service based on the functionality needed apart from CRUD on ChaosEngine & ChaosResult resources.
In case you would like to clean up chaos resources immediately after completion of the chaos test (for example, because you aren't running other tests of primary significance, such as perf tests), set the environment variable SEND_TEST_FINISHED_EVENT to true in the litmus-service deployment.
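A minimal Go sketch of how a service might read such a flag is shown below. The function name sendTestFinished and the use of strconv.ParseBool are assumptions made for illustration; the actual litmus-service may parse the variable differently.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// sendTestFinished reports whether immediate cleanup was requested.
// The lookup function is injected so the logic can be exercised without
// touching the real process environment.
// Assumption: any value accepted by strconv.ParseBool is honoured, and an
// unset or malformed value is treated as false.
func sendTestFinished(getenv func(string) string) bool {
	enabled, err := strconv.ParseBool(getenv("SEND_TEST_FINISHED_EVENT"))
	return err == nil && enabled
}

func main() {
	fmt.Println("send test.finished event:", sendTestFinished(os.Getenv))
}
```

Defaulting to false when the variable is missing keeps the chaos resources in place until other tests have finished, which matches the opt-in wording above.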
Development can be conducted using any Golang-compatible IDE/editor (e.g., JetBrains GoLand, VSCode with Go plugins).
It is recommended to make use of branches as follows:
- master contains the latest, potentially unstable version
- release-* contains a stable version of the service (e.g., release-0.1.0 contains version 0.1.0)
- working branches carry individual changes (e.g., feature/my-cool-stuff or bug/overflow) and are merged back into the master branch
When writing code, it is recommended to follow the coding style suggested by the Golang community.
If you don't care about the details, your first entrypoint is eventhandlers.go. Within this file you can add implementation for pre-defined Keptn Cloud events.
To better understand Keptn CloudEvents, please look at the Keptn Spec.
If you want to get more insights, please look into main.go, deploy/service.yaml, consult the Keptn docs as well as existing Keptn Core and Keptn Contrib services.
- Build the binary: go build -ldflags '-linkmode=external' -v -o litmus-service
- Run tests: go test -race -v ./...
- Build the Docker image: docker build . -t keptnsandbox/litmus-service:dev (Note: Ensure that you use the correct DockerHub account/organization)
- Run the Docker image locally: docker run --rm -it -p 8080:8080 keptnsandbox/litmus-service:dev
- Push the Docker image to DockerHub: docker push keptnsandbox/litmus-service:dev (Note: Ensure that you use the correct DockerHub account/organization, e.g., your personal account like docker push myaccount/litmus-service:dev)
- Deploy the service using kubectl: kubectl apply -f deploy/
- Delete the service using kubectl: kubectl delete -f deploy/
- Check the deployment using kubectl: kubectl -n keptn get deployment litmus-service -o wide
- Follow the logs using kubectl: kubectl -n keptn logs deployment/litmus-service -f
- List the deployed pods using kubectl: kubectl -n keptn get pods -l run=litmus-service
- Deploy the service using Skaffold: skaffold run --default-repo=your-docker-registry --tail (Note: Replace your-docker-registry with your DockerHub username; also make sure to adapt the image name in skaffold.yaml)
We have dummy cloud-events in the form of RFC 2616 requests in the test-events/ directory. These can be easily executed using third-party plugins such as the Huachao Mao REST Client in VS Code.
This repo uses reviewdog for automated reviews of Pull Requests.
You can find the details in .github/workflows/reviewdog.yml.
This repo has automated unit tests for pull requests.
You can find the details in .github/workflows/CI.yml.
This repo uses GH Actions to automatically build docker images.
The following secrets need to be added to your repository secrets:
- REGISTRY_USER - your DockerHub username
- REGISTRY_PASSWORD - a DockerHub access token (alternatively, your DockerHub password)
Furthermore, the variable IMAGE needs to be configured properly in .ci_env:
IMAGE=keptnsandbox/litmus-service
It is assumed that the current development takes place in the master branch (either via Pull Requests or directly).
To make use of the built-in automation using Travis CI for releasing a new version of this service, you should create a branch called release-x.y.z (where x.y.z is your version).
If any problems occur, fix them in the release branch and test them again.
Once you have confirmed that everything works and your version is ready to go, you should
Please find more information in the LICENSE file.