keptn-sandbox / litmus-service

Integration for LitmusChaos
Apache License 2.0

Cleanup of resources causes errors #5

Closed: jetzlstorfer closed this 3 years ago

jetzlstorfer commented 4 years ago

Cleanup of the experiment is throwing errors.

The errors are thrown in the TestFinished event handler, since that is where we delete the chaos experiment resources.

2020-10-01T12:20:45.264698332Z 2020/10/01 12:20:45 Deleting chaos experiment resources
2020-10-01T12:20:45.352877445Z 2020/10/01 12:20:45 Error execute kubectl delete command: Error executing command kubectl delete -f litmus/experiment.yaml: exit status 1
2020-10-01T12:20:45.352914192Z Error from server (NotFound): error when deleting "litmus/experiment.yaml": chaosengines.litmuschaos.io "carts-chaos" not found

We need to investigate why the error is thrown and fix it.

ksatchit commented 4 years ago

Refer: https://github.com/keptn-sandbox/litmus-service/issues/2#issuecomment-702098896

Right now, the testsFinishedEventHandler is invoked twice: first for the chaos test and then for the JMeter test.

The ChaosEngine has already been removed by the time the jmeter-service sends its testFinishedEvent, which leads to the error log described above. This can be seen in the litmus-service pod logs:

Completion of chaos experiment:

2020/10/07 09:43:10 Chaos experiment is completed
2020/10/07 09:43:10 ChaosExperiment Verdict: Pass
2020/10/07 09:43:10 Final Result: pass

Handling of the TestFinishedEvent sent by the litmus-service:

2020/10/07 09:43:10 gotEvent(sh.keptn.events.tests-finished): e204ed5e-e6fc-44e3-8825-622fac89294a - df64d289-c733-489b-be1f-3f970aaf1ecc
2020/10/07 09:43:10 Processing Test Finished Event
2020/10/07 09:43:10 Handling Tests Finished Event: df64d289-c733-489b-be1f-3f970aaf1ecc
2020/10/07 09:43:10 Deleting chaos experiment resources

Handling of the TestFinishedEvent sent by the jmeter-service:

jmeter-service log:
{"timestamp":"2020-10-07T09:54:23.647352508Z","logLevel":"DEBUG","message":"Successfully executed JMeter test. Project: litmus, Service: carts, Stage: chaos, TestStrategy: performance"}
{"timestamp":"2020-10-07T09:54:23.647449482Z","logLevel":"INFO","message":"Tests for performance with status = true.Project: litmus, Service: carts, Stage: chaos, TestStrategy: performance"}
litmus-service log:
2020/10/07 09:54:23 gotEvent(sh.keptn.events.tests-finished): e204ed5e-e6fc-44e3-8825-622fac89294a - 9cf4d689-4c7f-4172-8430-b49c9edddaab
2020/10/07 09:54:23 Processing Test Finished Event
2020/10/07 09:54:23 Handling Tests Finished Event: 9cf4d689-4c7f-4172-8430-b49c9edddaab
2020/10/07 09:54:23 Deleting chaos experiment resources
2020/10/07 09:54:24 Error execute kubectl delete command: Error executing command kubectl delete -f litmus/experiment.yaml: exit status 1
Error from server (NotFound): error when deleting "litmus/experiment.yaml": chaosengines.litmuschaos.io "carts-chaos" not found

ksatchit commented 4 years ago

Current thoughts and direction regarding handling of testFinished events

Having agreed upon the above, the options we have include:

Current Choice

(There was another option discussed which we haven't elaborated on above: ignoring the event if it is generated by the litmus-service, while handling the rest. However, this is more or less equivalent to not generating the testFinishedEvent at all after chaos. We probably don't want to take this direction, so we might go with (a) for the time being.)

ksatchit commented 3 years ago

We initially took the ENV approach: "SEND_TEST_FINISHED_EVENT" is set to "false" so that the litmus-service does not trigger removal of the ChaosEngine, and the service reacts only to the test finished event from JMeter.

However, this flow has since changed with the refactor carried out to support Keptn 0.8.0: we now ignore testFinishedEvents from the litmus-service, retaining the ChaosEngine so it can be removed later, once JMeter or another test service completes.
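The filtering described here can be sketched in Go as a check on the event's origin. This assumes the litmus-service can be recognised by the CloudEvent `source` attribute and that its value is "litmus-service"; the `event` struct and `shouldCleanUp` function are hypothetical simplifications, not the actual Keptn SDK types.

```go
package main

import "fmt"

// event is a minimal, hypothetical view of an incoming CloudEvent.
type event struct {
	Source string
}

// litmusSource is an assumed value of the "source" attribute set by the litmus-service.
const litmusSource = "litmus-service"

// shouldCleanUp reports whether a test finished event should trigger ChaosEngine removal.
// Events emitted by the litmus-service itself are ignored, so the ChaosEngine survives
// until another test service (for example jmeter-service) reports completion.
func shouldCleanUp(e event) bool {
	return e.Source != litmusSource
}

func main() {
	for _, e := range []event{{Source: "litmus-service"}, {Source: "jmeter-service"}} {
		fmt.Printf("event from %s: clean up = %v\n", e.Source, shouldCleanUp(e))
	}
}
```

Filtering on the sender rather than on event order keeps the cleanup correct even if the two events arrive in a different sequence.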

If the testFinishedEvent arrives well before the chaos experiment ends, the results are shown as "Aborted" and we send out a warning.
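A rough Go sketch of that decision, assuming the handler knows whether the experiment has finished and, if so, its verdict. "pass", "warning" and "fail" are Keptn's result values; the `evaluateChaos` function and the use of "Awaited" as an in-progress verdict are assumptions for illustration.

```go
package main

import "fmt"

// result is a Keptn-style result value.
type result string

const (
	resultPass    result = "pass"
	resultWarning result = "warning"
	resultFail    result = "fail"
)

// evaluateChaos decides the result reported when a test finished event arrives.
// Assumption: if the experiment has not finished yet, its outcome is shown as "Aborted"
// and reported as a warning instead of a pass/fail verdict.
func evaluateChaos(experimentFinished bool, verdict string) (result, string) {
	if !experimentFinished {
		return resultWarning, "Aborted"
	}
	if verdict == "Pass" {
		return resultPass, verdict
	}
	return resultFail, verdict
}

func main() {
	r, v := evaluateChaos(false, "Awaited")
	fmt.Println(r, v) // warning Aborted
	r, v = evaluateChaos(true, "Pass")
	fmt.Println(r, v) // pass Pass
}
```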