linki / chaoskube

chaoskube periodically kills random pods in your Kubernetes cluster.
MIT License
1.8k stars 120 forks

Does chaoskube really kill the pods? #103

Open ljanatka opened 6 years ago

ljanatka commented 6 years ago

Hi Martin,

I am currently working on a project where we are trying to improve the reliability of our software by using chaos engineering (though, unfortunately, we have very little experience with it). Currently, our software runs on Azure/Kubernetes.

We found chaoskube as a promising tool to help us, but its behavior differs from what we expected. The description of chaoskube says that it kills pods, so I formed a hypothesis about what would happen if one of our pods were killed while handling a request (there should be an error response, and subsequent requests should be served by the other pod). When I ran the experiment, the pods were killed, but no error occurred.

Then one of my colleagues looked into the source code of chaoskube and found that the pod is not killed (i.e. force-killed instantly) but rather terminated (if I understood correctly, with this approach the pod finishes dealing with its current task and then "dies" peacefully).

Is this really how chaoskube works?

We are learning more about chaos every day, but there is a lot of knowledge that we need to gain.

Since my hypothesis was probably wrong, I would be really grateful for any advice about which other chaos experiments chaoskube is suitable for.

Thank You,

Ladislav

palmerabollo commented 6 years ago

This is a very good question. I also assumed that chaoskube was killing the pods. I think killing a pod instead of terminating it is the best option, because "graceful shutdowns" rarely happen in production environments :)

Would it be possible to at least include a flag to choose the behaviour you want (kill vs terminate)? I'm thinking about adding a configurable gracePeriod in the call to delete the pod. Sounds good?

linki commented 6 years ago

@ljanatka @palmerabollo I agree. There's already a pull request for it by @jakewins: https://github.com/linki/chaoskube/pull/104. It would help me a lot if you would also have a look and leave some feedback.

ljanatka commented 5 years ago

@linki the #104 pull request seems to be failing its CI build ...

linki commented 5 years ago

@ljanatka I just fixed it in case you want to give it a try again.

ljanatka commented 5 years ago

@linki Hi, we finally got to give it a try. As far as I can tell, it works quite well. The pod gets killed from the inside, and the cluster detects this and restarts it (the restart counter of the pod increases; no new pod instance is created).

linki commented 5 years ago

@ljanatka Thanks for checking it out!

ljanatka commented 5 years ago

@linki Hi, was my test enough to merge this "hard kill" feature into a new version of chaoskube? When do you expect the new version to be released? Thanks!

linki commented 5 years ago

@ljanatka I'm not sure. I want to refactor it a bit before merging and I have a work-in-progress branch for it.

@jakewins has a fork of chaoskube where this is merged. You could try using it in the meantime.

ljanatka commented 5 years ago

Hi @linki

from the release notes it seems that chaoskube can now "hard kill" pods. However, I did not find any switch that would activate this feature. Or is the hard kill now the default kill method?

Thanks!

linki commented 5 years ago

Hi @ljanatka,

https://github.com/linki/chaoskube/releases/tag/v0.12.1 extracted the current strategy into a separate object behind an interface in order to make it easier to add more ways to terminate pods.

The actual "termination-by-kill" termination strategy from the original PR hasn't been ported over yet.

dbsanfte commented 3 years ago

It's been quite a while now since this feature was requested, and I see some refactoring was done. Is there any chance this could be looked at again soon?