Open ljanatka opened 6 years ago
This is a very good question. I also assumed that chaoskube was killing the pods. I think killing a pod instead of terminating it is the best option, because "graceful shutdowns" rarely happen in production environments :)
Would it be possible to at least include a flag to choose the behaviour you want (kill vs terminate)? I'm thinking about adding a configurable gracePeriod in the call to delete the pod. Sounds good?
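To make the kill vs terminate distinction concrete, here is a minimal sketch of what a grace period means at the process level. It is plain Python on a POSIX system, not chaoskube code (chaoskube is written in Go) and not the Kubernetes API; `WORKER`, `start_worker` and `stop` are hypothetical names made up for this illustration. The idea mirrors what Kubernetes does to a container: send SIGTERM, wait up to the grace period, then send SIGKILL. A grace period of zero skips straight to SIGKILL.

```python
import signal
import subprocess
import sys

# Hypothetical worker: on SIGTERM it stops cleanly and exits with status 0.
WORKER = """
import signal, sys, time
signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def start_worker():
    """Start the worker and wait until its SIGTERM handler is installed."""
    proc = subprocess.Popen(
        [sys.executable, "-c", WORKER], stdout=subprocess.PIPE, text=True
    )
    proc.stdout.readline()  # blocks until the worker prints "ready"
    return proc


def stop(proc, grace_period_seconds):
    """Stop a process: SIGTERM first, SIGKILL once the grace period is over.

    With grace_period_seconds == 0 the process gets SIGKILL immediately
    and has no chance to finish what it was doing.
    """
    if grace_period_seconds > 0:
        proc.terminate()  # SIGTERM: the process may clean up and exit
        try:
            return proc.wait(timeout=grace_period_seconds)
        except subprocess.TimeoutExpired:
            pass  # grace period expired, fall through to the hard kill
    proc.kill()  # SIGKILL: cannot be caught or ignored
    return proc.wait()


if __name__ == "__main__":
    print("graceful exit code:", stop(start_worker(), grace_period_seconds=5))
    print("forced exit code:", stop(start_worker(), grace_period_seconds=0))
```

With a positive grace period the worker gets to run its handler and exit on its own terms, which is why a request that is in flight can still complete. With a grace period of zero it is cut off mid-work, which is the failure mode a chaos experiment usually wants to provoke.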
@ljanatka @palmerabollo I agree. There's already a pull request for it by @jakewins: https://github.com/linki/chaoskube/pull/104. It would help me a lot if you could also have a look and leave some feedback.
@linki the #104 pull request seems to be marked as failing in the CI build.
@ljanatka I just fixed it in case you want to give it a try again.
@linki Hi, we finally got to give it a try. As far as I can tell, it works quite well. The pod gets killed from the inside, the cluster detects this and restarts it (the restart counter of the given pod increases and no new pod instance is created).
@ljanatka Thanks for checking it out!
@linki Hi, was my test enough to merge this "hardkill" feature into a new version of chaoskube? When do you expect the new version to be released? Thanks!
@ljanatka I'm not sure. I want to refactor it a bit before merging and I have a work-in-progress branch for it.
@jakewins has a fork of chaoskube where this is merged. You could try using it in the meantime.
Hi @linki
from the release notes it seems that chaoskube can now "hardkill" the pods. However, I did not find any switch that would activate this feature. Or is the hardkill now implemented as the default kill method?
Thanks!
Hi @ljanatka,
https://github.com/linki/chaoskube/releases/tag/v0.12.1 extracted the current strategy into a separate object behind an interface in order to make it easier to add more ways to terminate pods.
The actual "termination-by-kill" strategy from the original PR hasn't been ported over yet.
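As a rough illustration of what "a strategy behind an interface" buys here, the following Python sketch shows how a second termination strategy could be plugged in without touching the selection loop. It is not chaoskube's actual code (which is Go), and every name in it (`Pod`, `Terminator`, `RecordingClient`, `DeletePodTerminator`, `KillProcessTerminator`, `run_chaos`) is hypothetical. The client is a stand-in that only records calls, and the default of 30 seconds is an assumption for the example.

```python
from dataclasses import dataclass, field
from typing import Iterable, Protocol


@dataclass
class Pod:
    namespace: str
    name: str


class Terminator(Protocol):
    """The interface: anything that knows how to take down a pod."""

    def terminate(self, pod: Pod) -> None: ...


@dataclass
class RecordingClient:
    """Stand-in for a cluster client; it only records what it was asked to do."""

    calls: list = field(default_factory=list)

    def delete_pod(self, pod: Pod, grace_period_seconds: int) -> None:
        self.calls.append(("delete", pod.name, grace_period_seconds))

    def kill_main_process(self, pod: Pod) -> None:
        self.calls.append(("kill", pod.name))


@dataclass
class DeletePodTerminator:
    """Strategy 1: delete the pod through the API with a grace period."""

    client: RecordingClient
    grace_period_seconds: int = 30  # assumed default for this sketch

    def terminate(self, pod: Pod) -> None:
        self.client.delete_pod(pod, self.grace_period_seconds)


@dataclass
class KillProcessTerminator:
    """Strategy 2: kill the workload from inside the pod ("hardkill")."""

    client: RecordingClient

    def terminate(self, pod: Pod) -> None:
        self.client.kill_main_process(pod)


def run_chaos(terminator: Terminator, victims: Iterable[Pod]) -> None:
    """The caller depends only on the interface, not on a concrete strategy."""
    for pod in victims:
        terminator.terminate(pod)


if __name__ == "__main__":
    client = RecordingClient()
    pods = [Pod("default", "web-1"), Pod("default", "web-2")]
    run_chaos(DeletePodTerminator(client, grace_period_seconds=0), pods)
    run_chaos(KillProcessTerminator(client), pods[:1])
    print(client.calls)
```

In a layout like this, porting the "termination-by-kill" strategy would mean adding one more class that satisfies the interface, plus a flag to choose between the strategies.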
It's been quite a while now since this feature was requested and I see some refactoring was done. Is there any chance this could be looked at again soon?
Hi Martin,
I am currently working on a project where we are trying to improve the reliability of our software using chaos engineering (but, unfortunately, we have very little experience with it). Currently, our software runs on Azure/Kubernetes.
We found chaoskube to be a promising tool to help us, but we found out that its behavior is different than expected. The description of chaoskube says that it kills the pods, so I created a hypothesis about what will happen when one of our pods is in the middle of handling a request when it is killed (there should be an error response, and subsequent requests should be processed by the other pod). When I started the experiment, the pods were killed but no error occurred.
Then one of my colleagues looked at the source code of chaoskube and found out that the pod is not killed (i.e. force killed instantly), but rather terminated (if I understood correctly, with this approach the pod finishes its current task and then "dies" peacefully).
Is this really how chaoskube works?
We are learning more about chaos every day, but there is a lot of knowledge that we need to gain.
Since my hypothesis was probably wrong, I would be really grateful for any advice about which other chaos experiments chaoskube is suitable for.
Thank you,
Ladislav