linki / chaoskube

chaoskube periodically kills random pods in your Kubernetes cluster.
MIT License

[bug or feature?] pod being killed continuously #588

Open Forget-C opened 10 months ago

Forget-C commented 10 months ago

The default namespace has two pods. I ran: go run main.go --interval 10s --namespaces 'default' --no-dry-run


The test pod was killed repeatedly, which I think is unfriendly in some scenarios. Is this expected behavior?

Questions:

  1. I know about --minimum-age, which checks pod.ObjectMeta.CreationTimestamp, but the pod may still be killed again while it is starting.
  2. --minimum-age does not solve the problem of the same pod being killed continuously.
  3. Could we add an option to prevent a pod from being killed continuously?
Forget-C commented 10 months ago

If you consider this a bug or a feature worth adding, please assign it to me.

linki commented 10 months ago

The killed Pod's name is different, so from the perspective of chaoskube I don't think it's a bug.

The second Pod shouldn't be killed before it's in the Running state (otherwise it would be a bug).

If your "test" application consists of a single Pod, then the --minimum-age setting should help to avoid killing the two Pods right after another.

Forget-C commented 10 months ago

> The killed Pod's name is different, so from the perspective of chaoskube I don't think it's a bug.
>
> The second Pod shouldn't be killed before it's in the Running state (otherwise it would be a bug).
>
> If your "test" application consists of a single Pod, then the --minimum-age setting should help to avoid killing the two Pods right after another.

--minimum-age checks pod.ObjectMeta.CreationTimestamp, so pods that have been created but are still starting may still be killed. I don't think killing a starting pod is the desired result.

I think it would be reasonable to use an implementation like the following, or to add a user-facing parameter that determines whether starting pods may be killed:

if age >= minimumAge && pod.Status.Phase == v1.PodRunning {
    // the pod is a candidate for termination
}

linki commented 10 months ago

Looking at the code, Pods that are not in the Running state are filtered out early on (before the minimum age is even checked).

What can happen is that if you set the minimum age to 5 minutes and the Pod itself stays in the "Pending" or "Initializing" state for 5 minutes, it can get killed right after it switches to "Running" (because CreationTimestamp is the time the Pod object was initially created).

Forget-C commented 10 months ago

I believe we understand each other. It was my oversight that I didn't notice the check for the Running state. As you said, "it can get killed right after it switches to "Running"". Should we avoid this situation?

linki commented 10 months ago

We should think about it.

If a Pod only gets into the Running state after 5 minutes of initialization and the --minimum-age is set to 2 minutes (for example) then the earliest moment it can be killed should be 7 minutes after the initial creation.

But it might be difficult to implement. Looking at the CreationTimestamp was easy. Using the time a Pod switched to the Running state as the starting point for "minimum age" probably requires looking at the Kubernetes events, since there's no such field on the Pod object itself.

However, the current implementation works for most cases in real-world clusters that run many Pods. We don't do it currently, but termination during the initialization phase could also be preferable for some users, to uncover additional edge cases.