deimosfr closed this issue 6 years ago.
Are your pods showing as 0/1 in kubectl get pods (or something like 2/3 if you use multi-container pods)?
Usually this is caused by a readiness check on a pod that is slow to boot (see the probe sketch below). We're having this issue as well; our monolith can take around 5 minutes to boot. It's on my radar to look for a solution, but it could be around the second half of the year, as we're busy with upcoming product launches.
If your container is showing as 1/1 and you still get ContainersNotReady then it's probably a bug. Please comment with details if this is the case.
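For a slow-booting app, the usual first step is to give the readiness probe more headroom. A minimal sketch, where the name, image, path, port, and timing values are placeholders to adapt to your own workload:

apiVersion: v1
kind: Pod
metadata:
  name: slow-app               # placeholder name
spec:
  containers:
  - name: slow-app
    image: example/slow-app    # placeholder image
    ports:
    - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /healthz         # assumed health endpoint
        port: 8080
      initialDelaySeconds: 120 # skip probing for the first 2 minutes
      periodSeconds: 10        # then probe every 10 seconds
      timeoutSeconds: 5        # each attempt may take up to 5 seconds
      failureThreshold: 3      # mark the container NotReady after 3 consecutive failures

The probe keeps running after a failure, so the container goes back to Ready as soon as it starts passing again.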
I think I know the problem. Should be a quick fix. I'll try testing it.
Hi,
I confirm I get 1/1, 2/2, or 3/3. It happens while the pod is booting. The readiness check always works; I tried to delay it as much as possible, but I still get the message early in the boot. It's as if ContainersNotReady is caught before the readiness check has even run.
I added ContainersNotReady to the blacklist (plus a LongNotReady bugfix). Could you please test whether the patch works? The image is tagged latest on Docker Hub.
Good for me! Thanks.
Released as v3.2.3
Sorry, but I still get the issue.
https://github.com/wongnai/kube-slack/blob/master/src/monitors/waitingpods.js#L11 ContainersNotReady is blacklisted there. Could you please give more details?
I'm going to test with v3.2.3; I tested with latest when you asked for a test. I'll keep you updated.
Hi,
I confirm I still get the issue:
$ kubectl describe pod/kube-slack-78b78b89cf-jh9dx
Name:           kube-slack-78b78b89cf-jh9dx
Namespace:      kube-system
Node:           node6/1.1.1.1
Start Time:     Sun, 11 Mar 2018 18:57:01 +0100
Labels:         app=kube-slack
                pod-template-hash=3463464579
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"kube-slack-78b78b89cf","uid":"5c5727a2-2555-11e8-9b5d-0007cb...
                scheduler.alpha.kubernetes.io/critical-pod=
Status:         Running
IP:             10.233.68.25
Controlled By:  ReplicaSet/kube-slack-78b78b89cf
Containers:
  kube-slack:
    Container ID:   docker://ab1f8a71bc1e303327e4c8edfd38a0bde66fbd7ce18e7e175c70d36547706ce1
    Image:          willwill/kube-slack:v3.2.3
    Image ID:       docker-pullable://willwill/kube-slack@sha256:dbfc705ba68b7079ada1e913250d76c50754ef3c211339e900b3a2dacb2c2a0b
    Port:           <none>
    State:          Running
      Started:      Sun, 11 Mar 2018 18:57:11 +0100
    Ready:          True
    Restart Count:  0
    Environment:
      SLACK_URL:  https://hooks.slack.com/services/xxx/yyy
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kg9rw (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
  default-token-kg9rw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-kg9rw
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/node=true
Tolerations:     <none>
Events:          <none>
My message when deleting a pod from a DaemonSet, for example:
containers with unready status: [traefik]
kube-system/traefik-2qzq7: ContainersNotReady
containers with unready status: [traefik]
If you want to know more about what the daemonset looks like: https://github.com/MySocialApp/kubernetes-helm-chart-traefik/blob/master/kubernetes/templates/daemonset.yaml
Thanks
So it is indeed triggered by LongNotReady: https://github.com/wongnai/kube-slack/blob/abfb14a39a677fd0c1195d806df511eb9048e470/src/monitors/longnotready.js#L74 and not where I added the status ignore. I'll revert afbd9699d7e9e94cb495cc0f6cde5cb544f1c9d4.
I'm taking vacation next week, so I might be able to work on this around end of the month. Sorry for the wait.
Could you please try the following? Pass a higher value in the NOT_READY_MIN_TIME
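For reference, setting it looks roughly like this in the kube-slack Deployment spec. This sketch is based on the pod description earlier in this thread; the apiVersion, labels, and replica count are assumptions, and only the env entries matter here:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-slack
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-slack
  template:
    metadata:
      labels:
        app: kube-slack
    spec:
      containers:
      - name: kube-slack
        image: willwill/kube-slack:v3.2.3
        env:
        - name: SLACK_URL
          value: https://hooks.slack.com/services/xxx/yyy
        - name: NOT_READY_MIN_TIME
          value: "300000"      # milliseconds; only alert after 5 minutes of unreadiness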
environment variable to kube-slack. The default is 60000, which is 60 seconds. In our production system we use 300000, as we have a Java application that is very slow to boot.
Hi, that doesn't change anything. Even very small software that starts in a few seconds has the same issue.
I just tested this in our cluster and I still can't reproduce:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - image: kitematic/hello-world-nginx
    name: hello-world-nginx
    ports:
    - containerPort: 80
    readinessProbe:
      httpGet:
        path: /
        port: 80
Hi,
Sorry for the late answer; I'm currently testing with the latest version and your NOT_READY_MIN_TIME suggestion.
Thanks
Looks good! Thanks.
Hi,
I often get this kind of message: ContainersNotReady, even if I add a really long delay on initialDelaySeconds. What is the recommended way to avoid this? In addition, I do not see any strange behavior while the pods are booting.
Thanks