Notes:
A liveness probe indicates that the container is running; a readiness probe indicates that the container is ready to service requests. In other words, the liveness probe reflects the state of the container, while the readiness probe reflects the state of the service running in that container. A container whose liveness probe passes but whose readiness probe fails is up and running, but its service is not yet ready to handle requests.
By default (with no liveness probe defined), the container is considered alive as long as PID 1 in the container keeps running. This is fine for cases where only one process runs in the container.
By default, Kubernetes will assume the container is ready to receive traffic as long as the liveness check passes.
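For reference, a minimal sketch of how the two probe types are declared on a container spec (the field names are the standard Kubernetes ones; the port and command below are only illustrative placeholders, not what the operator actually sets):
livenessProbe:                 # container is restarted once this check keeps failing
  tcpSocket:
    port: 445                  # illustrative value
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:                # traffic is withheld from the pod while this check fails
  exec:
    command: ["/bin/sh", "-c", "exit 0"]   # placeholder; a real service-specific check goes here
  periodSeconds: 10
  failureThreshold: 3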
# Look up existing probes
$ kubectl edit sts smbshare3
..
..
# No probes defined for ctdb container
- args:
  - run
  - ctdbd
  - --setup=smb_ctdb
  - --setup=ctdb_config
  - --setup=ctdb_etc
  - --setup=ctdb_nodes
  env:
  - name: SAMBA_CONTAINER_ID
    value: smbshare3
  - name: SAMBACC_CONFIG
    value: /etc/container-config/config.json
  - name: HOSTNAME
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: metadata.name
  - name: SAMBACC_CTDB
    value: ctdb-is-experimental
  image: quay.io/samba.org/samba-server:latest
  imagePullPolicy: Always
  name: ctdb
  resources: {}
..
# Both liveness and readiness probes defined for the smbd container
- args:
  - run
  - smbd
  - --setup=users
  - --setup=smb_ctdb
  env:
  - name: SAMBA_CONTAINER_ID
    value: smbshare3
  - name: SAMBACC_CONFIG
    value: /etc/container-config/config.json
  image: quay.io/samba.org/samba-server:latest
  imagePullPolicy: Always
  livenessProbe:
    failureThreshold: 3
    periodSeconds: 10
    successThreshold: 1
    tcpSocket:
      port: 445
    timeoutSeconds: 1
  name: samba
  ports:
  - containerPort: 445
    name: smb
    protocol: TCP
  readinessProbe:
    failureThreshold: 3
    periodSeconds: 10
    successThreshold: 1
    tcpSocket:
      port: 445
    timeoutSeconds: 1
# Log in to the ctdb container in one of the clustered pods
[sprabhu@fedora bin]$ kubectl exec -it smbshare3-0 -c ctdb -- /bin/bash
# Process list inside the ctdb container
[root@smbshare3-0 /]# ps -fax
PID TTY STAT TIME COMMAND
533 pts/0 Ss 0:00 /bin/bash
635 pts/0 R+ 0:00 \_ ps -fax
89 ? Ss 0:00 /usr/sbin/smbd --foreground --log-stdout --no-process-group
105 ? S 0:00 \_ /usr/sbin/smbd --foreground --log-stdout --no-process-group
106 ? S 0:00 \_ /usr/sbin/smbd --foreground --log-stdout --no-process-group
83 ? Ss 0:00 /usr/bin/python3 /usr/local/bin/samba-container ctdb-manage-nodes --hostname=smbshare3-0 --take-node-number-from-hostname=after-last-dash
39 ? SLs 0:03 /usr/sbin/ctdbd --interactive
45 ? S 0:00 \_ /usr/libexec/ctdb/ctdb-eventd -P 39 -S 9
81 ? S 0:00 \_ /usr/sbin/ctdbd --interactive
94 ? S 0:00 \_ /usr/libexec/ctdb/ctdb_mutex_fcntl_helper /var/lib/ctdb/shared/RECOVERY
1 ? Ss 0:00 /pause
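Note that PID 1 in this container is /pause rather than ctdbd, which suggests the pod shares a single process namespace across its containers; if that is the case, the default "is PID 1 still alive" behaviour tells us nothing about ctdbd itself. Assuming the field is set on the StatefulSet template, this can be confirmed with:
$ kubectl get sts smbshare3 -o jsonpath='{.spec.template.spec.shareProcessNamespace}'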
Set the readinessProbe for the ctdb container in the following manner:
readinessProbe:
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
  exec:
    command:
    - /bin/sh
    - -c
    - "ctdb nodestatus | grep 'OK (THIS NODE)'"
Exec into the ctdb container of a pod and administratively disable the ctdb node:
[sprabhu@fedora bin]$ kubectl exec -it smbshare3-0 -c ctdb -- /bin/bash
[root@smbshare3-0 /]# ctdb nodestatus
pnn:0 10.244.1.37 OK (THIS NODE)
[root@smbshare3-0 /]# ctdb disable
[root@smbshare3-0 /]# ctdb nodestatus
pnn:0 10.244.1.37 DISABLED (THIS NODE)
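(To undo this after testing, the node can be re-enabled, after which nodestatus reports OK again.)
[root@smbshare3-0 /]# ctdb enable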
We see the following effect in the cluster:
[sprabhu@fedora tests]$ kubectl get pods -w
NAME READY STATUS RESTARTS AGE
samba-ad-server-86b7dd9856-m46sh 1/1 Running 0 43h
smbshare3-0 3/3 Running 0 28m
smbshare3-1 3/3 Running 0 31m
smbshare3-0 2/3 Running 0 28m
smbshare3-0 goes from Ready 3/3 to 2/3
[sprabhu@fedora tests]$ kubectl describe pod smbshare3-0
..
Warning Unhealthy 43s (x120 over 45m) kubelet Readiness probe failed:
At this point, the Kubernetes Service for the share should have stopped sending requests to the pod. However, the pod is not restarted automatically; that would require a liveness probe to be set up as well.
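A corresponding livenessProbe on the ctdb container would look structurally the same as the readinessProbe above; the check below is only a sketch (whether a restart is the right reaction, and which ctdb command is the right health signal, is exactly what still needs research):
livenessProbe:
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
  exec:
    command:
    - /bin/sh
    - -c
    - "ctdb ping"   # assumption: a responsive local ctdbd is a reasonable liveness signal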
From the ctdb man page, the node status can be any of the following:
OK
This node is healthy and fully functional. It hosts public addresses to provide services.
DISCONNECTED
This node is not reachable by other nodes via the private network. It is not currently participating in the cluster. It does not host public
addresses to provide services. It might be shut down.
DISABLED
This node has been administratively disabled. This node is partially functional and participates in the cluster. However, it does not host
public addresses to provide services.
UNHEALTHY
A service provided by this node has failed a health check and should be investigated. This node is partially functional and participates in
the cluster. However, it does not host public addresses to provide services. Unhealthy nodes should be investigated and may require an
administrative action to rectify.
BANNED
CTDB is not behaving as designed on this node. For example, it may have failed too many recovery attempts. Such nodes are banned from
participating in the cluster for a configurable time period before they attempt to rejoin the cluster. A banned node does not host public
addresses to provide services. All banned nodes should be investigated and may require an administrative action to rectify.
STOPPED
This node has been administratively excluded from the cluster. A stopped node does not participate in the cluster and does not host public
addresses to provide services. This state can be used while performing maintenance on a node.
PARTIALLYONLINE
A node that is partially online participates in a cluster like a healthy (OK) node. Some interfaces to serve public addresses are down, but at
least one interface is up. See also ctdb ifaces.
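Of these states, only OK matches the readiness command shown earlier; in every other state the grep finds no match and exits non-zero, so the probe fails. This can be checked by hand inside the ctdb container (an exit status of 0 means the probe would pass):
[root@smbshare3-0 /]# ctdb nodestatus | grep "OK (THIS NODE)"; echo $?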
Sachin wants to do more research on this (and will get in touch with John about it).