Closed kkimdev closed 5 years ago
@juchiast fyi
Hi @kkimdev, thank you for your suggestion. But we've already had readiness probe for tidb-server https://github.com/pingcap/tidb-operator/blob/87e8b7b227bbc3c8c83d9d849307e8ff85cfbfd1/pkg/manager/member/tidb_member_manager.go#L319-L327
@tennix hmm.. then what's the correct way to wait initialization from other pods? We tried the following init container but it didn't guarantee full initialization.
initContainers:
- name: test
image: alpine:latest
command:
- sh
- -c
- until nslookup demo-tidb; do sleep 1; done
Instead we had to check the port directly
initContainers:
- name: test
image: alpine:latest
command:
- sh
- -c
- until nc -z demo-tidb 4000; do sleep 1; done
@juchiast Please chime in if there is anything to add.
until nc -z demo-tidb 4000; do sleep 1; done
This doesn't guarantee full initialization either, because there's a separated pod to set password for TiDB. So ideally we should wait for that pod to complete. https://github.com/pingcap/tidb-operator/blob/master/charts/tidb-cluster/templates/tidb-initializer-job.yaml#L26
@kkimdev nslookup
just resolves the DNS, it doesn't check the backend readiness. For example, if you only create a service without any selected backend, the nslookup
will still succeed once the service is created. The nc
port checking works because it will try to establish a connection to the endpoints, if a pod is not ready, then it will not be added to the service endpoint.
@juchiast Yes, if you enable the initializer job, then you have to wait until the initializer job finished. The job can be enabled by specifying tidb.passwordSecretName
in values.yaml: https://github.com/pingcap/tidb-operator/blob/a3da61dfc1a0b23102f5b84ec26be6bc3a973d20/charts/tidb-cluster/values.yaml#L151
@tennix I think that's what readinessProbe is for. If pod is not fully ready, it shouldn't be published as service using readniessProbe.
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
readinessProbe: Indicates whether the Container is ready to service requests. If the readiness probe fails, the endpoints controller removes the Pod’s IP address from the endpoints of all Services that match the Pod. The default state of readiness before the initial delay is Failure. If a Container does not provide a readiness probe, the default state is Success.
Maybe the current readinessProbe is not implemented correctly?
P.S.: To my understanding, publishing not-initialized pod to service is incorrect. For example. Let's say there is a service and 2 running pods. Now user adds one more pod to the service, then now service routes incoming requests to 3 pods. 2 of them are ready but 1 is not. And the default service routing is round robin/random depending on proxy-mode https://kubernetes.io/docs/concepts/services-networking/service/
- Proxy-mode: userspace: ...Port and redirects that traffic to the proxy port which proxies the backend Pod. By default, the choice of backend is round robin.
- Proxy-mode: iptables: ...For each Endpoints object, it installs iptables rules which select a backend Pod. By default, the choice of backend is random.
So checking the port open wouldn't guarantee that the next connection attempt will succeed
@tennix I'm not sure if service name is available if there is no ready-pod behind. Kubernetes Init Container doc also used nslookup https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#init-containers-in-use
The readiness probe indicates whether the process is healthy or not other than the pod is dead or not. The server maybe alive but it may get busy handling request, so the server can response unhealthy to readiness probe. And then the controller removes it from the service backend, this doesn't terminate the existing connections, but disable establishing new connections. This is what readiness probe is for.
The domain name of the service doesn't care about the backend endpoint, it can be resolved once the service is created. Besides we never publish the unready pod to service, this is what the current tidb-operator does. NOTE: Creating service doesn't mean exposing the backend.
So checking the port open wouldn't guarantee that the next connection attempt will succeed
Without setting readiness probe, this is true. But the tidb-server pod has readiness probe as I've posted above, if the port is active, it means the backend pod is added to the service endpoint, and this also means the readiness probe is passed, so it's guaranteed to success for next connection unless during the probe time window the server goes dead.
@tennix I'm not sure if service name is available if there is no ready-pod behind. Kubernetes Init Container doc also used nslookup https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#init-containers-in-use
I suggest you take an experiment, creating a service but not creating the backend pod. Then do a nslookup
in one of your cluster pod.
@tennix OK will do thanks. I assumed that service is not available without ready-pod because the official Kubernetes doc used nslookup in their initContainer example.
@kkimdev Sorry, I was wrong. The domain name of a service without any endpoint will not get resolved. But the usage of the readiness probe in tidb-sever still works as what I said above.
@tennix Actually, I think I was wrong. nslookup
succeeded with the following service without any deployment.
apiVersion: v1
kind: Service
metadata:
name: my-nginx
labels:
run: my-nginx
spec:
ports:
- port: 80
protocol: TCP
selector:
run: my-nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-nginx-init-test
spec:
selector:
matchLabels:
run: my-nginx-init-test
replicas: 1
template:
metadata:
labels:
run: my-nginx-init-test
spec:
containers:
- name: my-nginx-init-test
image: nginx
initContainers:
- name: my-nginx-init
image: alpine:latest
command:
- sh
- -c
- until nslookup my-nginx; do sleep 1; done;
It seems like tidb-operator readiness check queries http://service-name:10080/status and I guess it makes sense to do the same for our initContainer. But please let us know otherwise. And maybe also good to document the recommended initContainer config?
Anyways, Thanks a lot! :)
Currently tidb server could be not fully initialized even when the pod is accessible from other pods.
We can use readinessprobe to tell Kubernetes when it's fully ready. https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#define-readiness-probes