Mistobaan opened this issue 6 years ago
I just noticed in the startup logs that /health is enabled.
What can I do to help?
Just chiming in - I run the zipkin collector in a Kubernetes cluster and I'm using the /health
endpoint for liveness and readiness probes without issue. It would be nice if the endpoint were documented.
Exactly, the endpoint is there but isn't documented. I changed the title of the issue.
So the health endpoint is mentioned here, but yeah, not well documented. We also use it for the HEALTHCHECK directive in Docker. I'll move this issue to the main repo, noting that there is an emerging https://github.com/openzipkin/zipkin-helm which should own the canonical info on k8s setup.
https://github.com/openzipkin/zipkin/tree/master/zipkin-server#endpoints
So, in summary, we should probably coalesce on a practice before documenting one, but my gut feel is that adapting the one in our Dockerfile is not a bad start. We probably also need some advice to clarify the lack of startup and liveness probes in the ecosystem, e.g. whether that's a feature or a bug. cc @mfordjody @optional303
To begin this: /health is a composite status of the health of zipkin's dependencies. For example, if zipkin is configured for stackdriver or kafka and either connection doesn't work, /health will return a non-200 code.
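For illustration, here's a minimal sketch of what a probe effectively does against that endpoint. It assumes zipkin's default port 9411 and treats any non-200 (or connection failure) as unhealthy; the function name is hypothetical, not part of zipkin.

```python
# Sketch: what a liveness/readiness probe effectively does against /health.
# Assumes zipkin-server's default port 9411. A broken dependency (e.g. kafka
# or stackdriver) makes /health return non-200, which we treat as unhealthy.
from urllib.request import urlopen
from urllib.error import HTTPError, URLError


def is_healthy(base_url: str = "http://localhost:9411") -> bool:
    try:
        with urlopen(f"{base_url}/health", timeout=5) as resp:
            # 200 means the composite status of all dependencies is OK.
            return resp.status == 200
    except (HTTPError, URLError, OSError):
        # Non-200 responses and connection errors both count as unhealthy.
        return False
```
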
Here is the text on our HEALTHCHECK in docker which, acknowledged, is disabled under k8s, but carries the same basic info:
# We use start period of 30s to avoid marking the container unhealthy on slow or contended CI hosts.
#
# If in production, you have a 30s startup, please report to https://gitter.im/openzipkin/zipkin
# including the values of the /health and /info endpoints as this would be unexpected.
HEALTHCHECK --interval=5s --start-period=30s --timeout=5s CMD ["docker-healthcheck"]
https://github.com/openzipkin/zipkin/blob/master/docker/Dockerfile#L66-L70
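For comparison, that HEALTHCHECK could translate roughly into Kubernetes probes like this. This is a sketch only, not an agreed convention; the port is zipkin's default and the thresholds are just my mapping of the docker flags:

```yaml
# Sketch: rough k8s equivalent of the Dockerfile HEALTHCHECK, not a settled practice.
# --start-period=30s maps loosely onto a startupProbe granting 30s before failures count.
startupProbe:
  httpGet:
    path: /health
    port: 9411
  periodSeconds: 5       # ~ --interval=5s
  timeoutSeconds: 5      # ~ --timeout=5s
  failureThreshold: 6    # 6 * 5s = 30s grace, ~ --start-period=30s
readinessProbe:
  httpGet:
    path: /health
    port: 9411
  periodSeconds: 5
  timeoutSeconds: 5
```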
A common pattern for liveness probes is to use the same low-cost HTTP endpoint as for readiness probes, but with a higher failureThreshold.
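Applied here, that pattern might look like the following (illustrative values only, not something we've settled on):

```yaml
# Sketch: same /health endpoint for both probes; liveness tolerates more
# consecutive failures so a transiently unhealthy dependency doesn't
# immediately trigger a restart loop.
readinessProbe:
  httpGet:
    path: /health
    port: 9411
  periodSeconds: 5
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /health
    port: 9411
  periodSeconds: 5
  failureThreshold: 10   # higher threshold than readiness
```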
The incubating zipkin-helm chart in this org seems to have defined readiness, but not liveness, adapted from the docker HEALTHCHECK, and notably misses the start period:
readinessProbe:
  httpGet:
    path: /health
    port: 9411
  initialDelaySeconds: 5
  periodSeconds: 5
The setup above is consistent with a few other helm charts including https://github.com/radius-project/radius/blob/21b25ddf265f0464e4641b8c79cff61a4f9badd0/deploy/monitoring/zipkin-mem.yaml#L19 and https://github.com/apache/dubbo-kubernetes/blob/c4f2898e4eacd978c780bec79989a465f3e5a9dd/deploy/kubernetes/zipkin.yaml#L91-L96
That said, Financial Times uses a socket for liveness and /health for readiness, here, defaulting the startup delay to 200:
livenessProbe:
  initialDelaySeconds: {{ .Values.ui.probeStartupDelay }}
  tcpSocket:
    port: {{ .Values.ui.queryPort }}
readinessProbe:
  initialDelaySeconds: {{ .Values.ui.probeStartupDelay }}
  httpGet:
    path: /health
    port: {{ .Values.ui.queryPort }}
While spring-boot usage is an internal detail (so we wouldn't use their mappings or rely on internals like their event bus, which is TMI), the fact that Boot explicitly uses different HTTP paths for liveness and readiness is interesting and useful research: https://spring.io/blog/2020/03/25/liveness-and-readiness-probes-with-spring-boot
I am trying to deploy the docker image gcr.io/stackdriver-trace-docker/zipkin-collector inside a Kubernetes cluster. I was wondering what would be the best endpoint to use for the livenessProbe / readinessProbe functionality of a Kubernetes pod.