[Open] iwpnd opened this issue 3 years ago
(note: I'm the author of that post)
In my case I wanted to check a specific metric, num_objects; you would just need the follower's state to be "caught up".
I would suggest a standard /healthz path, returning 200 when online/ready and 500 when not.
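A minimal sketch of what the proposed endpoint would do. It is written in Python with only the standard library purely for illustration (Tile38 is not written in Python). The caught_up flag, the healthz_status helper and port 9852 are assumptions standing in for the follower's real replication state. The only point is the mapping: ready gives 200, not ready gives 500.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical replication flag: a real follower would set this once it has
# caught up with the leader. This is an assumption, not part of Tile38.
caught_up = threading.Event()


def healthz_status(is_caught_up: bool) -> int:
    """200 when the follower is ready to serve reads, 500 when it is not."""
    return 200 if is_caught_up else 500


class HealthzHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        status = healthz_status(caught_up.is_set())
        body = b"ok\n" if status == 200 else b"not caught up\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


if __name__ == "__main__":
    # Replication would run elsewhere and call caught_up.set() when done.
    HTTPServer(("127.0.0.1", 9852), HealthzHandler).serve_forever()
```

Because a 500 is returned until the flag is set, any poller that only checks the status code (such as a Kubernetes httpGet probe) can decide readiness without parsing a body.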
Makes sense to me.
See PR #609.
I just pushed a new release that adds the new HEALTHZ command.
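With HEALTHZ available, a client can check readiness by fetching it over HTTP and reading the JSON reply. The sketch below assumes the reply shape quoted in this thread ({"ok":false,"err":"not caught up",...} from a follower that is not ready, {"ok":true,...} from the leader) and the HTTP path /HEALTHZ. parse_healthz and check_healthz are hypothetical helper names. The sketch does not assume whether an unready server also sends a non-200 status, so it handles both cases.

```python
import json
import urllib.error
import urllib.request


def parse_healthz(body: str) -> tuple[bool, str]:
    """Turn a HEALTHZ JSON reply into (ready, reason)."""
    reply = json.loads(body)
    return bool(reply.get("ok", False)), reply.get("err", "")


def check_healthz(base_url: str, timeout: float = 2.0) -> tuple[bool, str]:
    """Fetch <base_url>/HEALTHZ and report whether the server says it is ready."""
    try:
        with urllib.request.urlopen(f"{base_url}/HEALTHZ", timeout=timeout) as resp:
            return parse_healthz(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Non-2xx status: the body may still hold the JSON error reply.
        try:
            return parse_healthz(err.read().decode("utf-8"))
        except ValueError:
            return False, f"HTTP {err.code}"
    except (urllib.error.URLError, OSError) as err:
        # Connection refused, DNS failure, timeout, ...
        return False, str(err)


if __name__ == "__main__":
    print(check_healthz("http://localhost:9852"))
```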
Thanks again for pushing this so fast, @tidwall! Today I got around to testing this, but failed miserably. I should've tested it locally first, I guess. :)
Now I have tested it locally with this docker-compose.yml:
version: "3"
services:
  tile38-leader:
    image: tile38/tile38:1.24.2
    container_name: tile38-leader
    command: tile38-server -vv -p 9851
    ports:
      - 9851:9851
  tile38-follower:
    image: tile38/tile38:1.24.2
    container_name: tile38-follower
    command: >
      /bin/sh -c 'mkdir -p tmp/data &&
      echo "{\"follow_host\": \"tile38-leader\",\"follow_port\":9851}" > tmp/data/config &&
      tile38-server -d tmp/data -vv -p 9852'
    ports:
      - 9852:9852
The logs tell me:
tile38-follower | 2021/06/09 14:11:48 [INFO] caught up
But HEALTHZ returns:
// follower
curl http://localhost:9852/HEALTHZ
>> {"ok":false,"err":"not caught up","elapsed":"170.2µs"}
// leader
curl http://localhost:9851/HEALTHZ
>> {"ok":true,"elapsed":"170.2µs"}
Am I missing something here?
I just fixed the issue and pushed a new build.
Tested, it works. Thanks a lot! 👍 🚀
You're welcome :)
Is your feature request related to a problem? Please describe.
We use Tile38 in Kubernetes and autoscale the followers as needed. This works like a charm, until it doesn’t.
My current problem is the readinessProbe for the followers. If a follower receives traffic before it has caught up, it errors out. This hasn’t been an issue so far, but obviously, the more data we process, the longer new followers take to catch up to the leader. A solution is something like what is described here. But under no circumstances do I want to maintain a custom image alongside yours and execute Python scripts in the readinessProbe.
Describe the solution you'd like
If I wanted to stick to the image you provide, I would have to do something like:
This will work as long as I do not screw up the URL for wget.
Kubernetes also provides a readinessProbe with httpGet that would make this a little safer. If followers returned 200 on an endpoint only when "caught_up": true, Kubernetes could poll that endpoint for a given period. Once the status code is at least 200 and below 400, Kubernetes would start routing requests to the follower. Something similar can be done in docker-compose with the help of dokku/wait.
Do you think it would make sense to include this in Tile38, and if so, can I give it a try? :)
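The httpGet readinessProbe idea comes down to polling a URL until it answers with a passing status, and Kubernetes counts any status from 200 up to (but not including) 400 as a pass. Below is a hypothetical wait-for-ready loop in Python that applies the same rule, in the spirit of waiting on a follower before sending it traffic from docker-compose. probe_ok, http_probe and wait_until_ready are illustrative names. The /healthz URL is the path proposed in this thread. The period and deadline are arbitrary defaults, not Kubernetes' own probe settings.

```python
import time
import urllib.error
import urllib.request


def probe_ok(status: int) -> bool:
    """Same pass rule as a Kubernetes httpGet probe: 200 <= status < 400."""
    return 200 <= status < 400


def http_probe(url: str, timeout: float = 1.0) -> bool:
    """One probe attempt; any connection error counts as a failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return probe_ok(resp.status)
    except urllib.error.HTTPError as err:
        return probe_ok(err.code)
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(url, period=1.0, deadline=60.0,
                     probe=http_probe, sleep=time.sleep, clock=time.monotonic):
    """Poll url every `period` seconds until the probe passes or `deadline` elapses."""
    start = clock()
    while True:
        if probe(url):
            return True
        if clock() - start >= deadline:
            return False
        sleep(period)


if __name__ == "__main__":
    # Hypothetical follower address, matching port 9852 from the compose file.
    ready = wait_until_ready("http://localhost:9852/healthz", deadline=120.0)
    print("follower ready" if ready else "gave up waiting")
```

The probe, sleep and clock are injectable so the loop can be exercised without a live server. A status-code-only check like this is why a plain 200/500 endpoint is easier to consume than parsing "caught_up" out of a JSON body.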
Edit: /healthz would be better, agreed @stevelacy.