mittwald / kubernetes-replicator

Kubernetes controller for synchronizing secrets & config maps across namespaces
Apache License 2.0

split liveness and readiness probes #247

Closed · slimm609 closed this 1 year ago

slimm609 commented 1 year ago

Split out the liveness and readiness probes. The liveness probe will always respond while the application is running, but the readiness probe will not respond while a large sync is in progress.

fixes #215
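For context, the split could look roughly like this in a deployment manifest. This is a sketch only: the probe paths (/healthz, /readyz) and port 9102 are taken from this thread rather than copied from deploy/deployment.yaml, and the container name and period are illustrative.

```yaml
# Sketch of split probes; paths and port as discussed in this thread.
spec:
  containers:
    - name: kubernetes-replicator     # illustrative container name
      livenessProbe:
        httpGet:
          path: /healthz              # always answers while the process is running
          port: 9102
      readinessProbe:
        httpGet:
          path: /readyz               # withheld while a large sync is in progress
          port: 9102
        periodSeconds: 10             # illustrative value
```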

slimm609 commented 1 year ago

Rebased with the 1.18 update.

slimm609 commented 1 year ago

@martin-helmich this should be good now

cherrera-acx commented 1 year ago

@martin-helmich - Will this be added to a future release, or is this change already available in v2.7.3?

timhavens commented 1 year ago

:(

I don't know if this is related, but I think it is, because the issue coincides with this change.

I think it's most likely because I'm using the 'latest' image, and the YAML in the master branch changed to use an endpoint that the 'latest' image doesn't host (yet) - doh on my part, I suppose. Still, I was following the 'Manual' install in your README.md when this occurred, so maybe that needs to be updated if the latest image is still 2.7.3. It's a simple tweak to get it working like it was (sketch below), but it did take me a while to finally realize what was happening.
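By "simple tweak" I mean something like pointing the readiness check back at the endpoint the 2.7.3 image actually serves. Sketch only, assuming the master deployment.yaml now uses /readyz for readiness:

```yaml
# Workaround sketch, not the upstream manifest: use /healthz for readiness
# until a released image actually serves /readyz.
readinessProbe:
  httpGet:
    path: /healthz   # /readyz isn't served by the 2.7.3 image
    port: 9102
```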

I've been using a manual install process for quite a while, and about 2 days ago I noticed Replicator's Ready == 0/1, and since then I've been trying to debug it. It appears that replicator is still actually replicating things successfully, but the readiness state never gets to 1/1 anymore.

```sh
$ # Create roles and service accounts
$ kubectl apply -f https://raw.githubusercontent.com/mittwald/kubernetes-replicator/master/deploy/rbac.yaml

$ # Create actual deployment
$ kubectl apply -f https://raw.githubusercontent.com/mittwald/kubernetes-replicator/master/deploy/deployment.yaml
```

I run this on an AWS EKS cluster. The pod has an event reporting a 404 response for the readiness probe. I've confirmed that by port-forwarding and hitting the endpoints directly:

```
http://localhost:9102/readyz
404 page not found

http://localhost:9102/healthz
{ "notReady": [] }
```

I also noticed this entry in the EKS logs:

{ "kind": "Event", "apiVersion": "audit.k8s.io/v1", "level": "Request", "auditID": "xxxx", "stage": "ResponseComplete", "requestURI": "/apis/rbac.authorization.k8s.io/v1/clusterroles/replicator-kubernetes-replicator", "verb": "get", "user": { "username": "kubernetes-admin", "uid": "aws-iam-authenticator:xxxx:xxxx", "groups": [ "system:masters", "system:authenticated" ], "extra": { "accessKeyId": [ "xxxx" ], "arn": [ "arn:aws:iam::xxxx:user/xxxx" ], "canonicalArn": [ "arn:aws:iam::xxxx:user/xxxx" ], "sessionName": [ "" ] } }, "sourceIPs": [ "x.x.x.x" ], "userAgent": "kubectl/v1.24.3 (linux/amd64) kubernetes/aef86a9", "objectRef": { "resource": "clusterroles", "name": "replicator-kubernetes-replicator", "apiGroup": "rbac.authorization.k8s.io", "apiVersion": "v1" }, "responseStatus": { "metadata": {}, "status": "Failure", "message": "clusterroles.rbac.authorization.k8s.io \"replicator-kubernetes-replicator\" not found", "reason": "NotFound", "details": { "name": "replicator-kubernetes-replicator", "group": "rbac.authorization.k8s.io", "kind": "clusterroles" }, "code": 404 }, "requestReceivedTimestamp": "2023-03-30T18:47:57.369815Z", "stageTimestamp": "2023-03-30T18:47:57.374817Z", "annotations": { "authorization.k8s.io/decision": "allow", "authorization.k8s.io/reason": "" } }