patrik93 opened 1 year ago
Can you post your Knative Service yaml?
Thanks for helping me out!
I have removed a few blocks, such as affinity and volumes, for the sake of simplicity:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: service
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/maxScale: "5"
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/target: "3"
    spec:
      containers:
        - image: nginx
          imagePullPolicy: Always
          command:
            - /bin/bash
          args:
            - "-c"
            - "sleep 40; /docker-entrypoint.sh nginx -g 'daemon off;'"
          ports:
            - containerPort: 80
          readinessProbe:
            failureThreshold: 1
            httpGet:
              path: /
              scheme: HTTP
            initialDelaySeconds: 40
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          resources:
            limits:
              cpu: "1"
              memory: 2Gi
            requests:
              cpu: "1"
              memory: 2Gi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
Turning on request (& probe) logging shows that the activator component is behind these probes. It triggers the queue proxy to perform the Pod's readiness check. The activator performs these probes to detect ready Pods faster than the K8s control plane - the idea here is to reduce latency/cold-start when scaling from zero.
In your scenario a new Revision won't become Ready=True until the Deployment & Pod become Ready=True, and that has a floor of 40s (initialDelaySeconds). So there isn't any value in doing this aggressive probing, because traffic won't be cut over until the Revision is ready.
Right now the activator probing isn't flexible and it's pretty dumb - but improvements can be made.
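For illustration, here is a rough sketch of the kind of tight probe loop being described. The 100ms cadence, port 8012 and the K-Network-Probe header are my own assumptions for the sketch, not the exact activator code:

package main

import (
	"fmt"
	"net/http"
	"time"
)

// probeLoop repeatedly probes a pod's queue-proxy until it reports ready.
// The interval, port and header below are illustrative assumptions only.
func probeLoop(podIP string, maxAttempts int) bool {
	client := &http.Client{Timeout: time.Second}
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()

	for i := 0; i < maxAttempts; i++ {
		<-ticker.C
		req, err := http.NewRequest(http.MethodGet, "http://"+podIP+":8012/", nil)
		if err != nil {
			return false
		}
		req.Header.Set("K-Network-Probe", "queue") // assumed probe header
		resp, err := client.Do(req)
		if err != nil {
			continue // queue-proxy not reachable yet; try again on the next tick
		}
		resp.Body.Close()
		if resp.StatusCode == http.StatusOK {
			return true // every successful round trip here also probed the user container
		}
	}
	return false
}

func main() {
	fmt.Println("ready:", probeLoop("127.0.0.1", 50))
}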
/triage accepted
Note: I updated your YAML with a working example using the nginx image.
@dprotaso I was thinking that if the user specifies some readiness delay we could detect that when we create the revision watcher here: https://github.com/knative/serving/blob/main/pkg/activator/net/revision_backends.go#L530 and then start the probe timer after an initial delay. WDYT?
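To make that concrete, a minimal sketch of what the watcher could derive from the user container's readiness probe; the corev1.Probe type is the standard Kubernetes one, but the wiring into the revision watcher is hypothetical:

package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
)

// initialProbeDelay sketches the delay a revision watcher could derive from
// the user container's readiness probe before starting its probe timer.
func initialProbeDelay(readiness *corev1.Probe) time.Duration {
	if readiness == nil {
		return 0
	}
	return time.Duration(readiness.InitialDelaySeconds) * time.Second
}

func main() {
	probe := &corev1.Probe{InitialDelaySeconds: 40} // matches the example Service above
	delay := initialProbeDelay(probe)
	fmt.Println("would wait", delay, "before starting the activator's probe ticker")
	// e.g. something like time.AfterFunc(delay, startTicker) inside the watcher,
	// where startTicker is a hypothetical hook, not an existing function.
}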
FYI In the past - the activator probe didn't trigger the queue proxy to perform a probe. Instead it would return the last known result of the queue-proxy probe.
I've been looking into this.
Regarding @dprotaso's comment:
In the past - the activator probe didn't trigger the queue proxy to perform a probe. Instead it would return the last known result of the queue-proxy probe.
I think this is still kinda true. The way I'm interpreting this probe code is that it stores the last known probe result, so that if a probe request comes in the middle of an already in-progress probe, it returns the last value.
So, technically, it does return the last known value.
However, if the activator is probing every 100ms, and a single probe takes <100ms to complete, it is likely gonna be triggering a probe every 100ms anyway.
Based on that understanding, I think what @skonto suggested above seems like a nice way to address this issue in particular, where we know the app isn't going to start up right away.
I'm gonna start there in trying to address this, and if we want any more intelligent probing behavior, I can either add it in later, or start a new Issue/PR.
I think this is still kinda true. The way I'm interpreting this probe code is that it stores the last known probe result, so that if a probe request comes in the middle of an already in-progress probe, it returns the last value.
FYI - that code ensures if there are multiple concurrent probe requests then only a single request is made to the user container. Latter requests will wait for the result of the first request.
I think this still doesn't alleviate that the activator will trigger probes of the user container via the queue proxy - whereas before it would return the last cached result.
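For anyone following along, here's a small sketch of that behaviour using golang.org/x/sync/singleflight. The real queue-proxy/activator code has its own implementation of this idea; this just illustrates "concurrent probes collapse into one request and later callers wait for its result":

package main

import (
	"fmt"
	"net/http"
	"sync"

	"golang.org/x/sync/singleflight"
)

var group singleflight.Group

// probeOnce collapses concurrent probe requests for the same target into a
// single HTTP request; the other callers block and share that result.
func probeOnce(url string) (bool, error) {
	v, err, _ := group.Do(url, func() (interface{}, error) {
		resp, err := http.Get(url) // the single real probe of the user container
		if err != nil {
			return false, err
		}
		defer resp.Body.Close()
		return resp.StatusCode == http.StatusOK, nil
	})
	if err != nil {
		return false, err
	}
	return v.(bool), nil
}

func main() {
	// Five concurrent callers, at most one in-flight probe at a time.
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ok, err := probeOnce("http://127.0.0.1:8012/") // illustrative target
			fmt.Println(ok, err)
		}()
	}
	wg.Wait()
}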
@dprotaso I was thinking that if the user specifies some readiness delay we could detect that when we create the revision watcher here: https://github.com/knative/serving/blob/main/pkg/activator/net/revision_backends.go#L530 and then start the probe timer after an initial delay. WDYT?
Thinking about this a bit I don't think it will work. The activator discovers pods through endpoint updates. Getting these updates is actually pretty slow and delayed. I've seen the activator just 'discover' a pod whose endpoint is marked not-ready but the pod is actually 'ready'.
Initial delay is really for the container start. This gets more complicated if you have different containers with different delays. Because of the endpoint propagation being slow - waiting for the initial delay in the activator could mean we're waiting longer than we should.
Because of the endpoint propagation being slow - waiting for the initial delay in the activator could mean we're waiting longer than we should.
hmm okay that is a good point. When I was trying it out myself, I found that the activator delay usually finished before the user container did, so I didn't think about it taking longer and further delaying readiness.
okay, so if I'm following, perhaps a more fitting option here is to:
maybe 1 is enough, but I wanted to include 2 since that is what this issue started from.
/unassign @KauzClay
Ask your question here:
Hi there,
When a service starts up, in our case we need a few seconds for the application to be ready. There are two challenges we are facing:
1.
2.
Is there any way to set up queue-proxy readinessProbes to be a bit more patient? I mean something like the initialDelaySeconds with the services.
Thanks in advance!