Closed kkwekkeboom closed 3 years ago
Hi @kkwekkeboom what ReadinessProbe
(if any) do you have set on your ksvc? Have you tried setting up a ReadinessProbe with a successThreshold
greater than 0, and if so did/does this help?
Using a readinessprobe does not make a difference
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
name: helloworld-python8
namespace: default
spec:
template:
metadata:
annotations:
autoscaling.knative.dev/maxScale: "1"
autoscaling.knative.dev/minScale: "1"
spec:
containers:
- image: dev.local/hello-world-vcrash
imagePullPolicy: IfNotPresent
env:
- name: TARGET
value: "Python Sample v1"
readinessProbe:
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 10
successThreshold: 3
failureThreshold: 1
Upgrading to v0.19 does not make a difference
Our default readiness probe is fairly "dumb" in that it only asserts a TCP connection can be established, which you have found can cause races in cases like yours.
If you specify a readiness probe that actually checks the server responds correctly, you should get the desired behavior.
Thanks
readinessProbe:
httpGet:
path: /status
initialDelaySeconds: 0
periodSeconds: 1
timeoutSeconds: 10
successThreshold: 1
failureThreshold: 1
works quite well
In what area(s)?
/area autoscale
What version of Knative?
0.17.1
Expected Behavior
When creating a revision with a crashing user-container the revision should not become ready.
Actual Behavior
The revision sometimes becomes ready, sometimes not.
Steps to Reproduce the Problem
Use https://knative.dev/docs/serving/samples/hello-world/helloworld-python/ and edit app.py such that app.py has invalid python syntax. Deploy the service with
8/10 revisions get ready, 2/10
RevisionMissing
My suspicion is that overly aggressive probing of the user container leads to the false conclusion the service is ready. In fact gunicorn is listening on port 8080 for a little moment before crashing