knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

How to set custom terminationGracePeriodSeconds for a knative Service pod #15555


sebastianjohnk commented 1 week ago

Ask your question here:

Hi. I'm working on a small POC that creates some Knative Services. The image I'm providing for the pod currently contains a small Flask app that listens on a port. Right now I'm testing out how well these Services scale down. It seems that once Knative decides to scale the replicas down from, say, 3 to 2, or even to 0, the removed pods remain in a "Terminating" state for a long time, close to 4 or 5 minutes I'd say, showing 1/2 containers ready. I checked the container logs: the queue-proxy container shuts down properly, but my Flask app container does not.

In any case, I learned that every pod has a terminationGracePeriodSeconds value that determines how long it can stay in this "Terminating" stage before Kubernetes force-kills it.

Now here is the problem: terminationGracePeriodSeconds seems to default to 300 for all pods spawned as part of a Service, with seemingly no way to specify it in the Service yaml spec.

I'm able to specify this in a Pod yaml spec and deploy that pod individually, and it is reflected in the pod (when I fetch the pod yaml using kubectl).

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
    - name: my-container
      image: nginx
      ports:
        - containerPort: 80
  terminationGracePeriodSeconds: 120

But when I try to deploy a Service using a yaml spec, which in turn contains a pod spec with the same configuration, something like this --

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-world
  namespace: default
  labels:
    app: hello-world
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1" # Minimum number of Pods
        autoscaling.knative.dev/maxScale: "5" # Maximum number of Pods
    spec:
      terminationGracePeriodSeconds: 99 # Custom termination grace period
      containers:
        - image: gcr.io/knative-samples/helloworld
          ports:
            - containerPort: 8080

I get an error saying:

Error from server (BadRequest): error when creating "helloservice.yaml": Service in version "v1" cannot be handled as a Service: strict decoding error: unknown field "spec.template.spec.terminationGracePeriodSeconds"

But if I remove this field from the Service yaml, the service gets deployed, and the pod seems to have a default terminationGracePeriodSeconds value of 300.

I also checked the default value for a pod deployed directly from a pod yaml spec (without terminationGracePeriodSeconds specified), to see whether it is a Kubernetes default, but there it seems to be 30. So the default appears to be 30 for individual pods and 300 for pods that are part of a Service.

I guess my question is: how is this default terminationGracePeriodSeconds value of 300 being set for pods belonging to Services, and is there any way I can change it, either in my Service yaml spec or through some Kubernetes/Knative configuration?

Any help would be much appreciated thank you.

sebastianjohnk commented 1 week ago

Update

It looks like the terminationGracePeriodSeconds value is picked up directly from the revision-timeout-seconds value in the config-defaults ConfigMap in the knative-serving namespace.

Is there any way to have different values for these two? I don't want my pod to be stuck in the Terminating state for more than 25 seconds, but I might still have incoming requests that take longer than 25 seconds, and I don't want those to hit a timeout error.
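For reference, a minimal sketch of the entry in question, assuming a default install (where the ConfigMap lives in the knative-serving namespace; 300 is the documented default):

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-defaults
  namespace: knative-serving
data:
  revision-timeout-seconds: "300" # default revision timeout; also applied as terminationGracePeriodSeconds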

skonto commented 1 week ago

Hi @sebastianjohnk, Knative manages the pod termination cycle as follows: a) It sets a preStop hook that drains connections and manages in-flight requests. The hook queries a queue-proxy endpoint to check whether the drainer has finished. The drainer (run by the queue-proxy) waits 30 seconds before returning, assuming no new requests have arrived; any new request resets the timer. b) It sets terminationGracePeriodSeconds = rev.Spec.TimeoutSeconds so that in-flight requests have enough time to finish and are treated the same as any other request. If you don't set that field (timeoutSeconds) in the ksvc spec, the value is taken from the config-defaults ConfigMap.
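To make point b) concrete, here is a sketch (not from the thread) of setting timeoutSeconds on the revision template to lower the grace period for a single Service. Note that, per the explanation above, this same value also caps how long a request to that revision may take, which is exactly the coupling being asked about:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-world
spec:
  template:
    spec:
      timeoutSeconds: 25 # per-revision request timeout; also becomes terminationGracePeriodSeconds
      containers:
        - image: gcr.io/knative-samples/helloworld
          ports:
            - containerPort: 8080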