knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

knative service create timeout while instance rollout with 3 replicas #15438

Open yaswanthkumar1995 opened 3 months ago

yaswanthkumar1995 commented 3 months ago

Ask your question here:

I have a deployment of knative-serving with 3 replicas of the knative-serving controller and 3 replicas of the net-kourier controller, with pod anti-affinity so that each pod runs on a different node, and a PDB of 66% so that at least 2 pods are always available.
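For reference, a minimal sketch of what this anti-affinity plus PDB setup might look like; the label selector and object names here are placeholders, not the actual manifests from this cluster. With 3 replicas, `minAvailable: 66%` rounds up to 2 pods, so a voluntary disruption (such as a node drain) can only take down one pod at a time:

```yaml
# Hypothetical excerpt from the controller Deployment's pod template:
# required anti-affinity forces each replica onto a different node.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: controller            # placeholder label
        topologyKey: kubernetes.io/hostname
---
# Hypothetical PodDisruptionBudget: with 3 replicas, 66% rounds up to 2,
# so drains and other voluntary disruptions always leave 2 pods running.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: controller-pdb                 # placeholder name
spec:
  minAvailable: 66%
  selector:
    matchLabels:
      app: controller                  # placeholder label
```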

probes for the knative-serving controller:

```yaml
livenessProbe:
  failureThreshold: 1      # a single failed probe restarts the container
  httpGet:
    path: /health
    port: probes
    scheme: HTTP
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 1        # each probe must answer within 1s
name: controller
ports:
```

probes for the net-kourier controller:

```yaml
livenessProbe:
  failureThreshold: 3      # three consecutive failures restart the container
  grpc:
    port: 18000
    service: ""
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
name: controller
ports:
```

While performing an instance rollout, 25% at a time across the 4 instances, I am hitting a timeout when creating a Knative service: it fails after 10 minutes. Why doesn't it use the other pods, which are healthy and running, to route the request, and why doesn't it retry against the healthy backends?

```
$ kn service create hello-23 --image=docker.io/johnralston/helloworld --env TARGET="test run"
Warning: Kubernetes default value is insecure, Knative may default this to secure in a future release:
  spec.template.spec.containers[0].securityContext.allowPrivilegeEscalation,
  spec.template.spec.containers[0].securityContext.capabilities,
  spec.template.spec.containers[0].securityContext.runAsNonRoot,
  spec.template.spec.containers[0].securityContext.seccompProfile
Creating service 'hello-23' in namespace 'default':

Error: timeout: service 'hello-23' not ready after 600 seconds
Run 'kn --help' for usage
```
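For context, the `kn` command above is roughly equivalent to applying a Knative Service manifest like the following (a sketch of what kn generates, with defaulted fields omitted):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-23
  namespace: default
spec:
  template:
    spec:
      containers:
        - image: docker.io/johnralston/helloworld
          env:
            - name: TARGET
              value: "test run"
```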

pods:

```
NAME                                      READY   STATUS              RESTARTS   AGE     IP       NODE                     NOMINATED NODE   READINESS GATES
net-kourier-controller-67dc75cd58-blbk8   0/1     ContainerCreating   0          1s      <none>   ip-node-3.ec2.internal   <none>           <none>
net-kourier-controller-67dc75cd58-hsdzv   1/1     Running             0          58m     node-1   ip-node-1.ec2.internal   <none>           <none>
net-kourier-controller-67dc75cd58-tzxn6   1/1     Running             0          4m43s   node-2   ip-node-2.ec2.internal   <none>           <none>
```

yaswanthkumar1995 commented 3 months ago

net-kourier logs the following error:

```json
{
  "severity": "ERROR",
  "timestamp": "2024-08-01T16:51:04.670842621Z",
  "logger": "net-kourier-controller",
  "caller": "status/status.go:405",
  "message": "Probing of http://hello-87-default.stage-ee-api.test-domain.com/ failed, IP: 10.208.56.218:8090, ready: false, error: error roundtripping http://hello-87-default.stage-ee-api.test-domain.com/healthz: context deadline exceeded (depth: 0)",
  "commit": "3950a5b-dirty",
  "knative.dev/controller": "knative.dev.net-kourier.pkg.reconciler.ingress.Reconciler",
  "knative.dev/kind": "networking.internal.knative.dev.Ingress",
  "knative.dev/traceid": "46156891-f482-4d8a-8b98-2091d65ae751",
  "knative.dev/key": "default/hello-87",
  "stacktrace": "knative.dev/networking/pkg/status.(*Prober).processWorkItem\n\tknative.dev/networking@v0.0.0-20240116081125-ce0738abf051/pkg/status/status.go:405\nknative.dev/networking/pkg/status.(*Prober).Start.func1\n\tknative.dev/networking@v0.0.0-20240116081125-ce0738abf051/pkg/status/status.go:290"
}
```

github-actions[bot] commented 3 days ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.