knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Tutorial: Knative serving + Istio + real DNS, envoy cannot reach my services (503 Service unavailable) #14948

Open dsgli opened 4 months ago


/area networking

Hi, I have an on-premises Kubernetes setup:

- Kubernetes: 1.29.1 (single node; the control plane is also a worker node)
- Knative version: 1.13.1
- Istio version: 1.20.2 (I originally listed 1.13.0 by mistake; that number came from `kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.13.0/net-istio.yaml`)

I followed the tutorial at https://knative.dev/docs/install/yaml-install/serving/install-serving-with-yaml/. I did not set up mTLS. I added the Serving HPA extension according to the guide.

My services, routes, and pods are up and running in the cluster, and the Istio gateway has an external IP assigned, yet I cannot reach my services with curl from outside the Kubernetes cluster (from the same host where Kubernetes is running).


I'm pretty sure something is missing from the configuration, or the setup is incorrect, but I cannot trace where the problem is. Any help will be appreciated, as right now I don't see any clue in the logs about where the problem could be.

```
curl -v -H "Content-Type: application/json" -H "Host: mlworkeralpha-predictor.kserve-dsml5.ml.proxy.mydomain.rnd" http://192.168.9.1/v1/models/mlworkeralpha:predict -d '{"data":{"number":7}}'

* processing: http://192.168.9.1/v1/models/mlworkeralpha:predict
*   Trying 192.168.9.1:80...
* Connected to 192.168.9.1 (192.168.9.1) port 80
> POST /v1/models/mlworkeralpha:predict HTTP/1.1
> Host: mlworkeralpha-predictor.kserve-dsml5.ml.proxy.mydomain.rnd
> User-Agent: curl/8.2.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 21
>
< HTTP/1.1 503 Service Unavailable
< content-length: 152
< content-type: text/plain
< date: Mon, 26 Feb 2024 12:25:20 GMT
< server: istio-envoy
< 
* Connection #0 to host 192.168.9.1 left intact
upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: delayed connect error: 113
```
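Envoy's 503 body ends with `delayed connect error: 113`. That number appears to be the socket error Envoy got while connecting to its upstream, and on Linux errno 113 is `EHOSTUNREACH` ("No route to host"), which suggests a network reachability problem between the gateway and its upstream rather than an error returned by the model server. A minimal Python sketch for pulling the code out of the body and decoding it (errno numbering is platform-specific, so the name is only meaningful when this runs on Linux):

```python
import errno
import os
import re
from typing import Optional

# Body of the 503 response returned by istio-envoy in the curl call above.
ENVOY_BODY = (
    "upstream connect error or disconnect/reset before headers. "
    "reset reason: remote connection failure, "
    "transport failure reason: delayed connect error: 113"
)


def connect_errno(body: str) -> Optional[int]:
    """Extract the number after 'connect error:' from an Envoy 503 body."""
    match = re.search(r"connect error: (\d+)", body)
    return int(match.group(1)) if match else None


code = connect_errno(ENVOY_BODY)
if code is not None:
    # errno numbers are platform-specific; on Linux 113 is EHOSTUNREACH.
    name = errno.errorcode.get(code, "unknown on this platform")
    print(code, name, os.strerror(code))
```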
Working pods/services/routing

```
NAME READY STATUS RESTARTS AGE
pod/mlworkeralpha-predictor-00001-deployment-d8f69f474-xm4sv 2/2 Running 6 (42m ago) 3d

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/mlworkeralpha ExternalName knative-local-gateway.istio-system.svc.cluster.local 3h40m
service/mlworkeralpha-predictor ExternalName knative-local-gateway.istio-system.svc.cluster.local 80/TCP 3d
service/mlworkeralpha-predictor-00001 ClusterIP 10.103.209.198 80/TCP,443/TCP 3d
service/mlworkeralpha-predictor-00001-private ClusterIP 10.98.6.87 80/TCP,443/TCP,9090/TCP,9091/TCP,8022/TCP,8012/TCP 3d

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/mlworkeralpha-predictor-00001-deployment 1/1 1 1 3d

NAME DESIRED CURRENT READY AGE
replicaset.apps/mlworkeralpha-predictor-00001-deployment-d8f69f474 1 1 1 3d

NAME LATESTCREATED LATESTREADY READY REASON
configuration.serving.knative.dev/mlworkeralpha-predictor mlworkeralpha-predictor-00001 mlworkeralpha-predictor-00001 True

NAME CONFIG NAME K8S SERVICE NAME GENERATION READY REASON ACTUAL REPLICAS DESIRED REPLICAS
revision.serving.knative.dev/mlworkeralpha-predictor-00001 mlworkeralpha-predictor 1 True 1

NAME URL READY REASON
route.serving.knative.dev/mlworkeralpha-predictor http://mlworkeralpha-predictor.kserve-dsml5.ml.proxy.mydomain.rnd True

NAME URL LATESTCREATED LATESTREADY READY REASON
service.serving.knative.dev/mlworkeralpha-predictor http://mlworkeralpha-predictor.kserve-dsml5.ml.proxy.mydomain.rnd mlworkeralpha-predictor-00001 mlworkeralpha-predictor-00001 True
```

Istio external IP assigned

> kubectl --namespace istio-system get service istio-ingressgateway
```
NAME                   TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                                      AGE
istio-ingressgateway   LoadBalancer   10.105.112.176   192.168.9.1   15021:30104/TCP,80:30686/TCP,443:31206/TCP   3d1h
```

Knative pods running

> kubectl get pods -n knative-serving
```
NAME                                   READY   STATUS    RESTARTS      AGE
activator-58db57894b-jhxf4             1/1     Running   4 (57m ago)   3d1h
autoscaler-76f95fff78-qtptw            1/1     Running   3 (57m ago)   3d1h
autoscaler-hpa-85696784dd-lpqjj        1/1     Running   3 (57m ago)   3d1h
controller-7dd875844b-4btf6            1/1     Running   3 (57m ago)   3d1h
net-istio-controller-5576fc66d-g78xg   1/1     Running   3 (57m ago)   3d1h
net-istio-webhook-9965c55c5-tvblf      1/1     Running   3 (57m ago)   3d1h
webhook-d8674645d-rvppt                1/1     Running   3 (57m ago)   3d1h
```

Knative-serving configmap/config-domain

> kubectl get configmap/config-domain --namespace knative-serving -oyaml
```
apiVersion: v1
data:
  ml.proxy.mydomain.rnd: ""
  svc.cluster.local: |
    selector:
      app: secret
kind: ConfigMap
metadata:
  annotations:
    knative.dev/example-checksum: 26c09de5
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"_example":"################################\n# #\n# EXAMPLE CONFIGURATION #\n# #\n################################\n\n# This block is not actually functional configuration,\n# but serves to illustrate the available configuration\n# options and document them in a way that is accessible\n# to users that `kubectl edit` this config map.\n#\n# These sample configuration options may be copied out of\n# this example block and unindented to be in the data block\n# to actually change the configuration.\n\n# Default value for domain.\n# Routes having the cluster domain suffix (by default 'svc.cluster.local')\n# will not be exposed through Ingress. You can define your own label\n# selector to assign that domain suffix to your Route here, or you can set\n# the label\n# \"networking.knative.dev/visibility=cluster-local\"\n# to achieve the same effect. This shows how to make routes having\n# the label app=secret only exposed to the local cluster.\nsvc.cluster.local: |\n selector:\n app: secret\n\n# These are example settings of domain.\n# example.com will be used for all routes, but it is the least-specific rule so it\n# will only be used if no other domain matches.\nexample.com: |\n\n# example.org will be used for routes having app=nonprofit.\nexample.org: |\n selector:\n app: nonprofit\n"},"kind":"ConfigMap","metadata":{"annotations":{"knative.dev/example-checksum":"26c09de5"},"labels":{"app.kubernetes.io/component":"controller","app.kubernetes.io/name":"knative-serving","app.kubernetes.io/version":"1.13.1"},"name":"config-domain","namespace":"knative-serving"}}
  creationTimestamp: "2024-02-23T10:57:05Z"
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: knative-serving
    app.kubernetes.io/version: 1.13.1
  name: config-domain
  namespace: knative-serving
  resourceVersion: "5449487"
  uid: 4298243d-a367-43d8-8097-d669d98d7ad7
```
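The comments in config-domain describe how a Route's domain suffix is chosen: a selector restricts a domain to Routes with matching labels, the least specific entry is the fallback, and the `networking.knative.dev/visibility=cluster-local` label forces the cluster domain. The Python sketch below is a simplified model of that rule (an approximation, including the tie-break, not Knative's implementation), applied to the `data` shown above. It also assumes the default domain template `{{.Name}}.{{.Namespace}}.{{.Domain}}` and the default `cluster.local` cluster domain; the Route labels are illustrative:

```python
from typing import Dict, Mapping

VISIBILITY_LABEL = "networking.knative.dev/visibility"

# `data` from the config-domain ConfigMap above, with each value's
# "selector:" block parsed into a dict; "" (no selector) becomes {}.
CONFIG_DOMAIN: Dict[str, Dict[str, str]] = {
    "ml.proxy.mydomain.rnd": {},
    "svc.cluster.local": {"app": "secret"},
}


def lookup_domain(
    domains: Mapping[str, Mapping[str, str]], labels: Mapping[str, str]
) -> str:
    """Pick the domain suffix for a Route with the given labels (simplified).

    A domain matches when every selector label is present on the Route with
    the same value. The match with the most selector labels wins; ties go to
    the lexicographically smaller domain (an assumption for this sketch).
    """
    if labels.get(VISIBILITY_LABEL) == "cluster-local":
        return "svc.cluster.local"  # assumes the default cluster domain
    best, best_specificity = "", -1
    for domain, selector in domains.items():
        if any(labels.get(key) != value for key, value in selector.items()):
            continue  # selector does not match this Route
        specificity = len(selector)
        if specificity > best_specificity or (
            specificity == best_specificity and domain < best
        ):
            best, best_specificity = domain, specificity
    return best


def route_host(name: str, namespace: str, domain: str) -> str:
    # Default domain template: {{.Name}}.{{.Namespace}}.{{.Domain}}
    return f"{name}.{namespace}.{domain}"


# Illustrative Route labels: they do not include app=secret, so the
# selector-less default entry is the only match.
domain = lookup_domain(CONFIG_DOMAIN, {"serving.kserve.io/inferenceservice": "mlworkeralpha"})
print(route_host("mlworkeralpha-predictor", "kserve-dsml5", domain))
# prints mlworkeralpha-predictor.kserve-dsml5.ml.proxy.mydomain.rnd
```

The result matches the Route URL `http://mlworkeralpha-predictor.kserve-dsml5.ml.proxy.mydomain.rnd` in the `kubectl get` output above, so the domain configuration produces the hostname used in the curl call.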

config-istio: should the gateway/ingress settings be under `_example`?

> kubectl get cm config-istio -n knative-serving -oyaml
```
apiVersion: v1
data:
  _example: |
    gateway.knative-serving.knative-ingress-gateway: "istio-ingressgateway.istio-system.svc.cluster.local"
    local-gateway.knative-serving.knative-local-gateway: "knative-local-gateway.istio-system.svc.cluster.local"
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"_example":"################################\n# #\n# EXAMPLE CONFIGURATION #\n# #\n################################\n\n# This block is not actually functional configuration,\n# but serves to illustrate the available configuration\n# options and document them in a way that is accessible\n# to users that `kubectl edit` this config map.\n#\n# These sample configuration options may be copied out of\n# this example block and unindented to be in the data block\n# to actually change the configuration.\n\n# A gateway and Istio service to serve external traffic.\n# The configuration format should be\n# `gateway.{{gateway_namespace}}.{{gateway_name}}: \"{{ingress_name}}.{{ingress_namespace}}.svc.cluster.local\"`.\n# The {{gateway_namespace}} is optional; when it is omitted, the system will search for\n# the gateway in the serving system namespace `knative-serving`\ngateway.knative-serving.knative-ingress-gateway: \"istio-ingressgateway.istio-system.svc.cluster.local\"\n\n# A cluster local gateway to allow pods outside of the mesh to access\n# Services and Routes not exposing through an ingress. If the users\n# do have a service mesh setup, this isn't required and can be removed.\n#\n# An example use case is when users want to use Istio without any\n# sidecar injection (like Knative's istio-ci-no-mesh.yaml). Since every pod\n# is outside of the service mesh in that case, a cluster-local service\n# will need to be exposed to a cluster-local gateway to be accessible.\n# The configuration format should be `local-gateway.{{local_gateway_namespace}}.\n# {{local_gateway_name}}: \"{{cluster_local_gateway_name}}.\n# {{cluster_local_gateway_namespace}}.svc.cluster.local\"`. The\n# {{local_gateway_namespace}} is optional; when it is omitted, the system\n# will search for the local gateway in the serving system namespace\n# `knative-serving`\nlocal-gateway.knative-serving.knative-local-gateway: \"knative-local-gateway.istio-system.svc.cluster.local\"\n"},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"net-istio","app.kubernetes.io/name":"knative-serving","app.kubernetes.io/version":"1.13.0","networking.knative.dev/ingress-provider":"istio"},"name":"config-istio","namespace":"knative-serving"}}
  creationTimestamp: "2024-02-23T11:01:44Z"
  labels:
    app.kubernetes.io/component: net-istio
    app.kubernetes.io/name: knative-serving
    app.kubernetes.io/version: 1.13.0
    networking.knative.dev/ingress-provider: istio
  name: config-istio
  namespace: knative-serving
  resourceVersion: "5457598"
  uid: 6a39eea4-2189-4b84-a529-3f1937afd555
```
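The header of the `_example` block states that it "is not actually functional configuration" and that options have to be "copied out of this example block and unindented to be in the data block to actually change the configuration." In the config-istio output above, both gateway entries sit only inside `_example`. A short Python sketch that applies that rule to a ConfigMap's `data` (the `gateway.` and `local-gateway.` prefixes come from the documented key formats) separates active keys from example-only ones:

```python
from typing import Dict, Mapping, Tuple

# Key prefixes from the documented formats `gateway.<ns>.<name>` and
# `local-gateway.<ns>.<name>`.
GATEWAY_PREFIXES = ("gateway.", "local-gateway.")

# `data` of the config-istio ConfigMap above: both gateway entries exist
# only inside the `_example` block.
CONFIG_ISTIO_DATA = {
    "_example": (
        'gateway.knative-serving.knative-ingress-gateway: '
        '"istio-ingressgateway.istio-system.svc.cluster.local"\n'
        'local-gateway.knative-serving.knative-local-gateway: '
        '"knative-local-gateway.istio-system.svc.cluster.local"\n'
    ),
}


def split_gateway_config(
    data: Mapping[str, str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (active, example_only) gateway settings from a ConfigMap's data.

    Only top-level keys count as configuration; lines inside `_example`
    are documentation unless the same key is also set at the top level.
    """
    active = {k: v for k, v in data.items() if k.startswith(GATEWAY_PREFIXES)}
    example_only: Dict[str, str] = {}
    for line in data.get("_example", "").splitlines():
        line = line.strip()
        if not line.startswith(GATEWAY_PREFIXES) or ": " not in line:
            continue  # comments, blank lines, unrelated keys
        key, value = line.split(": ", 1)
        if key not in active:
            example_only[key] = value.strip('"')
    return active, example_only


active, example_only = split_gateway_config(CONFIG_ISTIO_DATA)
print("active:", active)
print("only in _example:", sorted(example_only))
```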

Virtual Service

> kubectl get vs mlworkeralpha -n kserve-dsml5 -oyaml
```
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  creationTimestamp: "2024-02-26T08:20:53Z"
  generation: 1
  name: mlworkeralpha
  namespace: kserve-dsml5
  ownerReferences:
  - apiVersion: serving.kserve.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: InferenceService
    name: mlworkeralpha
    uid: 3ffd2143-b8a3-4b18-a542-82384cb6dcfa
  resourceVersion: "5396810"
  uid: 88535176-9208-4c31-a53a-a52ec0a69cb6
spec:
  gateways:
  - knative-serving/knative-local-gateway
  - knative-serving/knative-ingress-gateway
  hosts:
  - mlworkeralpha.kserve-dsml5.svc.cluster.local
  - mlworkeralpha.kserve-dsml5.ml.proxy.mydomain.rnd
  http:
  - headers:
      request:
        set:
          Host: mlworkeralpha-predictor.kserve-dsml5.svc.cluster.local
    match:
    - authority:
        regex: ^mlworkeralpha\.kserve-dsml5(\.svc(\.cluster\.local)?)?(?::\d{1,5})?$
      gateways:
      - knative-serving/knative-local-gateway
    - authority:
        regex: ^mlworkeralpha\.kserve-dsml5\.ml\.proxy\.mydomain\.rnd(?::\d{1,5})?$
      gateways:
      - knative-serving/knative-ingress-gateway
    route:
    - destination:
        host: knative-local-gateway.istio-system.svc.cluster.local
        port:
          number: 80
      weight: 100
```
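This VirtualService is the one KServe creates for the InferenceService (see its `ownerReferences`), and its authority regexes only accept `mlworkeralpha.kserve-dsml5` hostnames. The curl request above sends `Host: mlworkeralpha-predictor.kserve-dsml5.ml.proxy.mydomain.rnd`, which is the Knative Route's hostname and is handled by the VirtualServices net-istio creates for the Ingress (see the net-istio controller log below), not by this one. Checking Host values against the two regexes makes that split explicit. The sketch uses Python's `re`; Envoy evaluates these patterns with RE2, which supports every construct they use:

```python
import re
from typing import List

# Authority regexes copied from the VirtualService above, keyed by the
# gateway each match block is bound to.
AUTHORITY_REGEXES = {
    "knative-serving/knative-local-gateway":
        r"^mlworkeralpha\.kserve-dsml5(\.svc(\.cluster\.local)?)?(?::\d{1,5})?$",
    "knative-serving/knative-ingress-gateway":
        r"^mlworkeralpha\.kserve-dsml5\.ml\.proxy\.mydomain\.rnd(?::\d{1,5})?$",
}


def matching_gateways(authority: str) -> List[str]:
    """Gateways whose match block in this VirtualService accepts `authority`."""
    return [gw for gw, pattern in AUTHORITY_REGEXES.items() if re.fullmatch(pattern, authority)]


for host in (
    "mlworkeralpha.kserve-dsml5.ml.proxy.mydomain.rnd",
    "mlworkeralpha.kserve-dsml5.svc.cluster.local:80",
    "mlworkeralpha-predictor.kserve-dsml5.ml.proxy.mydomain.rnd",  # Host used by curl
):
    print(host, "->", matching_gateways(host) or "no match in this VirtualService")
```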

Logs:

Knative activator

```
{"severity":"ERROR","timestamp":"2024-02-26T11:21:51.952503244Z","logger":"activator","caller":"websocket/connection.go:191","message":"Failed to send ping message to ws://autoscaler.knative-serving.svc.cluster.local:8080","commit":"41769de","knative.dev/controller":"activator","knative.dev/pod":"activator-58db57894b-jhxf4","error":"connection has not yet been established","stacktrace":"knative.dev/pkg/websocket.NewDurableConnection.func3\n\tknative.dev/pkg@v0.0.0-20240116073220-b488e7be5902/websocket/connection.go:191"}
{"severity":"WARNING","timestamp":"2024-02-26T11:21:54.083348086Z","logger":"activator","caller":"handler/healthz_handler.go:36","message":"Healthcheck failed: connection has not yet been established","commit":"41769de","knative.dev/controller":"activator","knative.dev/pod":"activator-58db57894b-jhxf4"}
{"severity":"INFO","timestamp":"2024-02-26T11:22:05.767219773Z","logger":"activator","caller":"net/throttler.go:669","message":"Updated public Endpoints: mlworkeralpha-predictor-00001","commit":"41769de","knative.dev/controller":"activator","knative.dev/pod":"activator-58db57894b-jhxf4"}
{"severity":"INFO","timestamp":"2024-02-26T11:22:05.767506479Z","logger":"activator","caller":"net/throttler.go:601","message":"Public EPS updates: &v1.Endpoints{TypeMeta:v1.TypeMeta{Kind:\"\", APIVersion:\"\"}, ObjectMeta:v1.ObjectMeta{Name:\"mlworkeralpha-predictor-00001\", GenerateName:\"\", Namespace:\"kserve-dsml5\", SelfLink:\"\", UID:\"d69add24-70bf-4c21-8aec-a29cc96db715\", ResourceVersion:\"5450658\", Generation:0, CreationTimestamp:time.Date(2024, time.February, 23, 12, 0, 39, 0, time.Local), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{\"app\":\"mlworkeralpha-predictor-00001\", \"component\":\"predictor\", \"networking.internal.knative.dev/serverlessservice\":\"mlworkeralpha-predictor-00001\", \"networking.internal.knative.dev/serviceType\":\"Public\", \"serving.knative.dev/configuration\":\"mlworkeralpha-predictor\", \"serving.knative.dev/configurationGeneration\":\"1\", \"serving.knative.dev/configurationUID\":\"54109252-ffc3-4945-89d6-8656045276ea\", \"serving.knative.dev/revision\":\"mlworkeralpha-predictor-00001\", \"serving.knative.dev/revisionUID\":\"a9523c09-d39f-4762-8072-b3c9306879f2\", \"serving.knative.dev/service\":\"mlworkeralpha-predictor\", \"serving.knative.dev/serviceUID\":\"58f4091c-311e-4326-90b6-b4808b7b7bdc\", \"serving.kserve.io/inferenceservice\":\"mlworkeralpha\"}, Annotations:map[string]string{\"autoscaling.knative.dev/class\":\"kpa.autoscaling.knative.dev\", \"autoscaling.knative.dev/min-scale\":\"1\", \"serving.knative.dev/creator\":\"system:serviceaccount:kserve:kserve-controller-manager\"}, OwnerReferences:[]v1.OwnerReference{v1.OwnerReference{APIVersion:\"networking.internal.knative.dev/v1alpha1\", Kind:\"ServerlessService\", Name:\"mlworkeralpha-predictor-00001\", UID:\"52d940e6-20c6-4266-b867-401bb2f49bee\", Controller:(*bool)(0xc000c0788d), BlockOwnerDeletion:(*bool)(0xc000c0788e)}}, Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:\"controller\", Operation:\"Update\", APIVersion:\"v1\", Time:time.Date(2024, time.February, 26, 11, 22, 5, 0, time.Local), FieldsType:\"FieldsV1\", FieldsV1:(*v1.FieldsV1)(0xc000c4a6d8), Subresource:\"\"}}}, Subsets:[]v1.EndpointSubset{v1.EndpointSubset{Addresses:[]v1.EndpointAddress{v1.EndpointAddress{IP:\"10.10.246.158\", Hostname:\"\", NodeName:(*string)(0xc000bf76d0), TargetRef:(*v1.ObjectReference)(0xc000c28930)}}, NotReadyAddresses:[]v1.EndpointAddress(nil), Ports:[]v1.EndpointPort{v1.EndpointPort{Name:\"http\", Port:8012, Protocol:\"TCP\", AppProtocol:(*string)(nil)}, v1.EndpointPort{Name:\"https\", Port:8112, Protocol:\"TCP\", AppProtocol:(*string)(nil)}}}}}","commit":"41769de","knative.dev/controller":"activator","knative.dev/pod":"activator-58db57894b-jhxf4"}
{"severity":"INFO","timestamp":"2024-02-26T11:22:05.768144776Z","logger":"activator","caller":"net/throttler.go:645","message":"This activator index is 0/1 was -1/0","commit":"41769de","knative.dev/controller":"activator","knative.dev/pod":"activator-58db57894b-jhxf4","knative.dev/key":"kserve-dsml5/mlworkeralpha-predictor-00001"}
{"severity":"INFO","timestamp":"2024-02-26T11:22:05.76818759Z","logger":"activator","caller":"net/throttler.go:323","message":"Set capacity to 2147483647 (backends: 1, index: 0/1)","commit":"41769de","knative.dev/controller":"activator","knative.dev/pod":"activator-58db57894b-jhxf4","knative.dev/key":"kserve-dsml5/mlworkeralpha-predictor-00001"}
```
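Each activator entry above is a single JSON object, and the only non-INFO entries are the failed websocket ping to `autoscaler.knative-serving.svc.cluster.local:8080` and the health-check warning that follows it. On longer logs, a small filter can surface such entries. A Python sketch; the field names (`severity`, `timestamp`, `message`, `error`) are taken from the entries above, and `SAMPLE_LOG` holds abbreviated copies of them plus one klog-style line from the webhook logs:

```python
import json
from typing import Iterable, Iterator, Tuple

QUIET_SEVERITIES = {"DEBUG", "INFO"}

# Abbreviated copies of entries from the logs above, plus one klog-style line.
SAMPLE_LOG = [
    '{"severity":"ERROR","timestamp":"2024-02-26T11:21:51.952503244Z","logger":"activator",'
    '"message":"Failed to send ping message to ws://autoscaler.knative-serving.svc.cluster.local:8080",'
    '"error":"connection has not yet been established"}',
    '{"severity":"WARNING","timestamp":"2024-02-26T11:21:54.083348086Z","logger":"activator",'
    '"message":"Healthcheck failed: connection has not yet been established"}',
    '{"severity":"INFO","timestamp":"2024-02-26T11:22:05.767219773Z","logger":"activator",'
    '"message":"Updated public Endpoints: mlworkeralpha-predictor-00001"}',
    "I0226 11:22:15.238545 1 leaderelection.go:260] successfully acquired lease "
    "knative-serving/webhook.webhookcertificates.00-of-01",
]


def problems(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (timestamp, severity, message) for JSON entries not at DEBUG/INFO.

    Non-JSON lines (such as klog "I0226 ..." lines) are skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("{"):
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        severity = entry.get("severity", "")
        if severity in QUIET_SEVERITIES:
            continue
        message = entry.get("message", "")
        if "error" in entry:
            message += f" ({entry['error']})"
        yield entry.get("timestamp", ""), severity, message


for timestamp, severity, message in problems(SAMPLE_LOG):
    print(timestamp, severity, message)
```

To run it against live output, pass the lines of `kubectl logs -n knative-serving deploy/activator` (for example `sys.stdin` when piping) instead of `SAMPLE_LOG`.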

net-istio controller

```
{"severity":"INFO","timestamp":"2024-02-26T11:45:05.387252203Z","logger":"net-istio-controller","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"a21cc34-dirty","knative.dev/controller":"knative.dev.net-istio.pkg.reconciler.serverlessservice.reconciler","knative.dev/kind":"networking.internal.knative.dev.ServerlessService","knative.dev/traceid":"a9754112-491d-4f0b-aab3-92698cc72259","knative.dev/key":"kserve-dsml5/mlworkeralpha-predictor-00001","duration":"74.872µs"}
{"severity":"INFO","timestamp":"2024-02-26T11:45:05.38796572Z","logger":"net-istio-controller.istio-ingress-controller","caller":"ingress/ingress.go:115","message":"Reconciling ingress: &v1alpha1.Ingress{TypeMeta:v1.TypeMeta{Kind:\"Ingress\", APIVersion:\"networking.internal.knative.dev/v1alpha1\"}, ObjectMeta:v1.ObjectMeta{Name:\"mlworkeralpha-predictor\", GenerateName:\"\", Namespace:\"kserve-dsml5\", SelfLink:\"\", UID:\"9302843e-7139-4083-8889-2618d8a6299c\", ResourceVersion:\"5396353\", Generation:1, CreationTimestamp:time.Date(2024, time.February, 23, 12, 0, 42, 0, time.Local), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{\"component\":\"predictor\", \"serving.knative.dev/route\":\"mlworkeralpha-predictor\", \"serving.knative.dev/routeNamespace\":\"kserve-dsml5\", \"serving.knative.dev/service\":\"mlworkeralpha-predictor\", \"serving.kserve.io/inferenceservice\":\"mlworkeralpha\"}, Annotations:map[string]string{\"networking.internal.knative.dev/rollout\":\"{\\\"configurations\\\":[{\\\"configurationName\\\":\\\"mlworkeralpha-predictor\\\",\\\"percent\\\":100,\\\"revisions\\\":[{\\\"revisionName\\\":\\\"mlworkeralpha-predictor-00001\\\",\\\"percent\\\":100}],\\\"stepParams\\\":{}}]}\", \"networking.knative.dev/ingress.class\":\"istio.ingress.networking.knative.dev\", \"serving.knative.dev/creator\":\"system:serviceaccount:kserve:kserve-controller-manager\", \"serving.knative.dev/lastModifier\":\"system:serviceaccount:kserve:kserve-controller-manager\"}, OwnerReferences:[]v1.OwnerReference{v1.OwnerReference{APIVersion:\"serving.knative.dev/v1\", Kind:\"Route\", Name:\"mlworkeralpha-predictor\", UID:\"1bb27753-b28f-46d7-a054-8546f9e32565\", Controller:(*bool)(0xc000aadb58), BlockOwnerDeletion:(*bool)(0xc000aadb59)}}, Finalizers:[]string{\"ingresses.networking.internal.knative.dev\"}, ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:\"controller\", Operation:\"Update\", APIVersion:\"networking.internal.knative.dev/v1alpha1\", Time:time.Date(2024, time.February, 23, 12, 0, 42, 0, time.Local), FieldsType:\"FieldsV1\", FieldsV1:(*v1.FieldsV1)(0xc000c17080), Subresource:\"\"}, v1.ManagedFieldsEntry{Manager:\"controller\", Operation:\"Update\", APIVersion:\"networking.internal.knative.dev/v1alpha1\", Time:time.Date(2024, time.February, 26, 8, 19, 39, 0, time.Local), FieldsType:\"FieldsV1\", FieldsV1:(*v1.FieldsV1)(0xc000c170b0), Subresource:\"status\"}}}, Spec:v1alpha1.IngressSpec{TLS:[]v1alpha1.IngressTLS(nil), Rules:[]v1alpha1.IngressRule{v1alpha1.IngressRule{Hosts:[]string{\"mlworkeralpha-predictor.kserve-dsml5\", \"mlworkeralpha-predictor.kserve-dsml5.svc\", \"mlworkeralpha-predictor.kserve-dsml5.svc.cluster.local\"}, Visibility:\"ClusterLocal\", HTTP:(*v1alpha1.HTTPIngressRuleValue)(0xc000c170c8)}, v1alpha1.IngressRule{Hosts:[]string{\"mlworkeralpha-predictor.kserve-dsml5.ml.proxy.mydomain.rnd\"}, Visibility:\"ExternalIP\", HTTP:(*v1alpha1.HTTPIngressRuleValue)(0xc000c170e0)}}, HTTPOption:\"Enabled\"}, Status:v1alpha1.IngressStatus{Status:v1.Status{ObservedGeneration:1, Conditions:v1.Conditions{apis.Condition{Type:\"LoadBalancerReady\", Status:\"True\", Severity:\"\", LastTransitionTime:apis.VolatileTime{Inner:time.Date(2024, time.February, 26, 8, 19, 39, 0, time.Local)}, Reason:\"\", Message:\"\"}, apis.Condition{Type:\"NetworkConfigured\", Status:\"True\", Severity:\"\", LastTransitionTime:apis.VolatileTime{Inner:time.Date(2024, time.February, 23, 12, 0, 42, 0, time.Local)}, Reason:\"\", Message:\"\"}, apis.Condition{Type:\"Ready\", Status:\"True\", Severity:\"\", LastTransitionTime:apis.VolatileTime{Inner:time.Date(2024, time.February, 26, 8, 19, 39, 0, time.Local)}, Reason:\"\", Message:\"\"}}, Annotations:map[string]string(nil)}, PublicLoadBalancer:(*v1alpha1.LoadBalancerStatus)(0xc000c170f8), PrivateLoadBalancer:(*v1alpha1.LoadBalancerStatus)(0xc000c17110)}}","commit":"a21cc34-dirty","knative.dev/controller":"istio-ingress-controller","knative.dev/controller":"knative.dev.net-istio.pkg.reconciler.ingress.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/traceid":"bb8d1041-5357-42f6-805f-fc2ed246f933","knative.dev/key":"kserve-dsml5/mlworkeralpha-predictor"}
{"severity":"INFO","timestamp":"2024-02-26T11:45:05.388248166Z","logger":"net-istio-controller.istio-ingress-controller","caller":"ingress/ingress.go:204","message":"Creating/Updating VirtualServices","commit":"a21cc34-dirty","knative.dev/controller":"istio-ingress-controller","knative.dev/controller":"knative.dev.net-istio.pkg.reconciler.ingress.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/traceid":"bb8d1041-5357-42f6-805f-fc2ed246f933","knative.dev/key":"kserve-dsml5/mlworkeralpha-predictor"}
{"severity":"INFO","timestamp":"2024-02-26T11:45:05.396817916Z","logger":"net-istio-controller.istio-ingress-controller","caller":"ingress/ingress.go:243","message":"Ingress successfully synced","commit":"a21cc34-dirty","knative.dev/controller":"istio-ingress-controller","knative.dev/controller":"knative.dev.net-istio.pkg.reconciler.ingress.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/traceid":"bb8d1041-5357-42f6-805f-fc2ed246f933","knative.dev/key":"kserve-dsml5/mlworkeralpha-predictor"}
{"severity":"INFO","timestamp":"2024-02-26T11:45:05.3969275Z","logger":"net-istio-controller.istio-ingress-controller","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"a21cc34-dirty","knative.dev/controller":"istio-ingress-controller","knative.dev/controller":"knative.dev.net-istio.pkg.reconciler.ingress.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/traceid":"bb8d1041-5357-42f6-805f-fc2ed246f933","knative.dev/key":"kserve-dsml5/mlworkeralpha-predictor","duration":"9.280963ms"}
```
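The "Reconciling ingress" entry shows two rules on the Knative Ingress for this Route: the `.svc` hostnames with `Visibility:"ClusterLocal"` and the `ml.proxy.mydomain.rnd` hostname with `Visibility:"ExternalIP"`. net-istio serves ExternalIP rules through the external ingress gateway and ClusterLocal rules through the local gateway (the two gateways named in config-istio). A lookup with the rules copied from that log entry (a sketch that ignores paths, headers and traffic splits) shows which gateway the curl Host header is expected to reach:

```python
from typing import Optional, Tuple

# (visibility, hosts) pairs copied from the "Reconciling ingress" log entry.
INGRESS_RULES: Tuple[Tuple[str, Tuple[str, ...]], ...] = (
    (
        "ClusterLocal",
        (
            "mlworkeralpha-predictor.kserve-dsml5",
            "mlworkeralpha-predictor.kserve-dsml5.svc",
            "mlworkeralpha-predictor.kserve-dsml5.svc.cluster.local",
        ),
    ),
    ("ExternalIP", ("mlworkeralpha-predictor.kserve-dsml5.ml.proxy.mydomain.rnd",)),
)

# Gateway names as they appear in config-istio.
GATEWAY_FOR_VISIBILITY = {
    "ExternalIP": "knative-serving/knative-ingress-gateway",
    "ClusterLocal": "knative-serving/knative-local-gateway",
}


def gateway_for_host(host: str) -> Optional[str]:
    """Return the Knative gateway whose Ingress rule lists `host` (port ignored)."""
    bare_host = host.split(":", 1)[0]
    for visibility, hosts in INGRESS_RULES:
        if bare_host in hosts:
            return GATEWAY_FOR_VISIBILITY[visibility]
    return None


# Host header from the curl call at the top of the issue.
print(gateway_for_host("mlworkeralpha-predictor.kserve-dsml5.ml.proxy.mydomain.rnd"))
```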

net-istio webhook

```
{"severity":"INFO","timestamp":"2024-02-26T11:22:15.688868257Z","logger":"net-istio-webhook.DefaultingWebhook","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"a21cc34-dirty","knative.dev/traceid":"6a16245d-bbd2-4b7d-83e2-1bb7a821123b","knative.dev/key":"webhook.istio.networking.internal.knative.dev","duration":"25.505338ms"}
I0226 11:22:19.884387 1 leaderelection.go:260] successfully acquired lease knative-serving/net-istio-webhook.webhookcertificates.00-of-01
{"severity":"INFO","timestamp":"2024-02-26T11:22:19.884768233Z","logger":"net-istio-webhook","caller":"leaderelection/context.go:158","message":"\"net-istio-webhook-9965c55c5-tvblf_fd1fec51-07c3-405c-ad8e-52ee09dcbf25\" has started leading \"net-istio-webhook.webhookcertificates.00-of-01\"","commit":"a21cc34-dirty"}
{"severity":"INFO","timestamp":"2024-02-26T11:22:19.885346395Z","logger":"net-istio-webhook.WebhookCertificates","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"a21cc34-dirty","knative.dev/traceid":"c3721b4f-4f6f-4c22-8e65-ec262d5be7c7","knative.dev/key":"knative-serving/net-istio-webhook-certs","duration":"269.874µs"}
{"severity":"INFO","timestamp":"2024-02-26T11:45:05.378244816Z","logger":"net-istio-webhook","caller":"webhook/admission.go:93","message":"Webhook ServeHTTP request=&http.Request{Method:\"POST\", URL:(*url.URL)(0xc000184ea0), Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMinor:1, Header:http.Header{\"Accept\":[]string{\"application/json, */*\"}, \"Accept-Encoding\":[]string{\"gzip\"}, \"Content-Length\":[]string{\"9875\"}, \"Content-Type\":[]string{\"application/json\"}, \"User-Agent\":[]string{\"kube-apiserver-admission\"}}, Body:(*http.body)(0xc0004ba000), GetBody:(func() (io.ReadCloser, error))(nil), ContentLength:9875, TransferEncoding:[]string(nil), Close:false, Host:\"net-istio-webhook.knative-serving.svc:443\", Form:url.Values(nil), PostForm:url.Values(nil), MultipartForm:(*multipart.Form)(nil), Trailer:http.Header(nil), RemoteAddr:\"10.64.4.3:1986\", RequestURI:\"/config-validation?timeout=10s\", TLS:(*tls.ConnectionState)(0xc000928160), Cancel:(<-chan struct {})(nil), Response:(*http.Response)(nil), ctx:(*context.cancelCtx)(0xc0004fe280)}","commit":"a21cc34-dirty"}
{"severity":"INFO","timestamp":"2024-02-26T11:45:05.379113734Z","logger":"net-istio-webhook","caller":"webhook/admission.go:151","message":"remote admission controller audit annotations=map[string]string(nil)","commit":"a21cc34-dirty","knative.dev/kind":"/v1, Kind=ConfigMap","knative.dev/namespace":"knative-serving","knative.dev/name":"config-istio","knative.dev/operation":"UPDATE","knative.dev/resource":"/v1, Resource=configmaps","knative.dev/subresource":"","knative.dev/userinfo":"kubernetes-admin","admissionreview/uid":"a0029b66-6387-478f-91aa-a6dac9da3851","admissionreview/allowed":true,"admissionreview/result":"nil"}
```

knative controller

> kubectl logs -n knative-serving controller-7dd875844b-4btf6
```
{"severity":"INFO","timestamp":"2024-02-26T11:22:05.800546502Z","logger":"webhook","caller":"webhook/admission.go:151","message":"remote admission controller audit annotations=map[string]string(nil)","commit":"41769de","knative.dev/pod":"webhook-d8674645d-rvppt","knative.dev/kind":"autoscaling.internal.knative.dev/v1alpha1, Kind=PodAutoscaler","knative.dev/namespace":"kserve-dsml5","knative.dev/name":"mlworkeralpha-predictor-00001","knative.dev/operation":"UPDATE","knative.dev/resource":"autoscaling.internal.knative.dev/v1alpha1, Resource=podautoscalers","knative.dev/subresource":"status","knative.dev/userinfo":"system:serviceaccount:knative-serving:controller","admissionreview/uid":"670c015a-b28e-4f5e-8837-862a21685f26","admissionreview/allowed":true,"admissionreview/result":"nil"}
I0226 11:22:15.238545 1 leaderelection.go:260] successfully acquired lease knative-serving/webhook.webhookcertificates.00-of-01
{"severity":"INFO","timestamp":"2024-02-26T11:22:15.238920851Z","logger":"webhook","caller":"leaderelection/context.go:158","message":"\"webhook-d8674645d-rvppt_7e6faa19-3acf-47ed-a49b-cd06e2a1df69\" has started leading \"webhook.webhookcertificates.00-of-01\"","commit":"41769de","knative.dev/pod":"webhook-d8674645d-rvppt"}
{"severity":"INFO","timestamp":"2024-02-26T11:22:15.23954902Z","logger":"webhook.WebhookCertificates","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"41769de","knative.dev/pod":"webhook-d8674645d-rvppt","knative.dev/traceid":"841806ae-7921-4d56-a141-2a7439d48f47","knative.dev/key":"knative-serving/webhook-certs","duration":"301.304µs"}
I0226 11:22:17.972949 1 leaderelection.go:260] successfully acquired lease knative-serving/webhook.validationwebhook.00-of-01
{"severity":"INFO","timestamp":"2024-02-26T11:22:17.973317915Z","logger":"webhook","caller":"leaderelection/context.go:158","message":"\"webhook-d8674645d-rvppt_2cef13b4-ce74-46dc-8ffd-7bdf2e5397f0\" has started leading \"webhook.validationwebhook.00-of-01\"","commit":"41769de","knative.dev/pod":"webhook-d8674645d-rvppt"}
{"severity":"INFO","timestamp":"2024-02-26T11:22:18.005069959Z","logger":"webhook.ValidationWebhook","caller":"validation/reconcile_config.go:228","message":"Updating webhook","commit":"41769de","knative.dev/pod":"webhook-d8674645d-rvppt","knative.dev/traceid":"2f1161db-e844-45a8-8a5b-95e20e10bbff","knative.dev/key":"validation.webhook.serving.knative.dev"}
{"severity":"INFO","timestamp":"2024-02-26T11:22:18.012378282Z","logger":"webhook.ValidationWebhook","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"41769de","knative.dev/pod":"webhook-d8674645d-rvppt","knative.dev/traceid":"2f1161db-e844-45a8-8a5b-95e20e10bbff","knative.dev/key":"validation.webhook.serving.knative.dev","duration":"38.690267ms"}
```

knative webhook

> kubectl logs -n knative-serving webhook-d8674645d-rvppt
```
{"severity":"INFO","timestamp":"2024-02-26T11:22:05.800546502Z","logger":"webhook","caller":"webhook/admission.go:151","message":"remote admission controller audit annotations=map[string]string(nil)","commit":"41769de","knative.dev/pod":"webhook-d8674645d-rvppt","knative.dev/kind":"autoscaling.internal.knative.dev/v1alpha1, Kind=PodAutoscaler","knative.dev/namespace":"kserve-dsml5","knative.dev/name":"mlworkeralpha-predictor-00001","knative.dev/operation":"UPDATE","knative.dev/resource":"autoscaling.internal.knative.dev/v1alpha1, Resource=podautoscalers","knative.dev/subresource":"status","knative.dev/userinfo":"system:serviceaccount:knative-serving:controller","admissionreview/uid":"670c015a-b28e-4f5e-8837-862a21685f26","admissionreview/allowed":true,"admissionreview/result":"nil"}
I0226 11:22:15.238545 1 leaderelection.go:260] successfully acquired lease knative-serving/webhook.webhookcertificates.00-of-01
{"severity":"INFO","timestamp":"2024-02-26T11:22:15.238920851Z","logger":"webhook","caller":"leaderelection/context.go:158","message":"\"webhook-d8674645d-rvppt_7e6faa19-3acf-47ed-a49b-cd06e2a1df69\" has started leading \"webhook.webhookcertificates.00-of-01\"","commit":"41769de","knative.dev/pod":"webhook-d8674645d-rvppt"}
{"severity":"INFO","timestamp":"2024-02-26T11:22:15.23954902Z","logger":"webhook.WebhookCertificates","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"41769de","knative.dev/pod":"webhook-d8674645d-rvppt","knative.dev/traceid":"841806ae-7921-4d56-a141-2a7439d48f47","knative.dev/key":"knative-serving/webhook-certs","duration":"301.304µs"}
I0226 11:22:17.972949 1 leaderelection.go:260] successfully acquired lease knative-serving/webhook.validationwebhook.00-of-01
{"severity":"INFO","timestamp":"2024-02-26T11:22:17.973317915Z","logger":"webhook","caller":"leaderelection/context.go:158","message":"\"webhook-d8674645d-rvppt_2cef13b4-ce74-46dc-8ffd-7bdf2e5397f0\" has started leading \"webhook.validationwebhook.00-of-01\"","commit":"41769de","knative.dev/pod":"webhook-d8674645d-rvppt"}
{"severity":"INFO","timestamp":"2024-02-26T11:22:18.005069959Z","logger":"webhook.ValidationWebhook","caller":"validation/reconcile_config.go:228","message":"Updating webhook","commit":"41769de","knative.dev/pod":"webhook-d8674645d-rvppt","knative.dev/traceid":"2f1161db-e844-45a8-8a5b-95e20e10bbff","knative.dev/key":"validation.webhook.serving.knative.dev"}
{"severity":"INFO","timestamp":"2024-02-26T11:22:18.012378282Z","logger":"webhook.ValidationWebhook","caller":"controller/controller.go:550","message":"Reconcile succeeded","commit":"41769de","knative.dev/pod":"webhook-d8674645d-rvppt","knative.dev/traceid":"2f1161db-e844-45a8-8a5b-95e20e10bbff","knative.dev/key":"validation.webhook.serving.knative.dev","duration":"38.690267ms"}
```

Istio logs

> kubectl logs istio-ingressgateway-cdf98c974-vkqwv -n istio-system

```
2024-02-26T13:15:40.735235Z info FLAG: --concurrency="0"
2024-02-26T13:15:40.735277Z info FLAG: --domain="istio-system.svc.cluster.local"
2024-02-26T13:15:40.735288Z info FLAG: --help="false"
2024-02-26T13:15:40.735296Z info FLAG: --log_as_json="false"
2024-02-26T13:15:40.735308Z info FLAG: --log_caller=""
2024-02-26T13:15:40.735316Z info FLAG: --log_output_level="default:info"
2024-02-26T13:15:40.735325Z info FLAG: --log_rotate=""
2024-02-26T13:15:40.735333Z info FLAG: --log_rotate_max_age="30"
2024-02-26T13:15:40.735342Z info FLAG: --log_rotate_max_backups="1000"
2024-02-26T13:15:40.735352Z info FLAG: --log_rotate_max_size="104857600"
2024-02-26T13:15:40.735359Z info FLAG: --log_stacktrace_level="default:none"
2024-02-26T13:15:40.735379Z info FLAG: --log_target="[stdout]"
2024-02-26T13:15:40.735390Z info FLAG: --meshConfig="./etc/istio/config/mesh"
2024-02-26T13:15:40.735398Z info FLAG: --outlierLogPath=""
2024-02-26T13:15:40.735407Z info FLAG: --profiling="true"
2024-02-26T13:15:40.735414Z info FLAG: --proxyComponentLogLevel="misc:error"
2024-02-26T13:15:40.735422Z info FLAG: --proxyLogLevel="warning"
2024-02-26T13:15:40.735429Z info FLAG: --serviceCluster="istio-proxy"
2024-02-26T13:15:40.735454Z info FLAG: --stsPort="0"
2024-02-26T13:15:40.735464Z info FLAG: --templateFile=""
2024-02-26T13:15:40.735474Z info FLAG: --tokenManagerPlugin="GoogleTokenExchange"
2024-02-26T13:15:40.735484Z info FLAG: --vklog="0"
2024-02-26T13:15:40.735494Z info Version 1.20.2-5f5d657c72d30a97cae97938de3a6831583e9f15-Clean
2024-02-26T13:15:40.768548Z info Maximum file descriptors (ulimit -n): 1073741816
2024-02-26T13:15:40.768847Z info Proxy role ips=[10.10.246.147] type=router id=istio-ingressgateway-cdf98c974-vkqwv.istio-system domain=istio-system.svc.cluster.local
2024-02-26T13:15:40.768994Z info Apply mesh config from file defaultConfig:
  discoveryAddress: istiod.istio-system.svc:15012
  proxyMetadata: {}
  terminationDrainDuration: 20s
  tracing:
    zipkin:
      address: zipkin.istio-system:9411
defaultProviders:
  metrics:
  - prometheus
enablePrometheusMerge: true
rootNamespace: istio-system
trustDomain: cluster.local
2024-02-26T13:15:40.771289Z info cpu limit detected as 3, setting concurrency
2024-02-26T13:15:40.771611Z info Effective config: binaryPath: /usr/local/bin/envoy
concurrency: 3
configPath: ./etc/istio/proxy
controlPlaneAuthPolicy: MUTUAL_TLS
discoveryAddress: istiod.istio-system.svc:15012
drainDuration: 45s
proxyAdminPort: 15000
serviceCluster: istio-proxy
statNameLength: 189
statusPort: 15020
terminationDrainDuration: 20s
tracing:
  zipkin:
    address: zipkin.istio-system:9411
2024-02-26T13:15:40.771629Z info JWT policy is third-party-jwt
2024-02-26T13:15:40.771636Z info using credential fetcher of JWT type in cluster.local trust domain
2024-02-26T13:15:40.773572Z info Workload SDS socket not found. Starting Istio SDS Server
2024-02-26T13:15:40.773606Z info CA Endpoint istiod.istio-system.svc:15012, provider Citadel
2024-02-26T13:15:40.773615Z info Opening status port 15020
2024-02-26T13:15:40.773668Z info Using CA istiod.istio-system.svc:15012 cert with certs: var/run/secrets/istio/root-cert.pem
2024-02-26T13:15:40.829401Z info ads All caches have been synced up in 94.663331ms, marking server ready
2024-02-26T13:15:40.829806Z info xdsproxy Initializing with upstream address "istiod.istio-system.svc:15012" and cluster "Kubernetes"
2024-02-26T13:15:40.829828Z info sds Starting SDS grpc server
2024-02-26T13:15:40.830429Z info starting Http service at 127.0.0.1:15004
2024-02-26T13:15:40.832617Z info Pilot SAN: [istiod.istio-system.svc]
2024-02-26T13:15:40.835137Z info Starting proxy agent
2024-02-26T13:15:40.835182Z info starting
2024-02-26T13:15:40.835292Z info Envoy command: [-c etc/istio/proxy/envoy-rev.json --drain-time-s 45 --drain-strategy immediate --local-address-ip-version v4 --file-flush-interval-msec 1000 --disable-hot-restart --allow-unknown-static-fields --log-format %Y-%m-%dT%T.%fZ %l envoy %n %g:%# %v thread=%t -l warning --component-log-level misc:error --concurrency 3]
2024-02-26T13:15:45.786780Z info xdsproxy connected to upstream XDS server: istiod.istio-system.svc:15012
2024-02-26T13:15:45.800231Z info cache generated new workload certificate latency=4.970434213s ttl=23h59m59.199779254s
2024-02-26T13:15:45.800335Z info cache Root cert has changed, start rotating root cert
2024-02-26T13:15:45.800406Z info ads XDS: Incremental Pushing ConnectedEndpoints:0 Version:
2024-02-26T13:15:45.800580Z info cache returned workload trust anchor from cache ttl=23h59m59.199428991s
2024-02-26T13:16:04.066630Z info ads ADS: new connection for node:istio-ingressgateway-cdf98c974-vkqwv.istio-system-1
2024-02-26T13:16:04.066764Z info cache returned workload certificate from cache ttl=23h59m40.933244722s
2024-02-26T13:16:04.066931Z info ads ADS: new connection for node:istio-ingressgateway-cdf98c974-vkqwv.istio-system-2
2024-02-26T13:16:04.067142Z info cache returned workload trust anchor from cache ttl=23h59m40.932872388s
2024-02-26T13:16:04.067423Z info ads SDS: PUSH request for node:istio-ingressgateway-cdf98c974-vkqwv.istio-system resources:1 size:1.1kB resource:ROOTCA
2024-02-26T13:16:04.067430Z info ads SDS: PUSH request for node:istio-ingressgateway-cdf98c974-vkqwv.istio-system resources:1 size:4.0kB resource:default
2024-02-26T13:16:04.191913Z info Readiness succeeded in 23.469068727s
2024-02-26T13:16:04.192626Z info Envoy proxy is ready
2024-02-26T13:46:03.726824Z info xdsproxy connected to upstream XDS server: istiod.istio-system.svc:15012
```

Thanks in advance for pointing me in the right direction. If any additional info, data, or logs are needed, let me know.

jrhunger commented 2 months ago

Do you have Istio sidecar injection enabled in both the namespace where knative-serving is running and the namespace where your service is deployed? The 1/1 Ready count on the Knative pods and 2/2 on the mlworkeralpha pod make me think there is no Envoy sidecar (unless you are using ambient mode?)
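The sidecar check described above can be scripted. Below is a minimal Python sketch, assuming you have saved the output of `kubectl get pods -n <namespace> -o json` to a file: it lists each pod's containers and flags whether a container named `istio-proxy` (Istio's default sidecar container name) is present. In a Knative pod, 2/2 usually corresponds to the user container plus `queue-proxy`, so an injected sidecar would typically show up as a third container. The script name `check_sidecar.py` and file name `pods.json` are hypothetical, and the `initContainers` check only matters if Istio's native-sidecar mode is enabled, which is an assumption not shown in this thread.

```python
"""Report which pods carry an Istio sidecar, from `kubectl get pods -o json` output.

Assumption: the injected sidecar container is named "istio-proxy" (Istio's
default). With native sidecars enabled it is listed under initContainers.
"""
import json
import sys

SIDECAR_NAME = "istio-proxy"


def has_istio_sidecar(pod: dict) -> bool:
    """Return True if the pod spec lists an istio-proxy container."""
    spec = pod.get("spec", {})
    names = [c.get("name") for c in spec.get("containers", [])]
    # Native sidecars (opt-in in recent Istio releases) are declared as init containers.
    names += [c.get("name") for c in spec.get("initContainers", [])]
    return SIDECAR_NAME in names


def report(pod_list: dict) -> dict:
    """Map pod name -> (regular container names, has_sidecar) for a PodList."""
    result = {}
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        containers = [c["name"] for c in pod["spec"].get("containers", [])]
        result[name] = (containers, has_istio_sidecar(pod))
    return result


def main(path: str) -> None:
    """Load a saved PodList JSON file and print one line per pod."""
    with open(path) as f:
        data = json.load(f)
    for name, (containers, sidecar) in report(data).items():
        status = "sidecar present" if sidecar else "NO istio-proxy sidecar"
        print(f"{name}: {', '.join(containers)} -> {status}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

For example, run `kubectl get pods -n knative-serving -o json > pods.json` followed by `python check_sidecar.py pods.json`, then repeat for the namespace where the service runs. Note that the `istio-injection=enabled` namespace label only affects pods created after the label is set, so existing pods have to be recreated before a sidecar appears.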

dsgli commented 2 months ago

I tried both with and without Istio injection enabled in my worker namespace. The result was the same: a 503 response.