knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Istio ingress: duplicate listener 0.0.0.0_8081 found #10160

Closed jasonnance closed 3 years ago

jasonnance commented 3 years ago

/area networking

What version of Knative?

0.19.x

Expected Behavior

Istio ingress gateway with provided knative-istio-controller configuration exposes Knative services consistently.

Actual Behavior

After a Knative service is added to the knative-local-gateway, the istio-ingressgateway pod reports the following error, and the ingress never comes online:

warning envoy config    gRPC config for type.googleapis.com/envoy.config.listener.v3.Listener rejected: Error adding/updating listener(s) 0.0.0.0_8081: duplicate listener 0.0.0.0_8081 found

Istio listener config:

$ istioctl proxy-config listeners istio-ingressgateway-xxx.istio-system
ADDRESS PORT  MATCH DESTINATION
0.0.0.0 8081  ALL   Route: http.80
0.0.0.0 15021 ALL   Non-HTTP/Non-TCP
0.0.0.0 15090 ALL   Non-HTTP/Non-TCP

This is running on a "toy" dev cluster which gets spun up and torn down each day and has all state managed using GitOps via Flux. Twice now, after tweaking a bunch of random stuff (ports, deleting/recreating gateways, etc), I've gotten the ingress into a working state, but when the cluster comes back up the following day with the same config, it's broken again.

When it works, the listener config looks like this:

$ istioctl proxy-config listeners istio-ingressgateway-86f88b6f6-plp4p.istio-system
ADDRESS PORT  MATCH DESTINATION
0.0.0.0 8080  ALL   Route: http.80
0.0.0.0 8081  ALL   Route: http.8081
0.0.0.0 15021 ALL   Non-HTTP/Non-TCP
0.0.0.0 15090 ALL   Non-HTTP/Non-TCP

As best I can tell, Istio is incorrectly merging the two gateways (default istio-ingressgateway and knative-local-gateway), but I'm not familiar enough with Istio/Knative Serving or their intersection to understand whether this is a problem with Knative's use of Istio or an Istio bug.
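
The working and failing listener tables above differ only in which port carries `Route: http.80` and whether `8081` carries `Route: http.8081`. As a rough diagnostic, the sketch below parses text in the same four-column layout and reports deviations from the working state. The function names and the `EXPECTED` mapping are illustrative only and are not part of any Istio or Knative tooling; the column layout is assumed to match the `istioctl` output pasted in this issue.

```python
# Expected routes, copied from the working listener table in this issue.
EXPECTED = {8080: "Route: http.80", 8081: "Route: http.8081"}


def parse_listeners(table):
    """Map each listener port to the text of its DESTINATION column."""
    listeners = {}
    for line in table.strip().splitlines():
        fields = line.split(None, 3)
        # Skip the header row and any line that is not ADDRESS PORT MATCH DESTINATION.
        if len(fields) < 4 or not fields[1].isdigit():
            continue
        listeners[int(fields[1])] = fields[3].strip()
    return listeners


def check_listeners(table, expected=EXPECTED):
    """Return a list of human-readable mismatches (empty when all is well)."""
    listeners = parse_listeners(table)
    problems = []
    for port, route in expected.items():
        actual = listeners.get(port)
        if actual != route:
            problems.append(f"port {port}: expected {route!r}, found {actual!r}")
    return problems


FAILING = """
ADDRESS PORT  MATCH DESTINATION
0.0.0.0 8081  ALL   Route: http.80
0.0.0.0 15021 ALL   Non-HTTP/Non-TCP
0.0.0.0 15090 ALL   Non-HTTP/Non-TCP
"""

WORKING = """
ADDRESS PORT  MATCH DESTINATION
0.0.0.0 8080  ALL   Route: http.80
0.0.0.0 8081  ALL   Route: http.8081
0.0.0.0 15021 ALL   Non-HTTP/Non-TCP
0.0.0.0 15090 ALL   Non-HTTP/Non-TCP
"""

for problem in check_listeners(FAILING):
    print(problem)
```

Running a check like this after each cluster bring-up would flag the broken state without reading the table by eye.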

Steps to Reproduce the Problem

k8s 1.17, Istio 1.7.4

knative-istio-controller.yaml (relevant sections only, should be all default):

---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: knative-ingress-gateway
  namespace: knative-serving
  labels:
    serving.knative.dev/release: "v0.19.0"
    networking.knative.dev/ingress-provider: istio
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"

---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cluster-local-gateway
  namespace: knative-serving
  labels:
    serving.knative.dev/release: "v0.19.0"
    networking.knative.dev/ingress-provider: istio
spec:
  selector:
    istio: cluster-local-gateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"

---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: knative-local-gateway
  namespace: knative-serving
  labels:
    serving.knative.dev/release: "v0.19.0"
    networking.knative.dev/ingress-provider: istio
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 8081
        name: http
        protocol: HTTP
      hosts:
        - "*"

---
apiVersion: v1
kind: Service
metadata:
  name: knative-local-gateway
  namespace: istio-system
  labels:
    serving.knative.dev/release: "v0.19.0"
    networking.knative.dev/ingress-provider: istio
spec:
  type: ClusterIP
  selector:
    istio: ingressgateway
  ports:
    - name: http2
      port: 80
      targetPort: 8081

Knative service:

---
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: kinesis-seldon-adapter
  namespace: polyaxon
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"
    spec:
      containers:
        - image: ko://my-module
          imagePullPolicy: Always
          ports:
            - containerPort: 8080

Gist with partial Istio proxy config dump showing the duplicate listeners in the failing state and the correct listeners in the working state (again, both states stemming from the same config shown above): https://gist.github.com/jasonnance/885e9772370ac6e46924844646597943

This is part of a more complex workflow involving Knative Eventing, so let me know if I've left out anything relevant.

Thanks!

vagababov commented 3 years ago

/cc @nak3 @JRBANCEL

JRBANCEL commented 3 years ago

Weird, the config clearly shows that 0.0.0.0_8081 is coming both from http.80 and http.8081. Only http.8081 is expected.

What does the k8s service look like? Istio will use the targetPort defined for 80 on Envoy.

kubectl get svc -n istio-system istio-ingressgateway -o yaml

jasonnance commented 3 years ago

k8s service (targetPort appears to be 8080):

apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app":"istio-ingressgateway","install.operator.istio.io/owning-resource":"istio-operator","install.operator.istio.io/owning-resource-namespace":"istio-operator","istio":"ingressgateway","istio.io/rev":"default","operator.istio.io/component":"IngressGateways","operator.istio.io/managed":"Reconcile","operator.istio.io/version":"1.7.4","release":"istio"},"name":"istio-ingressgateway","namespace":"istio-system"},"spec":{"ports":[{"name":"status-port","port":15021,"targetPort":15021},{"name":"http2","port":80,"targetPort":8080},{"name":"https","port":443,"targetPort":8443},{"name":"tls","port":15443,"targetPort":15443}],"selector":{"app":"istio-ingressgateway","istio":"ingressgateway"},"type":"LoadBalancer"}}
  creationTimestamp: "2020-11-19T13:37:49Z"
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
  labels:
    app: istio-ingressgateway
    install.operator.istio.io/owning-resource: istio-operator
    install.operator.istio.io/owning-resource-namespace: istio-operator
    istio: ingressgateway
    istio.io/rev: default
    operator.istio.io/component: IngressGateways
    operator.istio.io/managed: Reconcile
    operator.istio.io/version: 1.7.4
    release: istio
  name: istio-ingressgateway
  namespace: istio-system
  resourceVersion: "563142"
  selfLink: /api/v1/namespaces/istio-system/services/istio-ingressgateway
  uid: 1943515f-83e7-400f-bab3-16d9a2c095df
spec:
  clusterIP: x.x.x.x
  externalTrafficPolicy: Cluster
  ports:
  - name: status-port
    nodePort: 32326
    port: 15021
    protocol: TCP
    targetPort: 15021
  - name: http2
    nodePort: 30029
    port: 80
    protocol: TCP
    targetPort: 8080
  - name: https
    nodePort: 31002
    port: 443
    protocol: TCP
    targetPort: 8443
  - name: tls
    nodePort: 30347
    port: 15443
    protocol: TCP
    targetPort: 15443
  selector:
    app: istio-ingressgateway
    istio: ingressgateway
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - hostname: xxx-xxx.us-east-1.elb.amazonaws.com

Here's my Istio operator config as well -- realized that might be relevant (although it's also mostly default):

# Adapted from https://knative.dev/docs/install/installing-istio/#installing-istio-without-sidecar-injection
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-operator
  namespace: istio-operator
spec:
  values:
    global:
      proxy:
        autoInject: disabled
      useMCP: false
      # The third-party-jwt is not enabled on all k8s.
      # See: https://istio.io/docs/ops/best-practices/security/#configure-third-party-service-account-tokens
      jwtPolicy: first-party-jwt
      tracer:
        zipkin:
          address: jaeger-collector.observability.svc.cluster.local:9411

    pilot:
      traceSampling: 1.0

  addonComponents:
    pilot:
      enabled: true
    prometheus:
      enabled: false

  components:
    ingressGateways:
      - name: istio-ingressgateway
        enabled: true
      - name: cluster-local-gateway
        enabled: true
        label:
          istio: cluster-local-gateway
          app: cluster-local-gateway
        k8s:
          service:
            type: ClusterIP
            ports:
            - port: 15020
              name: status-port
            - port: 80
              targetPort: 8080
              name: http2
            - port: 443
              targetPort: 8443
              name: https

Shashankft9 commented 3 years ago

Getting the same error while installing v0.19.0:

[external/envoy/source/common/config/grpc_subscription_impl.cc:101] gRPC config for type.googleapis.com/envoy.api.v2.Listener rejected: Error adding/updating listener(s) 0.0.0.0_8081: duplicate listener 0.0.0.0_8081 found
istioctl proxy-config listeners istio-ingressgateway-bgkqm.istio-system
ADDRESS     PORT      TYPE
0.0.0.0     8081      HTTP
0.0.0.0     15090     HTTP
0.0.0.0     15021     HTTP

Because of this, the rest of the ingress traffic also stops working, since my gateways are on port 80. Without knative-serving, the listener entry there is on port 80 instead of 8081. It is 80 because, unlike @jasonnance, I have both port and targetPort set to 80.

JRBANCEL commented 3 years ago

@howardjohn Any idea how to debug this?

howardjohn commented 3 years ago

I think the issue is you have a Service for port 80 -> target 8081, a Gateway for 80 (which maps to 0.0.0.0_8081 due to targetPort) and a Gateway for 8081 (which maps to 0.0.0.0_8081 due to targetPort). Istio doesn't like this I guess.
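
The translation described in this comment can be sketched as a toy model. This is not Istio's implementation: `listener_name` is a hypothetical helper, and the rule that a Gateway port with no matching Service port is used unchanged is an assumption made for the illustration. Only the `80 -> 8081` mapping and the `0.0.0.0_<port>` naming come from the thread.

```python
def listener_name(gateway_port, service_ports):
    """Return the Envoy listener name a Gateway server port would resolve to.

    service_ports maps a Service port to its targetPort. A Gateway port that
    no Service exposes is assumed (for this sketch) to be used as-is.
    """
    target = service_ports.get(gateway_port, gateway_port)
    return f"0.0.0.0_{target}"


# knative-local-gateway Service from the original report: port 80 -> targetPort 8081.
local_gateway_service = {80: 8081}

# One Gateway asks for port 80, the other for port 8081.
names = [listener_name(port, local_gateway_service) for port in (80, 8081)]
duplicates = sorted({name for name in names if names.count(name) > 1})

print(names)
print(duplicates)
```

Under these assumptions both Gateways resolve to the same listener name, which is the name Envoy rejects in the reported error.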

I recall the same issue before, not sure if we have solved it already in a newer version or if it's an open issue in istio/istio.

howardjohn commented 3 years ago

To clarify, it's probably both a user error (not sure why you would want the above config - both Gateways should probably be on port 80) and an Istio bug (if a user does it, we need to handle it gracefully).

JRBANCEL commented 3 years ago

The ingress Gateway listens to 8080 and the matching k8s svc maps 80 -> 8080 (default Istio setup, nothing specific to Knative).

Then, we have a (Knative) Gateway listening on 8081 and a matching k8s svc mapping 80 -> 8081.

This should work and does work (that's what runs in the Knative E2E tests).

I am not sure what the issue is in this particular scenario though.

howardjohn commented 3 years ago

In the original issue:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: knative-local-gateway
  namespace: knative-serving
  labels:
    serving.knative.dev/release: "v0.19.0"
    networking.knative.dev/ingress-provider: istio
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 8081
        name: http
        protocol: HTTP
      hosts:
        - "*"
---
apiVersion: v1
kind: Service
metadata:
  name: knative-local-gateway
  namespace: istio-system
  labels:
    serving.knative.dev/release: "v0.19.0"
    networking.knative.dev/ingress-provider: istio
spec:
  type: ClusterIP
  selector:
    istio: ingressgateway
  ports:
    - name: http2
      port: 80
      targetPort: 8081
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: knative-ingress-gateway
  namespace: knative-serving
  labels:
    serving.knative.dev/release: "v0.19.0"
    networking.knative.dev/ingress-provider: istio
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"

These 3 are enough to trigger it, then you have two Gateways which both map to port 8081. I don't think it's a standard Knative setup, which is likely why it's not a more common issue.

JRBANCEL commented 3 years ago

> then you have two Gateways which both map to port 8081

Why though?

knative-local-gateway wants to listen on 8081 and the matching service targetPort for 80 is also 8081, so Envoy actually listens on 8081.

knative-ingress-gateway wants to listen on 80 but the matching service (default Istio setup) istio-ingressgateway has a targetPort of 8080 for port 80 so Envoy actually listens on 8080.

Am I missing something? This is the new Knative setup and it should work.

howardjohn commented 3 years ago

Ah, I think I missed part of it. The confusion is around the "matching service", I think. The mapping of knative-local-gateway -> 80/8081 service and knative-ingress-gateway -> 80/8080 service is not quite right.

There is a single istio-ingressgateway deployment, which has two services. One of them maps port 80 to 8080 and the other maps port 80 to 8081. So far, no problem.

Then we create a Gateway on port 80, which sets up listeners on port 8080 and 8081 (Maybe? I actually doubt this works, need to verify), all good

Then we create a Gateway on port 8081, which sets up another listener on port 8081 (it shouldn't, but the logic is wrong and doesn't merge), which breaks.

JRBANCEL commented 3 years ago

> Then we create a Gateway on port 80, which sets up listeners on port 8080 and 8081 (Maybe? I actually doubt this works, need to verify), all good

Hmm, maybe this is another instance where Istio is not deterministic and picks just one port?

This is worrying. This has been running fine in our E2E tests for the past month and we don't see this issue (both 1.7.x and 1.8.x)
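
To make the "picks just one port" hypothesis concrete, here is a toy model in which two Services select the same gateway pods and the last Service seen decides the targetPort for a shared port. The "last one wins" rule and the `resolve_listeners` helper are assumptions for illustration only; the thread does not establish how Istio actually chooses. Only the HTTP port of each Service is modelled.

```python
from collections import Counter


def resolve_listeners(gateway_ports, services):
    """Map each Gateway port to the listener port it would end up on.

    services is a list of (name, {port: targetPort}) in the order they are
    processed. Hypothetical rule: a later Service overrides an earlier one.
    """
    merged = {}
    for _name, ports in services:
        merged.update(ports)
    return {port: merged.get(port, port) for port in gateway_ports}


def duplicate_ports(resolved):
    """Listener ports claimed by more than one Gateway port."""
    counts = Counter(resolved.values())
    return sorted(port for port, n in counts.items() if n > 1)


ingress = ("istio-ingressgateway", {80: 8080})
local = ("knative-local-gateway", {80: 8081})

# Same objects, different processing order.
working = resolve_listeners([80, 8081], [local, ingress])
failing = resolve_listeners([80, 8081], [ingress, local])

print(working, duplicate_ports(working))
print(failing, duplicate_ports(failing))
```

In this model the first ordering reproduces the working table (`http.80` on 8080, `http.8081` on 8081) and the second reproduces the failing one (both on 8081), which would be consistent with identical config behaving differently from one cluster bring-up to the next.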

howardjohn commented 3 years ago

@JRBANCEL do you have the same setup with multiple services with different port setups in your e2e tests? I was under the impression the original issue has some customization not in the standard knative setup that was triggering it

JRBANCEL commented 3 years ago

We install Istio with this profile: https://github.com/knative-sandbox/net-istio/blob/master/third_party/istio-stable/istio-ci-no-mesh.yaml

And these are the additional Gateway and Service objects: https://github.com/knative-sandbox/net-istio/blob/master/config/202-gateway.yaml https://github.com/knative-sandbox/net-istio/blob/master/config/203-local-gateway.yaml

howardjohn commented 3 years ago

Huh.. if you are not seeing issues then I may be mistaken. We will need to investigate this more on the Istio side

howardjohn commented 3 years ago

https://github.com/istio/istio/issues/29643 https://github.com/istio/istio/issues/29291

are similar issues on the Istio tracker.

JRBANCEL commented 3 years ago

> Then we create a Gateway on port 80, which sets up listeners on port 8080 and 8081 (Maybe? I actually doubt this works, need to verify), all good

Can you check the implementation when you have a second? That is not what I am seeing.

Fresh Istio 1.8.1 install with the profile linked above:

ADDRESS PORT  MATCH DESTINATION
0.0.0.0 15021 ALL   Inline Route: /healthz/ready*
0.0.0.0 15090 ALL   Inline Route: /stats/prometheus*

After installing the ingress Gateway:

$ ko apply -f config/202-gateway.yaml 
gateway.networking.istio.io/knative-ingress-gateway created
ADDRESS PORT  MATCH DESTINATION
0.0.0.0 8080  ALL   Route: http.80
0.0.0.0 15021 ALL   Inline Route: /healthz/ready*
0.0.0.0 15090 ALL   Inline Route: /stats/prometheus*

After installing the local Gateway and Service:

$ ko apply -f config/203-local-gateway.yaml                                                                      
gateway.networking.istio.io/knative-local-gateway created
service/knative-local-gateway created
ADDRESS PORT  MATCH DESTINATION
0.0.0.0 8080  ALL   Route: http.80
0.0.0.0 8081  ALL   Route: http.8081
0.0.0.0 15021 ALL   Inline Route: /healthz/ready*
0.0.0.0 15090 ALL   Inline Route: /stats/prometheus*

Even when I delete the local Gateway and leave the matching Service, there is nothing on 8081:

ADDRESS PORT  MATCH DESTINATION
0.0.0.0 8080  ALL   Route: http.80
0.0.0.0 15021 ALL   Inline Route: /healthz/ready*
0.0.0.0 15090 ALL   Inline Route: /stats/prometheus*

Shashankft9 commented 3 years ago

For me, with all the knative components installed, it gives:

ADDRESS     PORT      TYPE
0.0.0.0     8081      HTTP
0.0.0.0     15090     HTTP
0.0.0.0     15021     HTTP

The above stays when I delete the knative-local-gateway Gateway; only when I delete the knative-local-gateway svc does it go back to:

ADDRESS     PORT      TYPE
0.0.0.0     80        HTTP
0.0.0.0     15090     HTTP
0.0.0.0     15021     HTTP

mattmoor commented 3 years ago

/assign @JRBANCEL

Moving this off of the serving triage query, we should move these discussions to net-istio.

mjgallag commented 3 years ago

Not sure this helps with root-causing, but I wanted to confirm that it does appear to be a timing issue (not deterministic). I worked around the problem by installing Istio and waiting for it to be ready first.
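
The comment does not say how the wait was done. One possible way, sketched here under assumptions, is to block on the ingress gateway rollout before applying the Knative manifests. The deployment name and namespace are taken from the pod and Service names earlier in this thread; the 300s timeout and both function names are arbitrary choices for the example.

```python
import subprocess


def rollout_status_cmd(deployment="istio-ingressgateway",
                       namespace="istio-system",
                       timeout="300s"):
    """Build a `kubectl rollout status` command that waits for a deployment."""
    return [
        "kubectl", "rollout", "status",
        f"deployment/{deployment}",
        "-n", namespace,
        f"--timeout={timeout}",
    ]


def wait_for_istio(run=subprocess.run):
    """Block until the gateway rollout completes.

    With the default runner, check=True raises CalledProcessError if kubectl
    exits non-zero (for example on timeout). The runner is injectable so the
    function can be exercised without a cluster.
    """
    run(rollout_status_cmd(), check=True)
```

Calling `wait_for_istio()` between the Istio install and the Knative install would enforce the ordering described above; whether that ordering is sufficient in every environment is not established in this thread.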

Shashankft9 commented 3 years ago

@mjgallag hey, just to check: what happens if you delete Istio and reinstall it while knative-serving is running? I deployed Istio on a new cluster, then knative-serving, and then for some reason had to reinstall Istio, and that is when I got this issue again. Is the issue reproducible this way?

mjgallag commented 3 years ago

@Shashankft9 I didn't get this issue, but it did cause Knative issues. A non-Knative service I have routed through Istio worked fine again right after I reinstalled Istio. My Knative services, however, returned "no healthy upstream" even though I checked the underlying revision pod containers and they all seemed fine. In fact, I couldn't find anything wrong with the Knative services, routes and revisions. I also looked at the Istio gateways and virtual services created by Knative for my services, plus the Envoy listeners and routes they create, and they all looked correct too. I was able to deploy new revisions successfully but still got "no healthy upstream". I had to delete and recreate my services for them to work again.

JRBANCEL commented 3 years ago

I reported a minimal repro of this issue to Istio: https://github.com/istio/istio/issues/31084

ZhuangYuZY commented 3 years ago

It also happened in my env today when updating Istio. It impacts external access to HTTP endpoints: the LB forwards HTTP port 80 traffic to Istio gateway port 8080, but 8080 is missing from the Istio listeners. It seems like a timing issue in Istio. I worked around the issue by removing the knative-local-gateway service in the istio-system namespace.

  1. Before the change:
    $ istioctl proxy-config listener istio-ingressgateway-5b69f56d57-nl4x6.istio-system
    ADDRESS PORT  MATCH DESTINATION
    0.0.0.0 8081  ALL   Route: http.80      // 8080 missing
    0.0.0.0 8089  ALL   Route: http.8089    // I had earlier changed the port to 8089 in knative-local-gateway to work around the issue.
  2. Delete the knative-local-gateway service:
    $ k delete svc -n istio-system knative-local-gateway
    $ istioctl proxy-config listener istio-ingressgateway-5b69f56d57-nl4x6.istio-system
    ADDRESS PORT  MATCH DESTINATION
    0.0.0.0 8080  ALL   Route: http.80      // 8080 listener is back; it is defined by the default istio-ingressgateway service in the istio-system namespace.
    0.0.0.0 8089  ALL   Route: http.8089
  3. Delete the knative-local-gateway Gateway; it seems to be recreated:
    $ k delete gateway -n knative-serving knative-local-gateway
    gateway.networking.istio.io "knative-local-gateway" deleted
    $ istioctl proxy-config listener istio-ingressgateway-5b69f56d57-nl4x6.istio-system
    ADDRESS PORT  MATCH DESTINATION
    0.0.0.0 8080  ALL   Route: http.80
    0.0.0.0 8081  ALL   Route: http.8081    // 8081 is back to normal; we removed knative-local-gateway and it was recreated with the default.
  4. Create the knative-local-gateway service again; the Istio listeners are still correct:
    $ istioctl proxy-config listener istio-ingressgateway-5b69f56d57-nl4x6.istio-system
    ADDRESS PORT  MATCH DESTINATION
    0.0.0.0 8080  ALL   Route: http.80
    0.0.0.0 8081  ALL   Route: http.8081

howardjohn commented 3 years ago

This should be fixed in Istio by https://github.com/istio/istio/commit/ae8b0e2cf984f13224b35423edb52f25832ac616

evankanderson commented 3 years ago

Looks like this may be fixed in Istio.

/close

knative-prow-robot commented 3 years ago

@evankanderson: Closing this issue.

nak3 commented 3 years ago

It seems https://github.com/istio/istio/commit/ae8b0e2cf984f13224b35423edb52f25832ac616 did not fix the issue. We still need to track https://github.com/istio/istio/issues/31084

nak3 commented 3 years ago

Quick update, https://github.com/istio/istio/pull/33021 should fix the issue.

dprotaso commented 3 years ago

Related: https://github.com/knative-sandbox/net-istio/pull/636

We'll need to pull in the next 1.9.x patch release when it's out

dprotaso commented 3 years ago

We're pulling in the latest patches now - https://github.com/knative-sandbox/net-istio/pull/699

houshym commented 3 years ago

I can confirm this issue exists in Istio 1.9.6, Kubernetes version 1.20.7.

dprotaso commented 3 years ago

@houshym do you have the following ~annotation~ label on your gateway?

https://github.com/knative-sandbox/net-istio/pull/636/files

 experimental.istio.io/disable-gateway-port-translation: "true"
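
A quick way to answer that question for a given object is to check its labels after loading the manifest. The sketch below works on an already-parsed manifest (a plain dict) and makes no claim about which object should carry the label; the linked PR is the authority on that. The helper name is hypothetical.

```python
LABEL = "experimental.istio.io/disable-gateway-port-translation"


def has_port_translation_disabled(manifest):
    """True if the manifest's metadata.labels sets the label to "true"."""
    labels = (manifest.get("metadata") or {}).get("labels") or {}
    return labels.get(LABEL) == "true"


# Example manifests as parsed dicts; names are placeholders.
labelled = {"metadata": {"name": "example", "labels": {LABEL: "true"}}}
unlabelled = {"metadata": {"name": "example", "labels": {"istio": "ingressgateway"}}}

print(has_port_translation_disabled(labelled))
print(has_port_translation_disabled(unlabelled))
```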

houshym commented 3 years ago

@dprotaso I don't have it. Should I add it?

dprotaso commented 3 years ago

According to that PR, having that label on the gateway (with Istio 1.9.6) should activate the functionality that supposedly fixes the behaviour. That is why I was saying it should fix the issue for Knative's installation of Istio.

dprotaso commented 3 years ago

Let me know if it doesn't - then maybe we should reach out to some Istio folks.

I think for now we can close this issue - ~I'm going to move it to the net-istio repo as this is specific to that~

ok ... I can't transfer to another org :(