openservicemesh / osm

Open Service Mesh (OSM) is a lightweight, extensible, cloud native service mesh that allows users to uniformly manage, secure, and get out-of-the-box observability features for highly dynamic microservice environments.
https://openservicemesh.io/
Apache License 2.0
2.59k stars 277 forks source link

E5011 - gRPC stream closed by proxy #5320

Closed eguer004 closed 1 year ago

eguer004 commented 1 year ago

Bug description: Currently we have OSM installed in an AKS cluster. OSM was enabled for a little over two weeks with mTLS configured using the following instructions _https://release-v1-0.docs.openservicemesh.io/docs/demos/ingress_k8s_nginx/#https-ingress-mtls-and-tls_

Services were running fine, until we started seeing the following errors: nginx ingress controller logs: SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: X.X.X.X, server: abc.def.com, request: "POST /v1/user/locale HTTP/1.1", upstream: "https://X.X.X.X:8080/v1/user/locale", host: "abc.def.com"

osm-controller logs: "level":"error","component":"envoy/ads","error_code":"E5011","proxy":"[ProxyUUID=d01f7841-555d-4e1f-a609-478b783851c7], [Pod metadata=UID=43344208-08e8-4671-9991-b6ed55fde88e, Namespace=my-service-ns, Name=my-service-5549b48c9f-qjrf9, ServiceAccount=default]","time":"2023-04-05T14:00:25Z","file":"stream.go:102","message":"gRPC stream closed by proxy [ProxyUUID=d01f7841-555d-4e1f-a609-478b783851c7], [Pod metadata=UID=43344208-08e8-4671-9991-b6ed55fde88e, Namespace=service-ns, Name=service-5549b48c9f-qjrf9, ServiceAccount=default]!"

I cannot determine the cause of the issue or how to reproduce it in an exact replica environment where OSM is also enabled. What could have caused something like this to happen? The documentation shows this description for the error logged by the osm-controller pod, (error code E5011): +------------+----------------------------------------------------------------------------------+ | E5011 | An Envoy proxy data structure representing a newly connected envoy proxy to the | | | XDS server could not be initialized. | +------------+----------------------------------------------------------------------------------+

I would like to determine the cause and how to prevent it.

Affected area (please mark with X where applicable):

Expected behavior:

Steps to reproduce the bug (as precisely as possible):

How was OSM installed?: Via Azure CLI. https://docs.microsoft.com/en-us/azure/aks/open-service-mesh-deploy-addon-az-cli

Anything else we need to know?:

Environment:

jaellio commented 1 year ago

Hi @eguer004, could you please share your MeshConfig?

kubectl get ConfigMap osm-mesh-config -n osm-system -o yaml
eguer004 commented 1 year ago

Hi @jaellio, sure. The command I used is kubectl get meshconfig osm-mesh-config -n kube-system -o yaml

apiVersion: config.openservicemesh.io/v1alpha2
kind: MeshConfig
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"kind":"MeshConfig","apiVersion":"config.openservicemesh.io/configv1alpha2","metadata":{"name":"osm-mesh-config","creationTimestamp":null},"spec":{"sidecar":{"enablePrivilegedInitContainer":false,"logLevel":"error","configResyncInterval":"0s","resources":{},"localProxyMode":"Localhost"},"traffic":{"enableEgress":true,"outboundIPRangeExclusionList":[],"outboundIPRangeInclusionList":[],"outboundPortExclusionList":[],"inboundPortExclusionList":[],"enablePermissiveTrafficPolicyMode":true,"inboundExternalAuthorization":{"enable":false,"failureModeAllow":false},"networkInterfaceExclusionList":[]},"observability":{"osmLogLevel":"info","enableDebugServer":true,"tracing":{"enable":false}},"certificate":{"serviceCertValidityDuration":"24h","certKeyBitSize":2048},"featureFlags":{"enableWASMStats":true,"enableEgressPolicy":true,"enableSnapshotCacheMode":false,"enableAsyncProxyServiceMapping":false,"enableIngressBackendPolicy":true,"enableEnvoyActiveHealthChecks":false,"enableRetryPolicy":false}}}
  creationTimestamp: "2023-03-17T02:05:58Z"
  generation: 3
  name: osm-mesh-config
  namespace: kube-system
  resourceVersion: "103105437"
spec:
  certificate:
    certKeyBitSize: 2048
    ingressGateway:
      secret:
        name: osm-nginx-client-cert
        namespace: kube-system
      subjectAltNames:
      - nginx-ingress-ingress-nginx.ingress-basic.cluster.local
      validityDuration: 24h
    serviceCertValidityDuration: 24h
  featureFlags:
    enableAsyncProxyServiceMapping: false
    enableEgressPolicy: true
    enableEnvoyActiveHealthChecks: false
    enableIngressBackendPolicy: true
    enableRetryPolicy: false
    enableSnapshotCacheMode: false
    enableWASMStats: true
  observability:
    enableDebugServer: true
    osmLogLevel: info
    tracing:
      enable: false
  sidecar:
    configResyncInterval: 0s
    enablePrivilegedInitContainer: false
    localProxyMode: Localhost
    logLevel: error
    resources: {}
    tlsMaxProtocolVersion: TLSv1_3
    tlsMinProtocolVersion: TLSv1_2
  traffic:
    enableEgress: true
    enablePermissiveTrafficPolicyMode: true
    inboundExternalAuthorization:
      enable: false
      failureModeAllow: false
      statPrefix: inboundExtAuthz
      timeout: 1s
    inboundPortExclusionList: []
    outboundIPRangeExclusionList:
    - 10.0.0.1/32
    outboundPortExclusionList: []
jaellio commented 1 year ago

Thanks @eguer004! Could you also share this additional information:

  1. osm-controller logs with the log level set to trace
  2. ingress and ingressBackend yamls
eguer004 commented 1 year ago

Thank you @jaellio, unfortunately OSM is now disabled and I cannot retrieve more logs with the log level set to trace. This is what I have

{"level":"info","component":"osm-controller/main","time":"2023-04-05T03:18:55Z","file":"osm-controller.go:150","message":"Starting osm-controller v1.2.3; 899fdb8e283aeca124c85c7a752e7b9534eea86e; 2022-12-15-00:05"}
W0405 03:18:55.243789       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
{"level":"info","time":"2023-04-05T03:18:55Z","message":"InformerCollection started"}
{"level":"info","time":"2023-04-05T03:18:55Z","message":"Waiting for TCPRoute informer cache sync..."}
{"level":"info","time":"2023-04-05T03:18:55Z","message":"Waiting for MeshConfig informer cache sync..."}
{"level":"info","time":"2023-04-05T03:18:55Z","message":"Waiting for Egress informer cache sync..."}
{"level":"info","time":"2023-04-05T03:18:55Z","message":"Waiting for Retry informer cache sync..."}
{"level":"info","time":"2023-04-05T03:18:55Z","message":"Waiting for Service informer cache sync..."}
{"level":"info","time":"2023-04-05T03:18:55Z","message":"Waiting for Endpoints informer cache sync..."}
{"level":"info","time":"2023-04-05T03:18:55Z","message":"Waiting for Namespace informer cache sync..."}
{"level":"info","time":"2023-04-05T03:18:55Z","message":"Waiting for TrafficTarget informer cache sync..."}
{"level":"info","time":"2023-04-05T03:18:55Z","message":"Waiting for IngressBackend informer cache sync..."}
{"level":"info","time":"2023-04-05T03:18:55Z","message":"Waiting for UpstreamTrafficSetting informer cache sync..."}
{"level":"info","time":"2023-04-05T03:18:55Z","message":"Waiting for Pod informer cache sync..."}
{"level":"info","time":"2023-04-05T03:18:55Z","message":"Waiting for TrafficSplit informer cache sync..."}
{"level":"info","time":"2023-04-05T03:18:55Z","message":"Waiting for MeshRootCertificate informer cache sync..."}
{"level":"info","time":"2023-04-05T03:18:55Z","message":"Waiting for ServiceAccount informer cache sync..."}
{"level":"info","time":"2023-04-05T03:18:55Z","message":"Waiting for HTTPRouteGroup informer cache sync..."}
{"level":"info","time":"2023-04-05T03:18:55Z","message":"Caches for [TCPRoute MeshConfig Egress Retry Service Endpoints Namespace TrafficTarget IngressBackend UpstreamTrafficSetting Pod TrafficSplit MeshRootCertificate ServiceAccount HTTPRouteGroup] synced successfully"}
{"level":"info","time":"2023-04-05T03:18:55Z","message":"Secret kube-system/osm-ca-bundle already exists in kubernetes, loading."}
{"level":"info","component":"certificate","time":"2023-04-05T03:18:55Z","file":"manager.go:81","message":"stopping MRC watch..."}
{"level":"info","component":"workerpool","time":"2023-04-05T03:18:56Z","file":"workerpool.go:45","message":"New worker pool setting up 4 workers"}
{"level":"info","component":"utilities","time":"2023-04-05T03:18:56Z","file":"grpc.go:20","message":"Setting up ADS gRPC server..."}
{"level":"info","component":"workerpool","time":"2023-04-05T03:18:56Z","file":"workerpool.go:82","message":"Worker 0 running"}
{"level":"info","component":"workerpool","time":"2023-04-05T03:18:56Z","file":"workerpool.go:82","message":"Worker 2 running"}
{"level":"info","component":"workerpool","time":"2023-04-05T03:18:56Z","file":"workerpool.go:82","message":"Worker 1 running"}
{"level":"info","component":"workerpool","time":"2023-04-05T03:18:56Z","file":"workerpool.go:82","message":"Worker 3 running"}
{"level":"info","component":"utilities","time":"2023-04-05T03:18:56Z","file":"grpc.go:58","message":"[grpc][ADS] Started server on: [::]:15128"}
{"level":"info","component":"osm-validator","time":"2023-04-05T03:18:56Z","file":"patch.go:138","message":"Finished creating ValidatingWebhookConfiguration aks-osm-validator-mesh-osm"}
{"level":"info","component":"osm-controller/main","time":"2023-04-05T03:18:56Z","file":"osm-controller.go:313","message":"OSM reconciler enabled for validating webhook"}
{"level":"info","component":"reconciler","time":"2023-04-05T03:18:56Z","file":"client.go:114","message":"OSM reconciler client started"}
{"level":"info","component":"reconciler","time":"2023-04-05T03:18:56Z","file":"client.go:128","message":"Starting reconciler informer: ValidatingWebhookConfigInformerKey"}
{"level":"info","component":"reconciler","time":"2023-04-05T03:18:56Z","file":"client.go:134","message":"Waiting for reconciler informer's cache to sync"}
{"level":"info","component":"webhook","time":"2023-04-05T03:18:56Z","file":"server.go:65","message":"Starting osm-validator webhook server on: :9093"}
{"level":"info","component":"http-server","time":"2023-04-05T03:18:56Z","file":"httpserver.go:70","message":"Starting API Server on :9091"}
{"level":"info","component":"http-server","time":"2023-04-05T03:18:56Z","file":"httpserver.go:70","message":"Starting API Server on :9092"}
{"level":"info","component":"reconciler","time":"2023-04-05T03:18:56Z","file":"client.go:139","message":"Cache sync finished for reconciler informer : [ValidatingWebhookConfigInformerKey]"}
{"level":"info","component":"envoy/sds","proxy":"[ProxyUUID=306658e3-3121-462a-b719-d4b4ca0d898a], [Pod metadata=UID=f21acd81-d1b9-4d88-8af1-09ae961c8374, Namespace=my-service-ns, Name=my-service-5549b48c9f-zg2zt, ServiceAccount=default]","time":"2023-04-05T03:18:57Z","file":"response.go:22","message":"Composing SDS Discovery Response"}
{"level":"info","component":"envoy/sds","proxy":"[ProxyUUID=306658e3-3121-462a-b719-d4b4ca0d898a], [Pod metadata=UID=f21acd81-d1b9-4d88-8af1-09ae961c8374, Namespace=my-service-ns, Name=my-service-5549b48c9f-zg2zt, ServiceAccount=default]","time":"2023-04-05T03:18:57Z","file":"response.go:37","message":"Creating SDS response for request for resources [root-cert-for-mtls-inbound:my-service-ns/default root-cert-for-mtls-outbound:my-service-ns/my-service-service service-cert:my-service-ns/default]"}
{"level":"info","component":"envoy/sds","proxy":"[ProxyUUID=55c4ec0a-0a9c-46e2-93f6-1b1993752e31], [Pod metadata=UID=bbcbe712-81a0-44a8-b7ff-0f2ee16483d8, Namespace=my-service-ns, Name=my-service-5549b48c9f-cp5ln, ServiceAccount=default]","time":"2023-04-05T03:18:57Z","file":"response.go:22","message":"Composing SDS Discovery Response"}
{"level":"info","component":"envoy/sds","proxy":"[ProxyUUID=55c4ec0a-0a9c-46e2-93f6-1b1993752e31], [Pod metadata=UID=bbcbe712-81a0-44a8-b7ff-0f2ee16483d8, Namespace=my-service-ns, Name=my-service-5549b48c9f-cp5ln, ServiceAccount=default]","time":"2023-04-05T03:18:57Z","file":"response.go:37","message":"Creating SDS response for request for resources [root-cert-for-mtls-inbound:my-service-ns/default root-cert-for-mtls-outbound:my-service-ns/my-service-service service-cert:my-service-ns/default]"}
{"level":"info","component":"envoy/rds","time":"2023-04-05T03:18:57Z","file":"response.go:127","message":"RDS did not fulfill all requested resources (diff: Set{}). Fulfill with empty RouteConfigs."}
{"level":"info","component":"envoy/rds","time":"2023-04-05T03:18:57Z","file":"response.go:127","message":"RDS did not fulfill all requested resources (diff: Set{}). Fulfill with empty RouteConfigs."}
{"level":"info","component":"envoy/rds","time":"2023-04-05T03:18:57Z","file":"response.go:127","message":"RDS did not fulfill all requested resources (diff: Set{}). Fulfill with empty RouteConfigs."}
{"level":"info","component":"envoy/rds","time":"2023-04-05T03:18:57Z","file":"response.go:127","message":"RDS did not fulfill all requested resources (diff: Set{}). Fulfill with empty RouteConfigs."}
{"level":"info","component":"envoy/ads","proxy":"[ProxyUUID=55c4ec0a-0a9c-46e2-93f6-1b1993752e31], [Pod metadata=UID=bbcbe712-81a0-44a8-b7ff-0f2ee16483d8, Namespace=my-service-ns, Name=my-service-5549b48c9f-cp5ln, ServiceAccount=default]","time":"2023-04-05T03:18:57Z","file":"stream.go:122","message":"Broadcast update received"}
{"level":"info","component":"envoy/ads","proxy":"[ProxyUUID=306658e3-3121-462a-b719-d4b4ca0d898a], [Pod metadata=UID=f21acd81-d1b9-4d88-8af1-09ae961c8374, Namespace=my-service-ns, Name=my-service-5549b48c9f-zg2zt, ServiceAccount=default]","time":"2023-04-05T03:18:57Z","file":"stream.go:122","message":"Broadcast update received"}
{"level":"info","component":"envoy/rds","time":"2023-04-05T03:18:57Z","file":"response.go:127","message":"RDS did not fulfill all requested resources (diff: Set{}). Fulfill with empty RouteConfigs."}
{"level":"info","component":"envoy/rds","time":"2023-04-05T03:18:57Z","file":"response.go:127","message":"RDS did not fulfill all requested resources (diff: Set{}). Fulfill with empty RouteConfigs."}
{"level":"info","component":"envoy/rds","time":"2023-04-05T03:18:57Z","file":"response.go:127","message":"RDS did not fulfill all requested resources (diff: Set{}). Fulfill with empty RouteConfigs."}
{"level":"info","component":"envoy/rds","time":"2023-04-05T03:18:57Z","file":"response.go:127","message":"RDS did not fulfill all requested resources (diff: Set{}). Fulfill with empty RouteConfigs."}
{"level":"error","component":"envoy/ads","error_code":"E5011","proxy":"[ProxyUUID=306658e3-3121-462a-b719-d4b4ca0d898a], [Pod metadata=UID=f21acd81-d1b9-4d88-8af1-09ae961c8374, Namespace=my-service-ns, Name=my-service-5549b48c9f-zg2zt, ServiceAccount=default]","time":"2023-04-05T14:00:31Z","file":"stream.go:102","message":"gRPC stream closed by proxy [ProxyUUID=306658e3-3121-462a-b719-d4b4ca0d898a], [Pod metadata=UID=f21acd81-d1b9-4d88-8af1-09ae961c8374, Namespace=my-service-ns, Name=my-service-5549b48c9f-zg2zt, ServiceAccount=default]!"}
{"level":"error","component":"envoy/ads","error_code":"E5011","proxy":"[ProxyUUID=55c4ec0a-0a9c-46e2-93f6-1b1993752e31], [Pod metadata=UID=bbcbe712-81a0-44a8-b7ff-0f2ee16483d8, Namespace=my-service-ns, Name=my-service-5549b48c9f-cp5ln, ServiceAccount=default]","time":"2023-04-05T14:00:32Z","file":"stream.go:102","message":"gRPC stream closed by proxy [ProxyUUID=55c4ec0a-0a9c-46e2-93f6-1b1993752e31], [Pod metadata=UID=bbcbe712-81a0-44a8-b7ff-0f2ee16483d8, Namespace=my-service-ns, Name=my-service-5549b48c9f-cp5ln, ServiceAccount=default]!"}

Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"networking.k8s.io/v1","kind":"Ingress","metadata":{"annotations":{"nginx.ingress.kubernetes.io/auth-tls-verify-client":"on","nginx.ingress.kubernetes.io/backend-protocol":"HTTPS","nginx.ingress.kubernetes.io/configuration-snippet":"proxy_ssl_name \"default.my-service-ns.cluster.local\";","nginx.ingress.kubernetes.io/proxy-ssl-secret":"kube-system/osm-nginx-client-cert","nginx.ingress.kubernetes.io/proxy-ssl-verify":"on","nginx.ingress.kubernetes.io/rewrite-target":"/$1","nginx.ingress.kubernetes.io/ssl-redirect":"false","nginx.ingress.kubernetes.io/use-regex":"true"},"name":"my-service-ingress","namespace":"my-service-ns"},"spec":{"ingressClassName":"nginx","rules":[{"host":"abc.def.com","http":{"paths":[{"backend":{"service":{"name":"my-service","port":{"number":80}}},"path":"/(.*)","pathType":"ImplementationSpecific"}]}}],"tls":[{"hosts":["abc.def.com"],"secretName":"my-ingress-tls"}]}}
    nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
    nginx.ingress.kubernetes.io/configuration-snippet: proxy_ssl_name "default.my-service-ns.cluster.local";
    nginx.ingress.kubernetes.io/proxy-ssl-secret: kube-system/osm-nginx-client-cert
    nginx.ingress.kubernetes.io/proxy-ssl-verify: "on"
    nginx.ingress.kubernetes.io/rewrite-target: /$1
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    nginx.ingress.kubernetes.io/use-regex: "true"
  creationTimestamp: "2022-07-28T00:47:12Z"
  generation: 4
  name: my-service-ingress
  namespace: my-service-ns
  resourceVersion: "124373690"
spec:
  ingressClassName: nginx
  rules:
  - host: abc.def.com
    http:
      paths:
      - backend:
          service:
            name: my-service
            port:
              number: 80
        path: /(.*)
        pathType: ImplementationSpecific
  tls:
  - hosts:
    - abc.def.com
    secretName: my-ingress-tls
status:
  loadBalancer:
    ingress:
    - ip: X.X.X.X

IngressBackend

kind: IngressBackend
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"policy.openservicemesh.io/v1alpha1","kind":"IngressBackend","metadata":{"annotations":{},"name":"my-ingressbackend","namespace":"my-service-ns"},"spec":{"backends":[{"name":"my-service","port":{"number":8080,"protocol":"https"},"tls":{"skipClientCertValidation":false}}],"sources":[{"kind":"Service","name":"nginx-ingress-ingress-nginx-controller","namespace":"ingress-basic"},{"kind":"AuthenticatedPrincipal","name":"nginx-ingress-ingress-nginx.ingress-basic.cluster.local"}]}}
  creationTimestamp: "2023-03-15T18:49:58Z"
  generation: 2
  name: my-ingressbackend
  namespace: my-service-ns
  resourceVersion: "125237038"
spec:
  backends:
  - name: my-service
    port:
      number: 8080
      protocol: https
    tls:
      skipClientCertValidation: false
  sources:
  - kind: Service
    name: nginx-ingress-ingress-nginx-controller
    namespace: ingress-basic
  - kind: AuthenticatedPrincipal
    name: nginx-ingress-ingress-nginx.ingress-basic.cluster.local
status:
  currentStatus: committed
  reason: successfully committed by the system
github-actions[bot] commented 1 year ago

This issue will be closed due to a long period of inactivity. If you would like this issue to remain open then please comment or update.

github-actions[bot] commented 1 year ago

Issue closed due to inactivity.