openservicemesh / osm

Open Service Mesh (OSM) is a lightweight, extensible, cloud native service mesh that allows users to uniformly manage, secure, and get out-of-the-box observability features for highly dynamic microservice environments.
https://openservicemesh.io/
Apache License 2.0

upstream connect error or disconnect/reset before headers. reset reason: connection failure #4583

Closed chandan9778 closed 2 years ago

chandan9778 commented 2 years ago

Bug description: We have installed OSM on an AKS cluster. Once OSM is enabled at the namespace level, we are unable to access any ingress-based application (exposed via the nginx ingress controller) and get "upstream connect error or disconnect/reset before headers. reset reason: connection failure" after Envoy.

Affected area (please mark with X where applicable):

Expected behavior:

Steps to reproduce the bug (as precisely as possible):

How was OSM installed?: Using the Azure CLI. https://docs.microsoft.com/en-us/azure/aks/open-service-mesh-deploy-addon-az-cli

Anything else we need to know?:

Environment:

trstringer commented 2 years ago

This looks like it could be related to this nginx ingress controller bug that was resolved in this nginx ingress controller PR. This fix is included in nginx ingress controller v1.1.1.

Can you look at your nginx ingress controller logs? Do you see SSL certificate expired errors?

chandan9778 commented 2 years ago

@trstringer Thanks for your response, but I am currently using nginx ingress controller v1.1.1.

trstringer commented 2 years ago

Can you provide nginx ingress controller error log entries pertaining to these failures?

shashankram commented 2 years ago

@chandan9778 Have you looked at https://release-v1-0.docs.openservicemesh.io/docs/demos/ingress_k8s_nginx/? If not, please take a look to confirm you have applied the ingress configurations correctly.

chandan9778 commented 2 years ago

@trstringer 796 peer closed connection in SSL handshake (104: Connection reset by peer) while SSL handshaking to upstream, client: xx.xx.xx.xx(ip), server: xyz.dns.com, request: "GET /podname HTTP/2.0", upstream: "https://xx.xx.xx.xx:8080/", host: "xyz.dns.com"

chandan9778 commented 2 years ago

@shashankram Yes, I applied the configuration correctly as per the docs. I get the error only when enabling OSM for any of the ingress-based applications.

shashankram commented 2 years ago

@chandan9778 Please share the following YAML configurations (redact info where necessary):

  1. k8s Service for which ingress failing
  2. k8s Ingress configuration
  3. OSM IngressBackend configuration
chandan9778 commented 2 years ago

@shashankram Please find the YAMLs mentioned above for your reference.

###INGRESSBACKEND.YAML###

apiVersion: policy.openservicemesh.io/v1alpha1
kind: IngressBackend
metadata:
  name: xyz 
  namespace: namespace-name
spec:
  backends:
  - name: xyz-service
    port:
      number: 8080 # targetPort of httpbin service
      protocol: https
    tls:
      skipClientCertValidation: false
  sources:
  - kind: Service
    name: ingress-nginx-controller
    namespace: ingress_namespace
  - kind: AuthenticatedPrincipal
    name: ingress-nginx.ingress_namespace.cluster.local

###SERVICE.YAML####

apiVersion: v1
kind: Service
metadata:
  name: xyz
  namespace: namespace_name
spec:
  type: ClusterIP
  ports:
  - port: 80
    targetPort: 8080
  selector:
    k8s-app: pod_selector_name

###INGRESS.YAML###

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: xyz-ingress
  namespace: nmaesppace_name
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/force-ssl-redirect: "false"
    nginx.ingress.kubernetes.io/rewrite-target: /$2
    nginx.ingress.kubernetes.io/use-regex: "true"
    ingress.kubernetes.io/tls-minimum-version: "1.2"
    kubernetes.io/ingress.allow-http: "false"
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_ssl_name "default.application_specific_namesapce_name.cluster.local";
    nginx.ingress.kubernetes.io/proxy-ssl-secret: kube-system/osm-nginx-client-cert
    nginx.ingress.kubernetes.io/proxy-ssl-verify: on

spec:
  tls:
  - hosts:
    - dummyingresshostname
    secretName: dummyingresssecretname
  rules:
    - host: dummyingresshostname
      http:
        paths:
          - path: /podname(/|$)(.*)
            backend:
              serviceName: service_name
              servicePort: 80
shashankram commented 2 years ago

@chandan9778 thanks, could you kindly format the snippet, see https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-code

Also would you mind sharing the MeshConfig YAML: kubectl get meshconfig osm-mesh-config -n <osm namespace> -o yaml

chandan9778 commented 2 years ago

@shashankram I modified the above code snippet as per the standard. Please also find the osm-mesh-config YAML.

apiVersion: config.openservicemesh.io/v1alpha1
kind: MeshConfig
metadata:
  creationTimestamp: "2022-03-08T11:32:10Z"
  generation: 5
  name: osm-mesh-config
  namespace: kube-system
  resourceVersion: "xxx"
  uid: xxx
spec:
  certificate:
    certKeyBitSize: 2048
    ingressGateway:
      secret:
        name: dummy-secret
        namespace: kube-system
      subjectAltNames:
      - ingress-nginx.ingress_namespace_name-ns.cluster.local
      validityDuration: 24h
    serviceCertValidityDuration: 24h
  featureFlags:
    enableAsyncProxyServiceMapping: false
    enableEgressPolicy: true
    enableEnvoyActiveHealthChecks: false
    enableIngressBackendPolicy: true
    enableMulticlusterMode: false
    enableRetryPolicy: false
    enableSnapshotCacheMode: false
    enableWASMStats: true
  observability:
    enableDebugServer: true
    osmLogLevel: info
    tracing:
      enable: false
  sidecar:
    configResyncInterval: 0s
    enablePrivilegedInitContainer: false
    logLevel: error
    resources: {}
  traffic:
    enableEgress: true
    enablePermissiveTrafficPolicyMode: false
    inboundExternalAuthorization:
      enable: false
      failureModeAllow: false
      statPrefix: inboundExtAuthz
      timeout: 1s
    inboundPortExclusionList: []
    outboundIPRangeExclusionList: []
    outboundPortExclusionList: []
shashankram commented 2 years ago

In MeshConfig, subjectAltNames has ingress-nginx.ingress_namespace_name-ns.cluster.local, whereas the IngressBackend has the AuthenticatedPrincipal specified as ingress-nginx.ingress_namespace.cluster.local. This could be a typo, but please ensure the configurations shared are accurate. In this case, they need to match.
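The matching requirement can be checked mechanically. The following is a minimal sketch, assuming both resources have already been loaded into Python dictionaries (for example from `kubectl ... -o json` output). The helper name `unmatched_principals` is hypothetical and not part of OSM; the sample values are copied from the configurations shared in this thread.

```python
from typing import List


def unmatched_principals(mesh_config: dict, ingress_backend: dict) -> List[str]:
    """Return AuthenticatedPrincipal names from the IngressBackend that do not
    appear in the MeshConfig ingressGateway subjectAltNames."""
    sans = set(
        mesh_config.get("spec", {})
        .get("certificate", {})
        .get("ingressGateway", {})
        .get("subjectAltNames", [])
    )
    principals = [
        source["name"]
        for source in ingress_backend.get("spec", {}).get("sources", [])
        if source.get("kind") == "AuthenticatedPrincipal"
    ]
    return [name for name in principals if name not in sans]


# Only the fields relevant to the comparison are reproduced here.
mesh_config = {
    "spec": {
        "certificate": {
            "ingressGateway": {
                "subjectAltNames": [
                    "ingress-nginx.ingress_namespace_name-ns.cluster.local"
                ]
            }
        }
    }
}
ingress_backend = {
    "spec": {
        "sources": [
            {"kind": "Service", "name": "ingress-nginx-controller"},
            {
                "kind": "AuthenticatedPrincipal",
                "name": "ingress-nginx.ingress_namespace.cluster.local",
            },
        ]
    }
}

mismatches = unmatched_principals(mesh_config, ingress_backend)
print(mismatches)
```

An empty list means every AuthenticatedPrincipal has a matching subject alternative name; any name printed here is a candidate for the mismatch described above.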

Also, kindly walk through the HTTPS ingress demo https://release-v1-0.docs.openservicemesh.io/docs/demos/ingress_k8s_nginx/#https-ingress-mtls-and-tls verbatim and confirm that it works. If that works, there's likely an issue with the Nginx version you are using or a misconfiguration on your end.

chandan9778 commented 2 years ago

@shashankram Yes, you are correct, that is a typo; both are the same in my case. I have also followed the exact same steps as mentioned in the URL https://release-v1-0.docs.openservicemesh.io/docs/demos/ingress_k8s_nginx/#https-ingress-mtls-and-tls and I am still getting "upstream connect error or disconnect/reset before headers. reset reason: connection failure".

shashankram commented 2 years ago

@chandan9778 Just to confirm, are you suggesting following the steps exactly as provided in https://release-v1-0.docs.openservicemesh.io/docs/demos/ingress_k8s_nginx/#https-ingress-mtls-and-tls do not work for you? Could you share how you installed Nginx?

chandan9778 commented 2 years ago

Yes, the above steps did not work for me. I followed this YAML to install nginx into my cluster: https://github.com/kubernetes/ingress-nginx/blob/main/deploy/static/provider/cloud/deploy.yaml

shashankram commented 2 years ago

@chandan9778, do you also mind trying the HTTP based ingress workflow and verifying if that works for you? If HTTP works, we can upgrade the configuration to HTTPS and debug what's going on. If HTTP ingress doesn't work, it means there's a misconfiguration or something basic that isn't working.

Also please share the Envoy log from the backend pod taken while the request to the backend fails.

chandan9778 commented 2 years ago

@shashankram I tried the HTTP-based ingress workflow and am still getting the same error [upstream connect error or disconnect/reset before headers. reset reason: connection failure]. Please also find the Envoy logs for your reference.

{"time_to_first_byte":null,"authority":"xyz.dns.com","response_code":503,"upstream_service_time":null,"upstream_cluster":"namespace_name/servicename|8080|local","bytes_sent":91,"upstream_host":"x.x.x.x:8080","protocol":"HTTP/1.1","response_code_details":"upstream_reset_before_response_started{connection_failure}","requested_server_name":null,"user_agent":"Safari/xx.xx","path":"/","duration":0,"bytes_received":0,"response_flags":"UF","request_id":"xxxxxxxxx","method":"GET","start_time":"2022-03-11T10:55:12.990Z","x_forwarded_for":"xx.x.x.x"}
{"start_time":"2022-03-11T10:55:13.360Z","request_id":"xxxxxxxxxxxxxx","duration":0,"x_forwarded_for":"xx.x.x.x","authority":"xyz.dns.com","upstream_service_time":null,"user_agent":"Safari/xxx.36","response_flags":"UF","response_code_details":"upstream_reset_before_response_started{connection_failure}","time_to_first_byte":null,"upstream_host":"x.x.x.x:8080","requested_server_name":null,"method":"GET","protocol":"HTTP/1.1","bytes_received":0,"response_code":503,"upstream_cluster":"nnamespce_name/service_name|8080|local","bytes_sent":91,"path":"/"}
{"path":"/","bytes_sent":91,"response_code":503,"bytes_received":0,"response_flags":"UF","response_code_details":"upstream_reset_before_response_started{connection_failure}","protocol":"HTTP/1.1","upstream_cluster":"namespace_name/service_name|8080|local","upstream_service_time":null,"request_id":"xxxxxxxxxxxxxxxxxxxxxx","x_forwarded_for":"xx.x.x.x","start_time":"2022-03-11T10:55:15.677Z","duration":0,"method":"GET","requested_server_name":null,"time_to_first_byte":null,"upstream_host":"x.x.x.x:8080","authority":"xyz.dns.com","user_agent":"
{"x_forwarded_for":"xx.x.x.x","upstream_cluster":"namespace_name/service_name|8080|local","protocol":"HTTP/1.1","time_to_first_byte":null,"bytes_sent":91,"request_id":"xxxxxxxxxxxx","user_agent":null,"method":"POST","upstream_host":"x.x.x.x:8080","response_code":503,"start_time":"2022-03-11T10:55:54.458Z","requested_server_name":null,"response_code_details":"upstream_reset_before_response_started{connection_failure}","duration":3,"response_flags":"UF","path":"/eventhandler","upstream_service_time":null,"bytes_received":104,"authority":"xyz.dns.com"}
chandan9778 commented 2 years ago

@shashankram Do you have any update on this issue? Please let me know if you need any more details from my side. Thanks for your time!

shashankram commented 2 years ago

upstream_reset_before_response_started

The log indicates the connection to the upstream (destination) service was reset. @chandan9778 I think the next step would be to provide a standalone repro so that we can take a look. Please let me know if that's possible.
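To see how consistently the failure occurs, the JSON access log lines can be grouped by upstream cluster and failure reason. This is a rough sketch, assuming one JSON object per line as in the log excerpt above; `summarize_access_log` is a hypothetical helper, and the sample lines are abbreviated to the fields used. In Envoy access logs, the `UF` response flag denotes an upstream connection failure.

```python
import json
from collections import Counter


def summarize_access_log(lines):
    """Count entries per (upstream_cluster, response_code, response_flags,
    response_code_details). Lines that are not valid JSON, such as truncated
    entries, are counted separately instead of raising."""
    summary = Counter()
    skipped = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1
            continue
        key = (
            entry.get("upstream_cluster"),
            entry.get("response_code"),
            entry.get("response_flags"),
            entry.get("response_code_details"),
        )
        summary[key] += 1
    return summary, skipped


# Abbreviated sample entries modeled on the log above; the last one is truncated.
sample = [
    '{"response_code":503,"response_flags":"UF","upstream_cluster":"namespace_name/service_name|8080|local","response_code_details":"upstream_reset_before_response_started{connection_failure}","path":"/"}',
    '{"response_code":503,"response_flags":"UF","upstream_cluster":"namespace_name/service_name|8080|local","response_code_details":"upstream_reset_before_response_started{connection_failure}","path":"/eventhandler"}',
    '{"path":"/","response_code":503,"user_agent":"',
]

summary, skipped = summarize_access_log(sample)
for key, count in summary.most_common():
    print(count, key)
print("unparseable lines:", skipped)
```

If every entry for the backend cluster carries the same flag and details, the problem is more likely in reaching the local application port than in any individual request.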

chandan9778 commented 2 years ago

@shashankram Thanks for your response! For now, providing a standalone repro is not possible. Is there any other way to proceed?

shashankram commented 2 years ago

@chandan9778, in that case, please share the following info (redact sensitive info but preserve config correctness).

Note: You may have provided some of these, but specifying a more comprehensive list:

  1. k8s service yaml
  2. Nginx HTTP ingress yaml
  3. Request that is failing (e.g. curl http://foo.bar:8080/baz)
  4. OSM IngressBackend yaml
  5. OSM MeshConfig yaml
  6. Complete Envoy sidecar log from backend service pod
  7. Config dump of backend sidecar: osm proxy get config_dump <pod> -n <namespace>
  8. Stats dump of backend sidecar: osm proxy get stats <pod> -n <namespace>
  9. OSM MeshConfig yaml: kubectl get meshconfig osm-mesh-config -n <osm namespace>
  10. Logs of osm-controller pod in osm namespace
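
The CLI-based items in the list above can be collected with a small script. This is a sketch only: it assembles the three commands quoted in this thread and prints them, and `diagnostic_commands` plus the pod and namespace values are hypothetical placeholders. Running the commands (for example with `subprocess.run`) assumes `osm` and `kubectl` are installed and pointed at the right cluster.

```python
import shlex


def diagnostic_commands(pod: str, namespace: str, osm_namespace: str) -> dict:
    """Build the diagnostic commands mentioned in the thread as argument lists."""
    return {
        "config_dump": ["osm", "proxy", "get", "config_dump", pod, "-n", namespace],
        "stats": ["osm", "proxy", "get", "stats", pod, "-n", namespace],
        "meshconfig": [
            "kubectl", "get", "meshconfig", "osm-mesh-config",
            "-n", osm_namespace, "-o", "yaml",
        ],
    }


# Placeholder names; substitute the real backend pod and namespaces.
commands = diagnostic_commands("backend-pod", "app-namespace", "kube-system")
for label, argv in commands.items():
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(f"{label}: {shlex.join(argv)}")
```

Keeping the commands as argument lists avoids shell quoting problems if they are later passed to `subprocess.run`, and the output of each can be redirected to a file for attaching to the issue.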
keithmattix commented 2 years ago

Root cause was determined to be #4653; closing in favor of that issue