nginxinc / kubernetes-ingress

NGINX and NGINX Plus Ingress Controllers for Kubernetes
https://docs.nginx.com/nginx-ingress-controller
Apache License 2.0

Ingress disabled for non-mesh traffic once integrated #3039

Closed darkn3rd closed 1 year ago

darkn3rd commented 2 years ago

Describe the bug

After deploying NGINX Service Mesh (NSM) with the integrated NGINX Plus Ingress Controller, any VirtualServer configured for a service that is not in the mesh returns 502 Bad Gateway. This is a problem because I want to keep some applications OUT OF THE MESH so that they cannot access protected services.

NSM is configured with mTLS in strict mode so that traffic from outside the service mesh is dropped, because the cluster runs both services that are part of the mesh and services that are not.

To Reproduce

Steps to reproduce the behavior:

I used helmfile to encapsulate and configure Helm charts.
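
Every helmfile in the steps below reads some values with helmfile's `requiredEnv`, which aborts template rendering when the variable is unset or empty. A small preflight check can catch that before any `helmfile apply`. This is a minimal bash sketch: the function name `check_env` is hypothetical, the variable list is taken from the `requiredEnv` calls in the steps (`GCR_PROJECT_ID`, `DNS_SA_EMAIL`, `DNS_DOMAIN`, `ACME_ISSUER_EMAIL`, `ACME_ISSUER_NAME`), and `DNS_PROJECT_ID` is not checked because it is read with the non-failing `env`.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of helmfile): verifies the variables that the
# helmfiles read via requiredEnv, plus the ClusterIssuer name used by the VirtualServer.
check_env() {
  local status=0 name
  for name in GCR_PROJECT_ID DNS_SA_EMAIL DNS_DOMAIN ACME_ISSUER_EMAIL ACME_ISSUER_NAME; do
    # ${!name:-} is bash indirect expansion: the value of the variable named by $name,
    # or empty if that variable is unset
    if [ -z "${!name:-}" ]; then
      echo "missing: $name"
      status=1
    fi
  done
  # issuers.yaml only defines these two ClusterIssuers
  if [ -n "${ACME_ISSUER_NAME:-}" ]; then
    case "$ACME_ISSUER_NAME" in
      letsencrypt-staging|letsencrypt-prod) ;;
      *)
        echo "ACME_ISSUER_NAME must be letsencrypt-staging or letsencrypt-prod"
        status=1
        ;;
    esac
  fi
  return "$status"
}
```

For example, `check_env && helmfile -f nsm.yaml apply` only runs the apply when every variable is present; otherwise it prints one `missing: NAME` line per unset variable.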

  1. Install the observability (o11y) stack

    URLS=(https://docs.nginx.com/nginx-service-mesh/examples/{prometheus,grafana,otel-collector,jaeger}.yaml)
    for URL in "${URLS[@]}"; do curl -sOL "$URL"; done
    for FILE in {prometheus,grafana,otel-collector,jaeger}.yaml; do kubectl apply -f "$FILE"; done
  2. Install NSM

    cat << EOF > nsm.yaml
    repositories:
     # https://artifacthub.io/packages/helm/nginx/nginx-service-mesh
     - name: nginx-stable
       url: https://helm.nginx.com/stable
    
    releases:
     - name: nsm
       namespace: nginx-mesh
       chart: nginx-stable/nginx-service-mesh
       values:
         - prometheusAddress: prometheus.nsm-monitoring.svc:9090
           telemetry:
             exporters:
               otlp:
                 host: otel-collector.nsm-monitoring.svc
                 port: 4317
             samplerRatio: 1
           tracing: null
           mtls:
             mode: strict
           autoInjection:
             disable: true
    EOF
    helmfile -f nsm.yaml apply
  3. Install NGINX+ IC

    # assumes the NGINX Plus images are in a GCR registry you can pull from
    export GCR_PROJECT_ID="<your-gcr-project>"
    cat << EOF > nginx_ic.yaml
    repositories:
     # https://artifacthub.io/packages/helm/nginx/nginx-ingress
     - name: nginx-stable
       url: https://helm.nginx.com/stable
    
    releases:
     # NOTE: tutorial online uses 'nginx-ingress' for namespace
     - name: nginx-ingress
       namespace: kube-addons
       chart: nginx-stable/nginx-ingress
       version: 0.14.0
       values:
         - controller:
             nginxplus: true
             image:
               repository: gcr.io/{{ requiredEnv "GCR_PROJECT_ID" }}/nginx-plus-ingress
               tag: 2.3.0
             # NGINX Configmap
             config:
               entries:
                 ssl-redirect: "True"
                 http2: "True"
             ingressClass: nginx
             # NGINX IC CRDs
             enableCustomResources: true
             enableCertManager: true
             enableExternalDNS: true
             # Prometheus must be installed
             enableLatencyMetrics: true
           nginxServiceMesh:
             enable: true
             enableEgress: true
    EOF
    helmfile -f nginx_ic.yaml apply
  4. Install ExternalDNS and cert-manager. NOTE: For real DNS records and the ACME DNS-01 challenge to work, these services need read/write access to a DNS provider (Route 53, Cloud DNS, Azure DNS, etc.). The snippet below targets GKE with GCR and Cloud DNS.

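    export DNS_PROJECT_ID="<your-cloud-dns-zone-project>"
    export DNS_SA_EMAIL="<your-gsa-with-access-to-cloud-dns-zone>"
    export DNS_DOMAIN="example.com" # replace me
    export ACME_ISSUER_EMAIL="<your-email-for-acme-registration>"
    export ACME_ISSUER_NAME="letsencrypt-staging" # or letsencrypt-prod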
    
    cat << EOF > kube_addons.yaml
    repositories:
     # https://artifacthub.io/packages/helm/cert-manager/cert-manager
     - name: jetstack
       url: https://charts.jetstack.io
     # https://artifacthub.io/packages/helm/bitnami/external-dns
     - name: bitnami
       url: https://charts.bitnami.com/bitnami
    
    releases:
     - name: external-dns
       namespace: kube-addons
       chart: bitnami/external-dns
       version: 6.8.1
       values:
         - provider: google
           google:
             zoneVisibility: public
             project: {{ env "DNS_PROJECT_ID" }}
           sources:
             - crd
             - service
             - ingress
           # use with NGINX VirtualServer CRD
           crd:
             create: false
             apiversion: externaldns.nginx.org/v1
             kind: DNSEndpoint
           serviceAccount:
             annotations:
               # Google Workload Identity annotation
               iam.gke.io/gcp-service-account: {{ requiredEnv "DNS_SA_EMAIL" }}
           nodeSelector:
             # deploy on nodes that support Workload Identity
             iam.gke.io/gke-metadata-server-enabled: "true"
           logLevel: {{ env "EXTERNALDNS_LOG_LEVEL" | default "debug" }}
           domainFilters:
             - {{ requiredEnv "DNS_DOMAIN" }}
           txtOwnerId: external-dns
           rbac:
             create: true
             apiVersion: v1
           policy: upsert-only
    
     - name: cert-manager
       namespace: kube-addons
       chart: jetstack/cert-manager
       version: 1.9.1
       values:
         - installCRDs: true
           extraArgs:
             - --cluster-resource-namespace=kube-addons
           global:
             logLevel: 2
           serviceAccount:
             annotations:
               # Google Workload Identity annotation
               iam.gke.io/gcp-service-account: {{ requiredEnv "DNS_SA_EMAIL" }}
           nodeSelector:
             # deploy on nodes that support Workload Identity
             iam.gke.io/gke-metadata-server-enabled: "true"
    EOF
    
    cat << EOF > issuers.yaml
    repositories:
     # https://artifacthub.io/packages/helm/itscontained/raw
     - name: itscontained
       url: https://charts.itscontained.io
    
    releases:
     - name: cert-manager-issuers
       chart: itscontained/raw
       namespace: kube-addons
       version: 0.2.5
       disableValidation: true
       values:
         - resources:
             - apiVersion: cert-manager.io/v1
               kind: ClusterIssuer
               metadata:
                 name: letsencrypt-staging
               spec:
                 acme:
                   server: https://acme-staging-v02.api.letsencrypt.org/directory
                   email: {{ requiredEnv "ACME_ISSUER_EMAIL" }}
                   privateKeySecretRef:
                     name: letsencrypt-staging
                   solvers:
                     - dns01:
                         cloudDNS:
                           project: {{ env "DNS_PROJECT_ID" }}
    
             - apiVersion: cert-manager.io/v1
               kind: ClusterIssuer
               metadata:
                 name: letsencrypt-prod
               spec:
                 acme:
                   server: https://acme-v02.api.letsencrypt.org/directory
                   email: {{ requiredEnv "ACME_ISSUER_EMAIL" }}
                   privateKeySecretRef:
                     name: letsencrypt-prod
                   solvers:
                     - dns01:
                         cloudDNS:
                           project: {{ env "DNS_PROJECT_ID" }}
    EOF
    
    helmfile -f kube_addons.yaml apply
    helmfile -f issuers.yaml apply
  5. Install Ratel outside the mesh

    cat << EOF > ratel.yaml
    repositories:
     # https://artifacthub.io/packages/helm/itscontained/raw
     - name: itscontained
       url: https://charts.itscontained.io
    
    releases:
     - name: ratel
       chart: itscontained/raw
       namespace: ratel
       version: 0.2.5
       disableValidation: true
       values:
         - resources:
             - apiVersion: apps/v1
               kind: Deployment
               metadata:
                 name: dgraph-ratel
               spec:
                 selector:
                   matchLabels:
                     app: dgraph
                     component: ratel
                 replicas: 1
                 template:
                   metadata:
                     labels:
                       app: dgraph
                       component: ratel
                   spec:
                     containers:
                       - name: dgraph-ratel
                         image: docker.io/dgraph/ratel:v21.03.2
                         imagePullPolicy: IfNotPresent
                         command:
                           - dgraph-ratel
                         ports:
                           - name: http-ratel
                             containerPort: 8000
    
             - apiVersion: v1
               kind: Service
               metadata:
                 name: dgraph-ratel
                 labels:
                   app: dgraph
                   component: ratel
               spec:
                 type: ClusterIP
                 ports:
                   - port: 80
                     targetPort: 8000
                     name: http-ratel
                 selector:
                   app: dgraph
                   component: ratel
    EOF
    
    cat << EOF > ratel_vs.yaml
    repositories:
     # https://artifacthub.io/packages/helm/itscontained/raw
     - name: itscontained
       url: https://charts.itscontained.io
    
    releases:
     - name: ratel-virtualserver
       chart: itscontained/raw
       namespace: ratel
       version: 0.2.5
       disableValidation: true
       values:
         - resources:
             - apiVersion: k8s.nginx.org/v1
               kind: VirtualServer
               metadata:
                 name: dgraph-http
               spec:
                 host: ratel.{{ requiredEnv "DNS_DOMAIN" }}
                 tls:
                   secret: tls-secret
                   cert-manager:
                     cluster-issuer: {{ requiredEnv "ACME_ISSUER_NAME" }}
                 externalDNS:
                   enable: true
                 upstreams:
                   - name: ratel
                     service: dgraph-ratel
                     port: 80
                 routes:
                   - path: /
                     action:
                       pass: ratel
    EOF
    
    helmfile -f ratel.yaml apply
    helmfile -f ratel_vs.yaml apply
  6. Access the website, for example:
    curl https://ratel.$DNS_DOMAIN

Expected behavior

I expected the gateway (NGINX+ IC) to route traffic to back-end services that are not meshed as well as to services that are meshed. This is important because Ratel is only a client application: should it ever be compromised, it must NOT be able to reach the private database cluster or any other service in the mesh.

Actual behavior

The Ingress Controller logs the following (I globally replaced my registered domain with example.com):

2022/09/15 03:33:42 [error] 47#47: *378 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:42 +0000] "GET / HTTP/2.0" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15" "-"
2022/09/15 03:33:42 [error] 47#47: *380 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET /apple-touch-icon-precomposed.png HTTP/2.0", upstream: "https://10.104.0.40:8000/apple-touch-icon-precomposed.png", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:42 +0000] "GET /apple-touch-icon-precomposed.png HTTP/2.0" 502 157 "-" "Safari/15608.4.9.1.3 CFNetwork/1121.1.2 Darwin/19.2.0 (x86_64)" "-"
2022/09/15 03:33:42 [error] 47#47: *380 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET /apple-touch-icon.png HTTP/2.0", upstream: "https://10.104.0.40:8000/apple-touch-icon.png", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:42 +0000] "GET /apple-touch-icon.png HTTP/2.0" 502 157 "-" "Safari/15608.4.9.1.3 CFNetwork/1121.1.2 Darwin/19.2.0 (x86_64)" "-"
2022/09/15 03:33:45 [error] 47#47: *383 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:45 +0000] "GET / HTTP/2.0" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15" "-"
2022/09/15 03:33:47 [error] 47#47: *378 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:47 +0000] "GET / HTTP/2.0" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15" "-"
2022/09/15 03:33:47 [error] 47#47: *383 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:47 +0000] "GET / HTTP/2.0" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15" "-"
2022/09/15 03:33:51 [error] 47#47: *383 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:51 +0000] "GET / HTTP/2.0" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15" "-"
2022/09/15 03:33:52 [error] 47#47: *383 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
135.180.100.148 - - [15/Sep/2022:03:33:52 +0000] "GET / HTTP/2.0" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15" "-"
2022/09/15 03:33:55 [error] 47#47: *383 SSL_do_handshake() failed (SSL: error:1408F10B:SSL routines:ssl3_get_record:wrong version number) while SSL handshaking to upstream, client: 135.180.100.148, server: ratel.example.com, request: "GET / HTTP/2.0", upstream: "https://10.104.0.40:8000/", host: "ratel.example.com"
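
Every error above has the same shape: NGINX attempts a TLS handshake with the Ratel pod (`upstream: "https://10.104.0.40:8000/"`), while port 8000 is the plain-HTTP `containerPort` of the Ratel container from step 5. OpenSSL's `wrong version number` is the usual symptom when a TLS client receives a non-TLS reply, and each failed attempt is logged as a 502. The `https` upstream scheme presumably reflects the mesh integration proxying with mTLS. To tally this pattern in a longer log, a minimal bash sketch could look like the following; `summarize_upstream_errors` is a hypothetical helper, and it assumes log lines shaped exactly like the ones above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads NGINX error/access log lines (shaped like the ones above) on stdin.
# Prints "<count> <scheme://host:port>" for each upstream whose TLS handshake failed,
# then the number of access-log lines with status 502.
summarize_upstream_errors() {
  local log
  log=$(cat)
  # Error-log lines: keep only handshake failures and trim the upstream URL to scheme://host:port
  printf '%s\n' "$log" \
    | grep -F 'SSL_do_handshake() failed' \
    | sed -E 's|.*upstream: "([a-z]+://[^/"]+).*|\1|' \
    | sort | uniq -c | awk '{ print $1, $2 }'
  # Access-log lines: the status code follows the quoted request line, e.g. "GET / HTTP/2.0" 502
  printf '502 responses: %s\n' "$(printf '%s\n' "$log" | grep -cE '" 502 [0-9]+ ')"
}
```

Fed the excerpt above (for example saved to a hypothetical `ic.log` and run as `summarize_upstream_errors < ic.log`), it prints `9 https://10.104.0.40:8000` followed by `502 responses: 8`, i.e. every failure targets the single Ratel endpoint over TLS.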

Your environment

Additional context

I can provide scripts to provision Cloud DNS, GKE, GCR, and configure access with Google Service Accounts and Workload Identity using gcloud and gsutil if needed.

I also deployed the backend, Dgraph (a distributed graph database), but since it is supposed to be in the mesh and works fine, I didn't include it here. Ratel is only a client (it just serves the client UI), so it should not have access to the strict service mesh.

github-actions[bot] commented 2 years ago

Hi @darkn3rd thanks for reporting!

Be sure to check out the docs while you wait for a human to take a look at this 🙂

Cheers!

darkn3rd commented 2 years ago

I found out that this is the expected behavior. I would like to convert this to an enhancement request for the following: