hashicorp / consul-k8s

First-class support for Consul Service Mesh on Kubernetes
https://www.consul.io/docs/k8s
Mozilla Public License 2.0

Gateway controller not adding port entries in pod definitions #2833

Open mr-miles opened 1 year ago

mr-miles commented 1 year ago

Overview of the Issue

I started to look at the new API gateway feature as it looked like a much more flexible and native approach than the current ingress gateways. However, I am having trouble getting it to receive any inbound traffic. I think this is because the gateway controller is not adding the port registrations to the pod spec when it is generated:

My gateway definition looks like this:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: sdfsfs
  namespace: sfdsdf
spec:
  gatewayClassName: consul
  listeners:
    - allowedRoutes:
        namespaces:
          from: All
      name: https
      port: 443
      protocol: HTTPS
      tls:
        certificateRefs:
          - group: ''
            kind: Secret
            name: sdf-consul-api-gateway-cert
        mode: Terminate

And the GatewayClassConfig looks like this

apiVersion: consul.hashicorp.com/v1alpha1
kind: GatewayClassConfig
metadata:
  name: sdfsfs
  namespace: sfdsdf
spec:
  copyAnnotations: {}
  deployment:
    defaultInstances: 2
    maxInstances: 4
    minInstances: 2
  mapPrivilegedContainerPorts: 30000
  serviceType: NodePort

Here's the pod definition (screenshot): note that only the health-check port is shown, not the defined listener port.

Here is the service definition it produces (screenshot).

If I attempt to kubectl port-forward to the api-gateway, it fails with an error about the port not being available:

$ kubectl port-forward pods/sdf-api-gateway-67c68466b9-5tsxd -n consul 31082:31082
Forwarding from 127.0.0.1:31082 -> 31082
Forwarding from [::1]:31082 -> 31082
Handling connection for 31082
E0823 23:22:23.743400   26036 portforward.go:407] an error occurred forwarding 31082 -> 31082: error forwarding port 31082 to pod a597d47f05c487f8bec198132026a0730627267d105f8874c7a771b5f3340111, uid : failed to execute portforward in network namespace "/var/run/netns/cni-b8794c99-2411-b88d-dd43-2394e024ea9a": failed to connect to localhost:31082 inside namespace "a597d47f05c487f8bec198132026a0730627267d105f8874c7a771b5f3340111", IPv4: dial tcp4 127.0.0.1:31082: connect: connection refused IPv6 dial tcp6: address localhost: no suitable address found
E0823 23:22:23.744959   26036 portforward.go:233] lost connection to pod
Handling connection for 31082
E0823 23:22:23.746001   26036 portforward.go:345] error creating error stream for port 31082 -> 31082: EOF

$ kubectl port-forward pods/ims-api-gateway-67c68466b9-5tsxd -n consul 31082:443
Forwarding from 127.0.0.1:31082 -> 443
Forwarding from [::1]:31082 -> 443
Handling connection for 31082
Handling connection for 31082
E0823 23:28:35.072185   14752 portforward.go:407] an error occurred forwarding 31082 -> 443: error forwarding port 443 to pod a597d47f05c487f8bec198132026a0730627267d105f8874c7a771b5f3340111, uid : failed to execute portforward in network namespace "/var/run/netns/cni-b8794c99-2411-b88d-dd43-2394e024ea9a": failed to connect to localhost:443 inside namespace "a597d47f05c487f8bec198132026a0730627267d105f8874c7a771b5f3340111", IPv4: dial tcp4 127.0.0.1:443: connect: connection refused IPv6 dial tcp6: address localhost: no suitable address found
E0823 23:28:35.074309   14752 portforward.go:233] lost connection to pod
Handling connection for 31082
E0823 23:28:35.074371   14752 portforward.go:345] error creating error stream for port 31082 -> 443: EOF

This in turn is preventing me from hooking it up to the load balancer.

I can shell into the api-gateway pod itself and it is receiving envoy configuration which looks to be correct. I can also see the various transition messages as the gateway config and routes are accepted and bound together - so I don't think it is misconfigured on that side. I am comparing with a working nodeport scheme for an ingress-gateway, and it is only the port entries which stand out as different.
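(For reference, one way to list the container ports the controller actually declared, using the pod name from the port-forward attempt above:)

kubectl get pod sdf-api-gateway-67c68466b9-5tsxd -n consul \
  -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.ports}{"\n"}{end}'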

Am I missing some step to get the ports into the pod definition? Or is some additional configuration needed to join it all up?

Thanks for your help

Environment details

consul-k8s 1.2.1, EKS 1.26.7

missylbytes commented 1 year ago

So the gateway controller will not add the port entries to the pod definition, but the gateway (and those ports) should still work. Can you verify that the gatewayClassName defined on the Gateway matches up with the GatewayClassConfig you created? They don't match in the manifests you posted, but I didn't know if you had scrubbed something.
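A quick way to cross-check, using the names from the manifests above (the GatewayClass is what ties a Gateway's gatewayClassName to a GatewayClassConfig):

# Which GatewayClass does the Gateway reference?
kubectl get gateway sdfsfs -n sfdsdf -o jsonpath='{.spec.gatewayClassName}'

# Which GatewayClassConfig does that GatewayClass point at?
kubectl get gatewayclass consul -o jsonpath='{.spec.parametersRef.name}'

If the second command doesn't return the GatewayClassConfig you created, that would explain the mismatch.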

mr-miles commented 1 year ago

Hmm, maybe that's a red herring then. The name and namespace you suggested match up, and it is creating the gateway pod and a running envoy instance.

But trawling through the server logs, I can see entries like:

failed to generate all xDS resources from the snapshot: failed to generate xDS resources for \"type.googleapis.com/envoy.config.cluster.v3.Cluster\": no discovery chain for upstream \"exampleservice\""

and the envoy config dump contains this error message:

"dynamic_listeners": [
    {
     "name": "http:xx.xx.xx.xx:443",
     "error_state": {
      "failed_configuration": {
       "@type": "type.googleapis.com/envoy.config.listener.v3.Listener",
       "name": "http:xx.xx.xx.xx:443",
       "address": {
        "socket_address": {
         "address": "xx.xx.xx.xx",
         "port_value": 443
        }
       },
       "traffic_direction": "OUTBOUND"
      },
      "last_update_attempt": "2023-08-28T15:38:13.747Z",
      "details": "error adding listener 'xx.xx.xx.xx:443': no filter chains specified"
     }
    },
    {
     "error_state": {
      "failed_configuration": {
       "@type": "type.googleapis.com/envoy.config.listener.v3.Listener",
       "name": "http:xx.xx.xx.xx:443",
       "address": {
        "socket_address": {
         "address": "xx.xx.xx.xx",
         "port_value": 443
        }
       },
       "traffic_direction": "OUTBOUND"
      },
      "details": "error adding listener 'xx.xx.xx.xx:443': no filter chains specified"
     }
    }
   ]

and I can see the HTTP routes are missing from the envoy config, so I am guessing that Consul can't create the routes to send down to envoy.
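(In case it's useful for reproducing: assuming the Envoy admin API in the gateway pod is on its default bind of 127.0.0.1:19000, the config dump can be pulled with something like:)

# in one terminal: forward the Envoy admin port from the gateway pod
kubectl port-forward pods/sdf-api-gateway-67c68466b9-5tsxd -n consul 19000:19000

# in another: dump just the listener state
curl -s http://127.0.0.1:19000/config_dump | \
  jq '.configs[] | select(."@type" | endswith("ListenersConfigDump"))'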

However, the HTTPRoute and Gateway objects all seem fine in their status entries:

HTTPRoute:

status:
  parents:
    - conditions:
        - lastTransitionTime: '2023-08-28T07:40:12Z'
          message: resolved backend references
          observedGeneration: 2
          reason: ResolvedRefs
          status: 'True'
          type: ResolvedRefs
        - lastTransitionTime: '2023-08-15T10:26:33Z'
          message: route accepted
          observedGeneration: 2
          reason: Accepted
          status: 'True'
          type: Accepted
        - lastTransitionTime: '2023-08-15T10:26:44Z'
          message: route synced to Consul
          observedGeneration: 2
          reason: Synced
          status: 'True'
          type: Synced
        - lastTransitionTime: '2023-08-28T07:40:22Z'
          message: route is valid
          observedGeneration: 2
          reason: Accepted
          status: 'True'
          type: ConsulAccepted

Gateway:

status:
  addresses:
    - type: IPAddress
      value: xx.xx.xx.xx
    - type: IPAddress
      value: yy.yy.yy.yy
  conditions:
    - lastTransitionTime: '2023-08-14T21:08:19Z'
      message: gateway accepted
      observedGeneration: 1
      reason: Accepted
      status: 'True'
      type: Accepted
    - lastTransitionTime: '2023-08-15T08:37:40Z'
      message: gateway programmed
      observedGeneration: 1
      reason: Programmed
      status: 'True'
      type: Programmed
    - lastTransitionTime: '2023-08-14T21:08:22Z'
      message: gateway synced to Consul
      observedGeneration: 1
      reason: Synced
      status: 'True'
      type: Synced
    - lastTransitionTime: '2023-08-14T21:09:28Z'
      message: gateway is valid
      observedGeneration: 1
      reason: Accepted
      status: 'True'
      type: ConsulAccepted
  listeners:
    - attachedRoutes: 45
      conditions:
        - lastTransitionTime: '2023-08-28T15:38:23Z'
          message: listener accepted
          observedGeneration: 1
          reason: Accepted
          status: 'True'
          type: Accepted
        - lastTransitionTime: '2023-08-28T15:38:23Z'
          message: listener programmed
          observedGeneration: 1
          reason: Programmed
          status: 'True'
          type: Programmed
        - lastTransitionTime: '2023-08-28T15:38:23Z'
          message: listener has no conflicts
          observedGeneration: 1
          reason: NoConflicts
          status: 'False'
          type: Conflicted
        - lastTransitionTime: '2023-08-28T15:38:23Z'
          message: resolved certificate references
          observedGeneration: 1
          reason: ResolvedRefs
          status: 'True'
          type: ResolvedRefs
      name: https
      supportedKinds:
        - group: gateway.networking.k8s.io
          kind: HTTPRoute

The HTTPRoutes I have defined look like this:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: route-api-apps-<service>
  namespace: ns
spec:
  hostnames:
    - host1.company.com
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: <gateway-name>
      namespace: consul
  rules:
    - backendRefs:
        - group: ''
          kind: Service
          name: <service>
          weight: 1
      filters:
        - requestHeaderModifier:
            add:
              - name: X-Path
                value: /api/<service>
          type: RequestHeaderModifier
      matches:
        - path:
            type: PathPrefix
            value: /api/<service>

Is there more logging I can get out of Consul to see what's mismatched? The definitions seem to have passed validation.

missylbytes commented 1 year ago

So

failed to generate all xDS resources from the snapshot: failed to generate xDS resources for \"type.googleapis.com/envoy.config.cluster.v3.Cluster\": no discovery chain for upstream \"exampleservice\""

is probably just an overzealous logging statement.

I was able to get your original configs working locally with the following settings and a basic helm install. I don't see anything egregious in what you posted recently, but can you try with these and see if you are still seeing the issue?

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: sdfsfs
spec:
  #gatewayClassName: consul
  gatewayClassName: sdfsfs
  listeners:
    - allowedRoutes:
        namespaces:
          from: All
      name: http
      port: 80
      protocol: HTTP
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: GatewayClass
metadata:
  name: sdfsfs
spec:
  controllerName: "consul.hashicorp.com/gateway-controller"
  parametersRef:
    group: consul.hashicorp.com
    kind: GatewayClassConfig
    name: sdfsfs
---
apiVersion: consul.hashicorp.com/v1alpha1
kind: GatewayClassConfig
metadata:
  name: sdfsfs
spec:
  copyAnnotations: {}
  deployment:
    defaultInstances: 2
    maxInstances: 4
    minInstances: 2
  mapPrivilegedContainerPorts: 30000
  serviceType: NodePort
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: http-route
spec:
  parentRefs:
  - name: sdfsfs
  rules:
  - backendRefs:
    - kind: Service
      name: frontend
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: frontend
---
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: frontend
spec:
  protocol: http
---
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  selector:
    app: frontend
  ports:
    - port: 8080
      # targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  labels:
    app: frontend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
      annotations:
        "consul.hashicorp.com/connect-inject": "true"
    spec:
      serviceAccountName: frontend
      containers:
        - name: frontend
          image: hashicorp/http-echo:latest
          args: ["-listen=:8080", "-text=hello"]
          ports:
            - containerPort: 8080
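
(Assuming the controller names the gateway Service after the Gateway resource and everything above is applied to the default namespace, a quick in-cluster smoke test would look something like this:)

# NodePort allocated for the HTTP listener
kubectl get svc sdfsfs -o jsonpath='{.spec.ports[?(@.port==80)].nodePort}'

# hit the gateway through its ClusterIP from a throwaway pod
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -si http://sdfsfs.default.svc.cluster.local/

If routing is wired up, the second command should return the "hello" response from the http-echo backend.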

As far as logs go, you can try setting global.logLevel to debug. The other places to check would be the connect-inject container logs and the gateway pod logs. Let me know what you find.
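A minimal values snippet for that (the release name and namespace in the upgrade command are assumptions; adjust to your install):

# values.yaml for the consul Helm chart
global:
  logLevel: debug

# roll it out (assumes the release is named "consul" in the "consul" namespace)
helm upgrade consul hashicorp/consul -n consul -f values.yaml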

mr-miles commented 1 year ago

Thanks for taking the time to go through this!

The one difference I can spot is that my routes and services are in one (the same) namespace, and the gateway is in a second. But I will see if I can build up to the full setup…
