hashicorp / consul-k8s

First-class support for Consul Service Mesh on Kubernetes
https://www.consul.io/docs/k8s
Mozilla Public License 2.0

Consul not injecting into nginx-controller: wrong agent address #945

Closed chebykinn closed 2 years ago

chebykinn commented 2 years ago


Overview of the Issue

I'm trying to integrate Consul with the nginx ingress controller using transparent proxy, but the consul-connect-inject-init container fails to start with this error:

2022-01-01T20:40:01.577Z [ERROR] Unable to get Agent services: error="Get "http://192.168.10.22:8500/v1/agent/services?filter=Meta%5B%22pod-name%22%5D+%3D%3D+%22ingress-nginx-controller-85845f5569-8qj8n%22+and+Meta%5B%22k8s-namespace%22%5D+%3D%3D+%22ingress-nginx%22": dial tcp 192.168.10.22:8500: connect: connection refused"

It looks like it uses the host IP instead of the pod IP when connecting to Consul, and I don't understand why. Injection works fine for my own app.
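To make the failing request easier to read, the URL from the error can be split into the agent address and the decoded filter expression. This is a small sketch using only the Python standard library; the URL is copied verbatim from the log line, and the variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied verbatim from the error message above.
url = (
    "http://192.168.10.22:8500/v1/agent/services?filter="
    "Meta%5B%22pod-name%22%5D+%3D%3D+%22ingress-nginx-controller-85845f5569-8qj8n%22"
    "+and+Meta%5B%22k8s-namespace%22%5D+%3D%3D+%22ingress-nginx%22"
)

parts = urlsplit(url)
agent_host = parts.hostname  # address the init container tried to reach
agent_port = parts.port      # port from the same URL
# parse_qs percent-decodes the value and turns '+' into spaces.
filter_expr = parse_qs(parts.query)["filter"][0]

print(f"agent:  {agent_host}:{agent_port}")
print(f"filter: {filter_expr}")
```

The decoded filter reads `Meta["pod-name"] == "ingress-nginx-controller-85845f5569-8qj8n" and Meta["k8s-namespace"] == "ingress-nginx"`, so the request itself is well formed; the failure is the refused connection to 192.168.10.22:8500.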

Reproduction Steps

  1. Consul values:
    global:
      name: consul
      datacenter: dc1
      enabled: true
    server:
      replicas: 1
      securityContext:
        runAsNonRoot: false
        runAsGroup: 0
        runAsUser: 0
        fsGroup: 0
    ui:
      ingress:
        enabled: true
        ingressClassName: nginx
        hosts:
          - host: consul.local
            paths:
              - /
    client:
      enabled: true
    connectInject:
      enabled: true
    controller:
      enabled: true
  2. Ingress values
    controller:
      nodeSelector:
        node-role.kubernetes.io/ingress: "true"
      tolerations:
        - key: "node-role.kubernetes.io/ingress"
          operator: "Exists"
      podAnnotations:
        consul.hashicorp.com/connect-inject: "true"
        consul.hashicorp.com/transparent-proxy: "true"
        consul.hashicorp.com/transparent-proxy-exclude-inbound-ports: "80,8000,9000,8443"
        consul.hashicorp.com/transparent-proxy-exclude-outbound-cidrs: "10.233.0.1/32"
      service:
        type: NodePort
  3. Apply these on a clean bare-metal cluster (I'm using kubespray).

Logs

2022-01-01T20:40:01.577Z [ERROR] Unable to get Agent services: error="Get "http://192.168.10.22:8500/v1/agent/services?filter=Meta%5B%22pod-name%22%5D+%3D%3D+%22ingress-nginx-controller-85845f5569-8qj8n%22+and+Meta%5B%22k8s-namespace%22%5D+%3D%3D+%22ingress-nginx%22": dial tcp 192.168.10.22:8500: connect: connection refused"

Expected behavior

Environment details

k8s:

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:41:01Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/arm64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.6", GitCommit:"d921bc6d1810da51177fbd0ed61dc811c5228097", GitTreeState:"clean", BuildDate:"2021-10-27T17:44:26Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}

CNI: calico
Consul: v1.11.1

Additional Context

I actually had this working before, but I can't pinpoint what changed when I recreated the Kubernetes cluster.

chebykinn commented 2 years ago

Hm, it looks like the init container uses the wrong HOST_IP. I have a 3-node Kubernetes cluster:

NAME        STATUS   ROLES                  AGE     VERSION
ingress-1   Ready    ingress                3d18h   v1.21.6
master-1    Ready    control-plane,master   3d18h   v1.21.6
worker-1    Ready    worker                 3d18h   v1.21.6

When Consul is injected into my own apps, connect-init uses worker-1's IP address, which is correct. When ingress-nginx starts, however, HOST_IP is ingress-1's address, and there is no Consul agent listening on that node.
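One way to confirm that nothing is listening on a node's agent port is a plain TCP connection test. The sketch below is only a diagnostic aid, not part of consul-k8s; the function name `agent_reachable` is hypothetical, and the default port 8500 is taken from the error message.

```python
import socket


def agent_reachable(host: str, port: int = 8500, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from somewhere that can reach the node network, `agent_reachable("192.168.10.22")` would be expected to return False for as long as ingress-1 has no client agent, which matches the "connection refused" in the log.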

chebykinn commented 2 years ago

So, I've figured out my problem. Consul needs a client pod on every node and uses a DaemonSet to achieve that (as stated here: https://www.consul.io/docs/k8s#client-agents). My ingress-1 node configuration has a taint that keeps the DaemonSet pods off that node:

node_labels:
     node-role.kubernetes.io/ingress: "true"
node_taints:
     - "node-role.kubernetes.io/ingress=:NoSchedule"

To allow Consul to schedule a client pod on that node, I should have added tolerations like the ones in my nginx config:

client:
  enabled: true
  tolerations: |
    - key: "node-role.kubernetes.io/ingress"
      operator: "Exists"
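To illustrate why this toleration is enough, here is a minimal Python sketch of the taint/toleration matching rule as described in the Kubernetes documentation. It is not the scheduler's code; `Taint`, `parse_taint`, and `tolerates` are hypothetical names, and the parser assumes the `key=value:effect` form used in the node configuration above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


def parse_taint(spec: str) -> Taint:
    """Parse 'key=value:effect'; the value may be empty, as in 'key=:NoSchedule'."""
    key_value, _, effect = spec.rpartition(":")
    key, _, value = key_value.partition("=")
    return Taint(key, value, effect)


def tolerates(toleration: dict, taint: Taint) -> bool:
    """Return True if a single toleration matches the taint."""
    effect = toleration.get("effect", "")
    key = toleration.get("key", "")
    operator = toleration.get("operator", "Equal")  # Equal is the default
    # An empty effect on the toleration matches every effect.
    if effect and effect != taint.effect:
        return False
    # An empty key (only meaningful with Exists) matches every key.
    if key and key != taint.key:
        return False
    if operator == "Exists":
        return True
    if operator == "Equal":
        return toleration.get("value", "") == taint.value
    return False


taint = parse_taint("node-role.kubernetes.io/ingress=:NoSchedule")
client_tolerations = [{"key": "node-role.kubernetes.io/ingress", "operator": "Exists"}]

# Without tolerations nothing matches the taint; with the one above it does.
print(any(tolerates(t, taint) for t in []))                  # False
print(any(tolerates(t, taint) for t in client_tolerations))  # True
```

Because the toleration uses `operator: "Exists"` and sets no effect, it matches the taint regardless of its empty value and its NoSchedule effect, so the client DaemonSet pod can be scheduled on ingress-1.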