hashicorp / consul-k8s

First-class support for Consul Service Mesh on Kubernetes
https://www.consul.io/docs/k8s
Mozilla Public License 2.0

consul prepared_query isn't working as intended for services on kubernetes pods #1387

Open donghoang89 opened 2 years ago

donghoang89 commented 2 years ago

Overview of the Issue

We have a consul cluster running with the following prepared_query:

{
  "Name": "",
  "Session": "",
  "Token": "",
  "Template": {
    "Type": "name_prefix_match"
  },
  "Service": {
    "Service": "${name.full}",
    "Failover": {
      "NearestN": 1,
      "Datacenters": []
    },
    "IgnoreCheckIDs": [],
    "OnlyPassing": false,
    "Near": "_ip",
    "Tags": [],
    "NodeMeta": {},
    "ServiceMeta": {},
    "Connect": false
  },
  "DNS": {
    "TTL": "10s"
  }
}
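For context, a template query like this is created through Consul's prepared-query HTTP endpoint. A minimal sketch of the registration, assuming the JSON above is saved as query.json and the agent's HTTP API is reachable on 127.0.0.1:8500 without ACLs:

# Create the prepared query; POST /v1/query returns the new query's ID.
curl -sS -X POST http://127.0.0.1:8500/v1/query -d @query.json

Because the template type is name_prefix_match and Name is empty, this acts as a catch-all: any <service>.query.consul DNS lookup resolves through it, with ${name.full} expanded to the looked-up service name.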

We then deployed the consul-k8s chart to our Kubernetes cluster using the following command: helm install -f deploy.yaml.template consul ./consul-0.34.1.tgz

The content of deploy.yaml.template is as follows:

client:
  enabled: true
  dnsPolicy: ClusterFirstWithHostNet
  hostNetwork: true
  join:
    - [consul_server_1]
    - [consul_server_2]
    - [consul_server_3]
  hosts:
    - [consul_server_1]
    - [consul_server_2]
    - [consul_server_3]
  annotations: "\"consul.hashicorp.com/connect-service-upstreams\": \"prepared_query\""
externalServers:
  enabled: true
  hosts:
    - [consul_server_1]
    - [consul_server_2]
    - [consul_server_3]
syncCatalog:
  enabled: true
  default: true
  toK8S: true
  resources:
    requests:
      memory: "100Mi"
      cpu: "100m"
    limits:
      memory: "100Mi"
      cpu: "100m"
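As a sanity check after the install, the client pods can be listed with something like the following (the label selectors here assume the chart's default labels):

# Consul client agents run as a DaemonSet; one pod per node is expected.
kubectl get daemonset,pods -l app=consul,component=client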

When we run nslookup against a registered service that runs on our VMs, Consul consistently returns the service IPs sorted by geo-proximity to the query-sending server (the query-sending server is in us-west).

localhost:~# nslookup database.query.consul
Server:     [consul_server_1]
Address:    [consul_server_1]#53

Name:   database.query.consul
Address: [database_US-WEST_server_1]
Name:   database.query.consul
Address: [database_US-WEST_server_2]
Name:   database.query.consul
Address: [database_US-EAST_server_1]

localhost:~# nslookup database.query.consul
Server:     [consul_server_1]
Address:    [consul_server_1]#53

Name:   database.query.consul
Address: [database_US-WEST_server_1]
Name:   database.query.consul
Address: [database_US-WEST_server_2]
Name:   database.query.consul
Address: [database_US-EAST_server_1]

However, when we run nslookup against a registered service that runs on Kubernetes pods, Consul returns the service IPs in a shuffled order:

localhost:~# nslookup database-kube.query.consul
Server:     [consul_server_1]
Address:    [consul_server_1]#53

Name:   database-kube.query.consul
Address: [database-kube_US-WEST_server_1]
Name:   database-kube.query.consul
Address: [database-kube_US-EAST_server_1]
Name:   database-kube.query.consul
Address: [database-kube_US-WEST_server_2]

localhost:~# nslookup database-kube.query.consul
Server:     [consul_server_1]
Address:    [consul_server_1]#53

Name:   database-kube.query.consul
Address: [database-kube_US-EAST_server_1]
Name:   database-kube.query.consul
Address: [database-kube_US-WEST_server_2]
Name:   database-kube.query.consul
Address: [database-kube_US-WEST_server_1]
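To rule out the DNS layer when comparing the two cases, the same prepared query can also be executed over HTTP; a sketch, assuming the server's HTTP API is on port 8500 and no ACL token is required:

# Execute the catch-all query for the service name directly.
# ?near=_ip sorts results by network-coordinate distance from the
# caller's source IP, which is what the DNS path does as well.
curl -sS "http://[consul_server_1]:8500/v1/query/database-kube/execute?near=_ip" | jq '.Nodes[].Node.Node'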

Reproduction Steps

Run nslookup against any registered service that runs on Kubernetes pods.

Expected behavior

Consul should return the service IPs sorted by geo-proximity to the querying server.

Actual behavior

Consul returns the service IPs in a shuffled order.

Environment details


donghoang89 commented 2 years ago

Anyone?

donghoang89 commented 2 years ago

When running curl https://127.0.0.1:8500/v1/catalog/service/database-kube | jq, I see the following output excerpt (the full output file is attached below):

[
  {
    "ID": "",
    "Node": "k8s-sync",
    "Address": "127.0.0.1",
    "Datacenter": "dc1",
    "TaggedAddresses": null,
    "NodeMeta": {
      "external-source": "kubernetes"
    },
    "ServiceKind": "",
    "ServiceID": "database-kube-2b5c960fe63a",
    "ServiceName": "database-kube",
    "ServiceTags": [
    ],
    "ServiceAddress": "[database-kube_US-WEST_server_1]",
    "ServiceWeights": {
      "Passing": 1,
      "Warning": 1
    },
    "ServiceMeta": {
      "external-source": "kubernetes",
      "port-database": "[PORT]"
    },
    "ServicePort": [PORT],
    "ServiceSocketPath": "",
    "ServiceEnableTagOverride": false,
    "ServiceProxy": {
      "Mode": "",
      "MeshGateway": {},
      "Expose": {}
    },
    "ServiceConnect": {},
    "CreateIndex": 183717217,
    "ModifyIndex": 183717217
  }
  ...
]

I notice that the Address field has the value 127.0.0.1. The same is true for all 6 service pods (the ServiceAddress values differ, but the Address values are all the same). I believe this is why nslookup has been returning the service IPs in a shuffled order.

How do I configure my consul-k8s client so that the service pods show the Kubernetes node IPs, instead of 127.0.0.1, as their Address?

Output file: database-kube-output.txt
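If the shared node entry is what breaks the sorting, the coordinate endpoint should show it: Near/_ip sorting relies on network coordinates learned through gossip, and a node registered only via the catalog API (like k8s-sync) never gossips, so it has no coordinate to sort by. A quick check, assuming the HTTP API on 127.0.0.1:8500:

# Nodes that participate in gossip report network coordinates here;
# the synthetic k8s-sync node is registered via the catalog API only,
# so it is expected to be missing from this list.
curl -sS http://127.0.0.1:8500/v1/coordinate/nodes | jq '.[].Node'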

donghoang89 commented 2 years ago

Any guidance would be greatly appreciated.

donghoang89 commented 2 years ago

I can confirm that we are running the Consul client as a DaemonSet, not as a sidecar.

donghoang89 commented 2 years ago

Is there no resolution for this issue?