hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Consul service mesh across multiple clusters with kafka #14125

Closed (codex70 closed this issue 2 years ago)

codex70 commented 2 years ago

Overview of the Issue

I am unable to connect to kafka from a second data center using a mesh gateway.

Reproduction Steps

  1. I install Consul using Helm in two separate Kubernetes clusters. The two clusters are federated, with a separate datacenter in each cluster (see the values sketch after this list).
  2. The service mesh gateway appears to be working, and testservice/testclient works as expected for the 'hello world' example.
  3. I install kafka in the first cluster using helm templates. The following pod annotations are used:
    consul.hashicorp.com/connect-service: "kafka",
    consul.hashicorp.com/connect-inject: "true",
    consul.hashicorp.com/connect-service-port: "9094",
    consul.hashicorp.com/transparent-proxy: "true",
    consul.hashicorp.com/enable-metrics: "false",
    consul.hashicorp.com/transparent-proxy-exclude-inbound-ports: "9094",
    consul.hashicorp.com/kubernetes-service: "kafka",
  4. Kafka is set up with a load balancer on the network that is accessible from both clusters.
  5. I install the same client in both the first and second cluster with the following annotations:
      consul.hashicorp.com/connect-inject: "true"
      consul.hashicorp.com/transparent-proxy: "true"
      consul.hashicorp.com/service-tags: "spring-boot"
      consul.hashicorp.com/service-metrics-path: '/actuator/prometheus'
      consul.hashicorp.com/connect-service-upstreams: 'kafka:9094:dc1'
  6. On both clients I attempt to connect using the following connection settings:
            "kafka": {
              "bootstrap-servers": "kafka.service.dc1.consul:9094",
              "producer": {
                    "bootstrap-servers": "kafka.service.dc1.consul:9094"
              },
  7. On the first cluster, it works as expected.
  8. On the second cluster, I get the following error:
    Cancelled in-flight API_VERSIONS request with correlation id 0 due to node -1 being disconnected
  9. On the second cluster, if I replace the service name with the IP address of the service load balancer, it again works as expected.
  10. If I replace the service name in the connection string (on either cluster) with localhost, I again get timeout errors, although slightly different ones: Connection to node -1 (localhost/127.0.0.1:9094) could not be established. Broker may not be available.
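For step 1, this is roughly the shape of the Helm values used for WAN federation between the two datacenters. It is only a sketch based on the Consul federation documentation; the names and secret handling here are assumptions, not values taken from this issue.

# values-dc1.yaml -- illustrative sketch only
global:
  name: consul
  datacenter: dc1
  tls:
    enabled: true
  federation:
    enabled: true
    createFederationSecret: true
connectInject:
  enabled: true
meshGateway:
  enabled: true

# The second cluster uses the same values with datacenter: dc2 and
# createFederationSecret removed; the federation secret exported by dc1
# is copied into the dc2 cluster and referenced from its values file.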

Consul info for both Client and Server

Operating system and Environment details

Kubernetes clusters running in cloud environment

Log Fragments

codex70 commented 2 years ago

OK, I finally figured out a solution after way too long playing with this. For anyone else wanting to deploy kafka inside the service mesh, the solution is to add the following annotations:

podAnnotations: {
  consul.hashicorp.com/connect-service: "kafka,kafka-headless",
  consul.hashicorp.com/connect-inject: "true",
  consul.hashicorp.com/connect-service-port: "9094,9093",
  consul.hashicorp.com/transparent-proxy: "false",
  consul.hashicorp.com/enable-metrics: "false",
}

It is also necessary to add the following label to the external access service:

      consul.hashicorp.com/service-ignore: "true"

This isn't currently possible with the helm templates for kafka, but the modification is simple enough.
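For illustration, an external access Service with that label applied ends up looking roughly like the sketch below; the Service name, selector, and port numbers are placeholders, not values from this issue.

# Sketch of one per-broker external access Service carrying the ignore label.
apiVersion: v1
kind: Service
metadata:
  name: kafka-0-external                          # placeholder name
  labels:
    consul.hashicorp.com/service-ignore: "true"   # Consul skips registering this Service
spec:
  type: NodePort
  ports:
    - name: tcp-kafka
      port: 9094
      nodePort: 30092
  selector:
    statefulset.kubernetes.io/pod-name: kafka-0   # placeholder selector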

Services then require the following annotations to connect to kafka:

      consul.hashicorp.com/connect-inject: "true"
      consul.hashicorp.com/transparent-proxy: "false"
      consul.hashicorp.com/connect-service-upstreams: "kafka:9094:dc1"

and then connect to kafka via localhost:

            "kafka": {
              "bootstrap-servers": "localhost:9094",
              "producer": {
                "bootstrap-servers": "localhost:9094"
              }
            }
david-yu commented 2 years ago

Thank you for the insight, @codex70. I'll pass this along to some folks who are also looking to do the same thing.

codex70 commented 2 years ago

@david-yu, just to let you know that the bitnami helm charts I'm working with to deploy kafka are open source, so I've made the necessary changes and this has now been added to the latest release.

david-yu commented 2 years ago

@codex70 do you have perhaps a gist we can follow to see how you're setting up Kafka with Consul K8s? I think something like this would be beneficial to share with the community.

codex70 commented 2 years ago

Hi @david-yu,

The setup for kafka uses the bitnami helm charts; you will need version 18.2.0 or later. I have this working with 3 nodes. Here's a copy of the relevant values file contents.

replicaCount: 3

# SEE DEFAULT VALUES FILE: https://github.com/bitnami/charts/blob/master/bitnami/kafka/values.yaml
podAnnotations: {
  consul.hashicorp.com/connect-service: "kafka,kafka-headless",
  consul.hashicorp.com/connect-inject: "true",
  consul.hashicorp.com/connect-service-port: "9094,9093",
  consul.hashicorp.com/transparent-proxy: "false",
  consul.hashicorp.com/enable-metrics: "false",
}

serviceAccount:
  create: true
rbac:
  create: true

externalAccess:
  enabled: true
  service:
    # Note: this label stops Consul from trying to add the external access services to the service mesh.
    labels: {
      consul.hashicorp.com/service-ignore: "true"
    }
    type: NodePort
    nodePorts: ['30092', '30093', '30094']
    useHostIPs: true

You will also need to have a service account set up for kafka-headless and intentions for connecting systems. I created the following to help:

service-intentions.yaml

{{- range .Values.intentions }}
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: {{ .name }}
spec:
  destination:
    name: {{ .destination }}
  sources:
    {{- range .sources}}
    - {{- range $key, $value := . }}
          {{ $key }}: {{ $value }}
        {{- end }}
    {{- end }}
---
{{- end }}

serviceaccount.yaml

{{- range .Values.serviceAccounts }}
# Service account for each listed service (for ACL enforcement)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ .name }}
---
{{- end }}

with the following added to the values file:

intentions:
  - name: services-to-kafka
    destination: kafka
    sources:
      - name: test-service
        action: allow
      - name: prod-service
        action: allow

serviceAccounts:
  - name: kafka-headless
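With those values, the two templates render to roughly the following manifests (shown only to illustrate what the loops produce; key ordering from the inner range may differ):

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: services-to-kafka
spec:
  destination:
    name: kafka
  sources:
    - name: test-service
      action: allow
    - name: prod-service
      action: allow
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kafka-headless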

You also need to be careful to set up firewalls etc. wherever the cluster is deployed.

I now have this working across multiple federated clusters.

As an example, the connection from a spring boot application would look like the following:

          "spring":{
            "application": {
              "name": "test-service"
            },
            "kafka": {
              "bootstrap-servers": "localhost:9094",
              "producer": {
                "bootstrap-servers": "localhost:9094"
              },
              "listener": {
                "listenRequest": {
                  "topic": "simple.request.test-topic",
                  "enabled": "true"
                },
                "listenResponse": {
                  "topic": "simple.response.test-topic",
                  "enabled": "false"
                }
              }
            }
          },

The pod annotations for that deployment would look like:

      consul.hashicorp.com/connect-inject: "true"
      consul.hashicorp.com/transparent-proxy: "false"
      consul.hashicorp.com/service-tags: "spring-boot"
      consul.hashicorp.com/service-metrics-path: '/actuator/prometheus'
      consul.hashicorp.com/connect-service-upstreams: "kafka:9094:dc1"
      consul.hashicorp.com/connect-service: "test-service"
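To make the placement explicit, those annotations go on the pod template of the client workload, roughly as in the sketch below; the Deployment name, labels, and image are placeholders.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-service                       # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-service
  template:
    metadata:
      labels:
        app: test-service
      annotations:
        consul.hashicorp.com/connect-inject: "true"
        consul.hashicorp.com/transparent-proxy: "false"
        consul.hashicorp.com/service-tags: "spring-boot"
        consul.hashicorp.com/service-metrics-path: "/actuator/prometheus"
        consul.hashicorp.com/connect-service-upstreams: "kafka:9094:dc1"
        consul.hashicorp.com/connect-service: "test-service"
    spec:
      serviceAccountName: test-service     # assumed; relevant when ACLs are enabled
      containers:
        - name: test-service
          image: example/test-service:latest   # placeholder image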

Hope this helps provide some pointers.