hashicorp / consul-k8s

First-class support for Consul Service Mesh on Kubernetes
https://www.consul.io/docs/k8s
Mozilla Public License 2.0
669 stars 322 forks source link

k8s to consul sync on changes and not via fullSync #1027

Open ezraroi opened 2 years ago

ezraroi commented 2 years ago

Question

I am using the consul-k8s to sync services from k8s to consul. From the docs it is not clear when and how this sync is happening. After carefully reading the code, the comments say that changes should happen when changes are detected via the k8s controller but the code itself syncs services to consul only on fullSync. I would like to make this clear, is this a bug or the desired design?

CLI Commands (consul-k8s, consul-k8s-control-plane, helm)

/bin/sh -ec consul-k8s-control-plane sync-catalog \ -log-level=info \ -log-json=false \ -k8s-default-sync=false \ -to-k8s=false \ -consul-domain=consul \ -allow-k8s-namespace="*" \ -deny-k8s-namespace="kube-system" \ -deny-k8s-namespace="kube-public" \ -k8s-write-namespace=${NAMESPACE} \ -node-port-sync-type=ExternalFirst \ -consul-k8s-tag=runtime-k8s \ -consul-node-name=k8s-sync-eks-prd-euw1-dar-general \ -add-k8s-namespace-suffix \

Helm Configuration

# https://github.com/hashicorp/consul-k8s/blob/c79730ba4f0e25b8330950d7ab575e4e3cfc1ef0/hack/helm-reference-gen/fixtures/full-values.yaml
global:
  enabled: true
  logJOSN: true
  domain: consul
  image: "consul:${consul_version}"
  datacenter: ${consul_datacenter}
  imageK8S: hashicorp/consul-k8s-control-plane:${helm_version}
  enablePodSecurityPolicies: false
  # TODO: enable tls later on
  tls:
    enabled: false

server:
  enabled: false # Consul servers are running outside K8S

externalServers:
  enabled: true
  hosts: [${consul_servers_address}]

client:
  enabled: true
  join: [${consul_servers_address}]
  # TODO: enable dataDirectoryHostPath
  grpc: true
  nodeMeta:
    pod-name: $${HOSTNAME}
    host-ip: $${HOST_IP}
    eks-cluster: ${eks_cluster}
  resources:
    requests:
      memory: '8Gi'
      cpu: '2'
    limits:
      memory: '8Gi'
      cpu: '2'
  updateStrategy: |
    rollingUpdate:
      maxUnavailable: 5
    type: RollingUpdate
  # https://learn.hashicorp.com/tutorials/consul/kubernetes-deployment-guide#consul-clients
  extraConfig: |
    {"advertise_reconnect_timeout": "15m"}

syncCatalog:
  enabled: true
  default: false # Need to add annotation the service: 'consul.hashicorp.com/service-sync': 'true'
  toConsul: true
  toK8S: false
  k8sAllowNamespaces: ["*"]
  k8sDenyNamespaces: ["kube-system", "kube-public"]
  k8sTag: "runtime-k8s"
  consulNodeName: "k8s-sync-${eks_cluster}"
  priorityClassName: "system-node-critical"

connectInject:
  enabled: false

controller:
  enabled: false

meshGateway:
  enabled: false

ingressGateways:
  enabled: false

terminatingGateways:
  enabled: false

prometheus:
  enabled: false

tests:
  enabled: false

Current understanding and Expected behavior

Sounds like the sync should happen on every change the controller detects from k8s. In fact, changes are synced to consul only on fullSync.

Comments from the code:

// ConsulSyncPeriod is how often the syncer will attempt to
// reconcile the expected service states with the remote Consul server.
// Syncer is responsible for syncing a set of Consul catalog registrations.
// An external system manages the set of registrations and periodically
// updates the Syncer. The Syncer should keep the remote system in sync with
// the given set of registrations.
    // SyncPeriod is the interval between full catalog syncs. These will
    // re-register all services to prevent overwrites of data. This should
    // happen relatively infrequently and default to 30 seconds.
    //
    // ServicePollPeriod is the interval to look for invalid services to
    // deregister. One request will be made for each synced service in
    // Kubernetes.
    //
    // For both syncs, smaller more frequent and focused syncs may be
    // triggered by known drift or changes.

Environment details

consul-k8s version: 0.40.0 Kubernetes version: v1.21 Cloud Provider: AWS

t-eckert commented 2 years ago

Hi @ezraroi, thank you for this great question. It had me looking back through some of the original design documents for this project. You are right that the catalog sync runs on fullSync, periodically at 30 second intervals by default. This is by design to avoid overwhelming the cluster with requests.

I'm not sure from the comments you provided where it is suggested otherwise. Please let me know any references which were confusing so that I can clarify them.

ezraroi commented 2 years ago

@t-eckert you can see from here:

// For both syncs, smaller more frequent and focused syncs may be
// triggered by known drift or changes.

and

// SyncPeriod is the interval between full catalog syncs. These will
// re-register all services to prevent overwrites of data. This should
// happen relatively infrequently and default to 30 seconds.

Also if we do hook and listen for changes in k8s i think we should sync them to consul. 30 seconds interval is very slow in real use cases. I think there 2 different types of sync:

  1. reconcile - Full sync on low interval
  2. live updates - As they happen in k8s
ishustava commented 2 years ago

Hey @ezraroi

That is true and is something has been brought up before. There is a draft PR #228 and issues #315 and #319 that I believe are caused by this behavior/current implementation. Unfortunately, to make this work properly, it will likely require a rewrite of the sync catalog controller.

I'll keep this issue as an enhancement request.

myxyz commented 2 years ago

I really hope this problem can be solved. Thank you!