hashicorp / consul-helm

Helm chart to install Consul and other associated components.
Mozilla Public License 2.0

Allow Consul client daemonset connectivity via nodePort (instead of only hostPort) #997

Closed · karras closed this issue 3 years ago

karras commented 3 years ago

Feature Description

Currently the Consul deployment on K8s relies on Kubernetes hostPorts to provide connectivity to the Consul clients and to ensure that each app pod talks to the Consul client on its own K8s worker node. This design decision is well described in the Consul docs below:

This causes various problems in organizations and environments where hostPort is not available, for example when running TKGI (previously known as PKS) < 1.11, or simply when security requirements do not allow its use.

We (@flavioaiello, @pree & @karras) therefore investigated various scenarios for how the Consul deployment could be adjusted to avoid the hostPort option. This would not only solve the problem but also help satisfy security-related requirements in certain environments.

We managed to switch to nodePorts with 2-3 additional modifications:

Note: The adjustment of the service-node-port-range is only required due to the hardcoded ports in consul-k8s and consul-helm. We'll track this in separate PRs. See also: https://github.com/hashicorp/consul-k8s/issues/544
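
For reference, a minimal sketch of that adjustment (the exact mechanism for setting kube-apiserver flags varies per distribution): the default NodePort range of 30000-32767 has to be extended so the hardcoded 8501/8502 ports can be allocated:

# kube-apiserver flag; 8000-32767 is just an example range covering 8501/8502
--service-node-port-range=8000-32767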

Making nodePorts the default deployment option would most likely be desirable, but I guess it's not possible because every organization would then need to enable the required K8s feature gates. These are still in alpha and disabled by default in vanilla K8s.
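
For the topologyKeys-based example below (K8s 1.18-1.20), the gates in question are ServiceTopology and, per the Service Topology docs, EndpointSlice; a sketch, assuming they are set on at least kube-apiserver and kube-proxy:

# Feature gates for the topologyKeys approach (alpha, off by default)
--feature-gates=ServiceTopology=true,EndpointSlice=true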

Additional references:

Use Case(s)

Contributions

We expect to open a PR soon with a new consul-helm flag/variable that allows this kind of deployment option. A hypothetical sketch of what that could look like follows below.
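
Purely as an illustration of the direction (the key names here are hypothetical, not the ones from the actual PRs), the values.yaml surface might look like:

# Hypothetical consul-helm values sketch -- actual naming is defined in the PRs
client:
  exposeViaNodePort:    # hypothetical key
    enabled: true
    httpNodePort: 8501
    grpcNodePort: 8502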

karras commented 3 years ago

Working nodePort example service, tested on K8s 1.18, 1.19 and 1.20:

apiVersion: v1
kind: Service
metadata:
  name: consul-client
  namespace: consul
spec:
  type: NodePort
  selector:
    component: client
  ports:
    # 8501/8502 match the ports hardcoded in consul-k8s/consul-helm and thus
    # require the extended --service-node-port-range mentioned above
    - name: http
      protocol: TCP
      port: 8501
      nodePort: 8501
      targetPort: 8501
    - name: grpc
      protocol: TCP
      port: 8502
      nodePort: 8502
      targetPort: 8502
  # Route traffic only to endpoints on the same node, mimicking hostPort
  # locality; requires the ServiceTopology feature gate (alpha)
  topologyKeys:
    - "kubernetes.io/hostname"
pree commented 3 years ago

Steps to get it running on Kubernetes v1.21.1 using TopologyAwareHints (which replace the topologyKeys field deprecated in 1.21):

Working nodePort example service, tested on K8s (rancher) 1.21.1:

apiVersion: v1
kind: Service
metadata:
  name: consul-client
  namespace: consul
  annotations:
    # Asks the EndpointSlice controller to populate per-zone topology hints
    service.kubernetes.io/topology-aware-hints: auto
spec:
  type: NodePort
  selector:
    component: client
  ports:
    - name: http
      protocol: TCP
      port: 8501
      nodePort: 8501
      targetPort: 8501
    - name: grpc
      protocol: TCP
      port: 8502
      nodePort: 8502
      targetPort: 8502

Issues we encountered: we initially activated the feature gate on kubelet & scheduler as well, which resulted in the EndpointSlices not getting any hints and routing not working properly. Removing the feature gate from those components fixed the issue.
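
In other words, the gate should stay only on the components that produce and consume the hints; a sketch of the layout consistent with the fix above (flag syntax varies per distribution):

# TopologyAwareHints feature gate placement (sketch)
kube-apiserver          --feature-gates=TopologyAwareHints=true
kube-controller-manager --feature-gates=TopologyAwareHints=true   # EndpointSlice controller writes the hints
kube-proxy              --feature-gates=TopologyAwareHints=true   # consumes the hints when routing
# kubelet & kube-scheduler: leave the gate unset (see above)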

When everything is working, you should see output similar to the following from kubectl -n consul get endpointslices consul-client-xxxxx -o yaml:

addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
  - 10.42.2.12
  conditions:
    ready: true
  hints:
    forZones:
    - name: node2
  nodeName: node02
  targetRef:
    kind: Pod
    name: consul-ffl7f
    namespace: consul
    resourceVersion: "26388"
    uid: 8982f868-09e7-4b7f-93da-a80cfd1d5563
  zone: node2
- addresses:
  - 10.42.0.8
  conditions:
    ready: true
  hints:
    forZones:
    - name: node1
  nodeName: node01
  targetRef:
    kind: Pod
    name: consul-dqr9z
    namespace: consul
    resourceVersion: "12690"
    uid: d7bf8662-3449-4129-a00c-b5daaafefbf9
  zone: node1
- addresses:
  - 10.42.1.12
  conditions:
    ready: true
  hints:
    forZones:
    - name: node3
  nodeName: node03
  targetRef:
    kind: Pod
    name: consul-w69wt
    namespace: consul
    resourceVersion: "12814"
    uid: 0e8585a1-153a-4be3-9a30-d91df71e1fae
  zone: node3
kind: EndpointSlice
metadata:
  creationTimestamp: "2021-06-22T09:21:33Z"
  generateName: consul-client-
  generation: 95
  labels:
    endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
    kubernetes.io/service-name: consul-client
  name: consul-client-8jwqn
  namespace: consul
  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: Service
    name: consul-client
    uid: 923fcf4e-9d21-40da-a331-9ea7a6bc63aa
  resourceVersion: "26396"
  uid: 6a1d9abd-0a0c-4c46-b984-4b0f89e8c092
ports:
- name: grpc
  port: 8502
  protocol: TCP
- name: http
  port: 8501
  protocol: TCP
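
With the hints in place, a quick sanity check (a sketch: /v1/status/leader is a standard Consul HTTP API endpoint; -k skips certificate verification for brevity) is to query the nodePort on each node and confirm the node-local client answers:

# Replace <node-ip> with the address of the node under test
curl -sk https://<node-ip>:8501/v1/status/leader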
david-yu commented 3 years ago

Hi @karras and @pree, we are monitoring this issue and PR, so just wanted to let you know. We were hoping to chat with you about these changes in a meeting, and I've asked the EMEA team to reach out to set up a call. Just wanted to acknowledge that we see this and that you are making progress here.

karras commented 3 years ago

Much appreciated! We're currently working on polishing the PRs by adding missing unit tests, etc.

david-yu commented 3 years ago

Hi @karras @pree

Thank you for your contribution. At a high level, we generally do want to remove our requirement for using a HostPort for Consul Kubernetes, and are looking to work with the Consul Core team to build an architecture that will no longer rely on a Consul client agent. This will likely take some time and is not something that we have targeted for our next major release.

At a glance, this seems like a good approach for accommodating Kubernetes environments that cannot support hostPort due to security concerns or because their CNI does not allow it. As for the security concern with host ports, exposing the agent API should be fine and within the threat model as long as TLS and ACLs are used.

Because of the large testing surface this change introduces, we are not prepared to accept this PR: it would require testing all of our features against the additional configurations the change makes possible. Based on the other work currently within our Kubernetes roadmap, we don't think we can accept the PRs to support this or provide the ongoing maintenance of such a feature if it were merged.

We understand that TKGI 1.11 does support host ports, and we currently recommend that for production scenarios you leverage host ports, since this is the most definitive way we can direct traffic from pods to agents that are local to that host. We will leave this branch here, though, to see if others find this PR useful, since we are not sure how applicable this is to folks not on TKGI. We do not intend to merge or support the PRs at this time; however, we will provide some feedback next week on the PRs linked in this issue.

karras commented 3 years ago

Thanks for the review and the additional input, much appreciated! As discussed, this can't be merged, so I'm closing this and the other related PRs and issues to prevent clutter and to prepare for the upcoming repo consolidation into consul-k8s.