hashicorp / consul-k8s

First-class support for Consul Service Mesh on Kubernetes
https://www.consul.io/docs/k8s
Mozilla Public License 2.0

MeshGateway config required despite it being disabled #1022

Open narendrapatel opened 2 years ago

narendrapatel commented 2 years ago

We have a VM-based Consul cluster and a Kubernetes-based Consul cluster in federation, with the VM cluster being the ACL master and service mesh enabled. However, we observed that we still need to provide Mesh Gateway configuration even though it is disabled in the Helm chart config.

Application pods in the mesh do not get past the init stage. We get the following error in the pod logs:

2022-02-04T07:58:57.878Z [INFO]  Check to ensure a Kubernetes service has been created for this application. If your pod is not starting also check the connect-inject deployment logs.
2022-02-04T07:58:58.880Z [INFO]  Unable to find registered services; retrying

We get the following in the connect-inject logs:

{"level":"error","ts":1643961512.3430398,"logger":"controller.endpoints","msg":"failed to create service registrations for endpoints","name":"web","ns":"consul","error":"upstream \"api:8000:dc1\" is invalid: ProxyDefaults mesh gateway mode is neither \"local\" nor \"remote\"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:227"}
{"level":"error","ts":1643961512.343221,"logger":"controller.endpoints","msg":"failed to register services or health check","name":"web","ns":"consul","error":"upstream \"api:8000:dc1\" is invalid: ProxyDefaults mesh gateway mode is neither \"local\" nor \"remote\"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:227"}
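
For context, an upstream string of the form "api:8000:dc1" (service, local port, datacenter) normally comes from the connect-service-upstreams annotation on the pod template. The manifest below is only a sketch of what the web workload might look like. The service name, namespace, and upstream value are taken from the log lines above; the labels, replica count, and container image are placeholders and are not from the actual deployment.

```yaml
# Hypothetical Deployment for the "web" service seen in the logs.
# Only the name, namespace, and upstream annotation value are taken
# from the error output; everything else is an assumed placeholder.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  namespace: consul
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
      annotations:
        # Ask connect-inject to add the Envoy sidecar to this pod.
        consul.hashicorp.com/connect-inject: "true"
        # Format: <service>:<local port>:<optional datacenter>.
        # The trailing "dc1" marks the upstream as being in another
        # datacenter, which is what triggers the mesh gateway mode check.
        consul.hashicorp.com/connect-service-upstreams: "api:8000:dc1"
    spec:
      containers:
        - name: web
          image: example/web:latest # placeholder image
```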

As a workaround, we need to set the MeshGateway mode in ProxyDefaults to local or remote (none does not work) for the pods to start up correctly. However, after startup, Envoy does not receive any upstream cluster endpoints. We then have to set the MeshGateway mode in ServiceDefaults to none for each service so that their Envoy sidecars load the upstream cluster endpoints.
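
The two-step workaround described above could be expressed with the consul-k8s config entry CRDs roughly as follows. This is a sketch under assumptions, not the exact resources that were applied: "local" is picked arbitrarily from the two modes the validation accepts, and "web" stands in for each service that needs the override (one ServiceDefaults per service).

```yaml
# Step 1: satisfy the connect-inject validation by giving ProxyDefaults
# a mesh gateway mode of "local" (or "remote"). The ProxyDefaults
# resource must be named "global".
apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global
spec:
  meshGateway:
    mode: local
---
# Step 2: override the mode back to "none" per service so the sidecar
# connects to upstreams directly rather than through a mesh gateway.
# Repeat for every service in the mesh; "web" is just the example
# service from the logs.
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: web
spec:
  meshGateway:
    mode: none
```

The per-service override works because a ServiceDefaults mesh gateway setting takes precedence over the global ProxyDefaults value for that service.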

Consul Version

1.11.1

t-eckert commented 2 years ago

Hi @narendrapatel, thank you for reaching out. I will try to replicate this and see where the disconnect is in Consul. Would you be so kind as to share the full Helm config you used? That will help me replicate the issue.

narendrapatel commented 2 years ago

Hi @t-eckert,

Thanks for the reply :)

Please find below the config used for Helm.

  global:
    image: "hashicorp/consul:1.11.1"
    datacenter: dc2
    federation:
      enabled: false
    imageEnvoy: "envoyproxy/envoy-alpine:v1.18.2"
  server:
    replicas: 1
    securityContext:
      runAsNonRoot: false
      runAsGroup: 0
      runAsUser: 0
      fsGroup: 0
    extraConfig: |
      {
        "primary_datacenter": "dc1",
        "retry_join_wan": ["10.29.149.94"]
      }
  meshGateway:
    enabled: false
  client:
    tolerations: |
      - key: "cloud.google.com/gke-preemptible"
        operator: Equal
        value: "true"
        effect: NoSchedule

Chart version used: 0.39.0

lkysow commented 2 years ago

This is because (a) we have an assumption that Kubernetes clusters are all federated using mesh federation, and (b) we wanted to warn users that if they're using an upstream in another datacenter and they don't have a mesh gateway mode set, nothing will work.

Maybe a simple solution is a new value that can silence this error?

connectInject:
  validateRemoteDCUpstreams: true

Defaults to true but can be set to false?

david-yu commented 2 years ago

Hi @narendrapatel, hopefully we answered your question. I'll go ahead and close this issue. Let us know if you have any follow-up on our previous response!

narendrapatel commented 2 years ago

Hi @david-yu, I believe the above was a suggestion from @lkysow for something to be implemented in consul-k8s. I don't think this config option exists as of now. Thanks.

david-yu commented 2 years ago

OK, thanks. Let me re-open this for tracking.