hashicorp / consul-k8s

First-class support for Consul Service Mesh on Kubernetes
https://www.consul.io/docs/k8s
Mozilla Public License 2.0

helm:Consul helm 0.32.1 / consul 1.10.0 can't inject the service #640

Closed halbornteam closed 2 years ago

halbornteam commented 3 years ago

Overview of the Issue

We are on k8s 1.20 and have provisioned Consul Connect for our service mesh. The sample static-client and static-server applications deploy and work properly. However, none of our own services can start when automatic injection is enabled.

It encounters the error

2021-07-16T07:31:52.194Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-07-16T07:31:53.197Z [ERROR] Unable to get Agent services: error="Unexpected response code: 403 (ACL not found)"
2021-07-16T07:31:54.204Z [INFO] Unable to find registered services; retrying
2021-07-16T07:31:55.206Z [INFO] Unable to find registered services; retrying
2021-07-16T07:31:56.210Z [INFO] Unable to find registered services; retrying
2021-07-16T07:31:57.213Z [INFO] Unable to find registered services; retrying

We have ACL enabled, and tried to redeploy our service. The service name matches the service account name.

Reproduction Steps

helm -n test-consul install consul hashicorp/consul -f consu.yml

  1. When running helm install with the following values.yml:

    gossipEncryption:
      # secretName is the name of the Kubernetes secret that holds the gossip
      # encryption key. The secret must be in the same namespace that Consul is installed into.
      secretName: "consul-gossip-encryption-key"
      # secretKey is the key within the Kubernetes secret that holds the gossip
      # encryption key.
      secretKey: "key"
    tls:
      # If true, the Helm chart will enable TLS for Consul
      # servers and clients and all consul-k8s components, as well as generate certificate
      # authority (optional) and server and client certificates.
      enabled: true

      enableAutoEncrypt: true
      serverAdditionalDNSSANs:
        - '*.consul'
        - '*.cluster.local'
        - '*.svc.cluster.local'
        - 'localhost.*'

      serverAdditionalIPSANs:
        - '127.0.0.1'
      verify: false
      httpsOnly: true

    acls:
      manageSystemACLs: true
    connect: true
    connectInject:
      # True if you want to enable connect injection. Set to "-" to inherit from
      # global.enabled.
      enabled: true
      default: true
  2. View error


Logs

kubectl -n bots logs -f pod_name consul-connect-inject-init

2021-07-16T07:44:32.721Z [INFO] Check to ensure a Kubernetes service has been created for this application.
2021-07-16T07:44:33.723Z [INFO] Unable to find registered services; retrying
2021-07-16T07:44:34.725Z [INFO] Unable to find registered services; retrying
2021-07-16T07:44:35.730Z [INFO] Unable to find registered services; retrying
2021-07-16T07:44:36.780Z [INFO] Unable to find registered services; retrying
2021-07-16T07:44:37.784Z [INFO] Unable to find registered services; retrying
2021-07-16T07:44:38.787Z [INFO] Unable to find registered services; retrying
2021-07-16T07:44:39.790Z [INFO] Unable to find registered services; retrying
2021-07-16T07:44:40.794Z [INFO] Unable to find registered services; retrying
2021-07-16T07:44:41.799Z [INFO] Unable to find registered services; retrying
2021-07-16T07:44:42.801Z [INFO] Unable to find registered services; retrying
2021-07-16T07:44:42.801Z [INFO] Check to ensure a Kubernetes service has been created for this application.
2021-07-16T07:44:43.812Z [INFO] Unable to find registered services; retrying
2021-07-16T07:44:43.812Z [ERROR] Timed out waiting for service registration: error="did not find correct number of services: 0"


Expected behavior

The init container completes successfully and the service registers with Consul.

Environment details

lkysow commented 3 years ago

Hi, do your deployments have a corresponding Kubernetes Service with the same name?

2021-07-16T07:44:42.801Z [INFO] Check to ensure a Kubernetes service has been created for this application.

lemonit-eric-mao commented 3 years ago

@halbornteam

Consul connect-inject init container log: Unable to find registered services


2021-07-16T06:40:02.489Z [INFO] Check to ensure a Kubernetes service has been created for this application.

2021-07-16T06:40:03.492Z [INFO] Unable to find registered services; retrying

2021-07-16T06:40:03.492Z [INFO] Unable to find registered services; retrying

2021-07-16T06:40:03.492Z [INFO] Unable to find registered services; retrying

2021-07-16T06:40:03.492Z [INFO] Unable to find registered services; retrying

## Reason 1: the cluster lacks a ProxyDefaults resource.

## Effect after modification

2021-07-16T06:41:09.635Z [INFO] Registered service has been detected: service=static-client

2021-07-16T06:41:09.635Z [INFO] Registered service has been detected: service=static-client-sidecar-proxy

2021-07-16T06:41:09.635Z [INFO] Connect initialization completed

Successfully applied traffic redirection rules
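
The thread does not show which resource was applied for Reason 1. As a rough sketch only, a ProxyDefaults resource for consul-k8s usually looks something like the following; the `protocol: http` setting is an assumption for illustration, not something taken from the commenter's cluster.

```yaml
# Hypothetical example: the resource actually applied is not shown in this thread.
apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global        # a ProxyDefaults resource must be named "global"
spec:
  config:
    protocol: http    # assumed value, for illustration only
```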

================================================================================================================

## Reason 2: the application deployed to k8s must have a Service. If a test deploys a pod directly without a Service, the init container logs the following:

2021-07-17T05:09:08.430Z [INFO] Check to ensure a Kubernetes service has been created for this application.

2021-07-17T05:09:09.478Z [INFO] Unable to find registered services; retrying

2021-07-17T05:09:10.481Z [INFO] Unable to find registered services; retrying

2021-07-17T05:09:11.482Z [INFO] Unable to find registered services; retrying

2021-07-17T05:09:12.486Z [INFO] Unable to find registered services; retrying

2021-07-17T05:09:13.489Z [INFO] Unable to find registered services; retrying

2021-07-17T05:09:14.491Z [INFO] Unable to find registered services; retrying

2021-07-17T05:09:15.494Z [INFO] Unable to find registered services; retrying

2021-07-17T05:09:16.496Z [INFO] Unable to find registered services; retrying

2021-07-17T05:09:17.499Z [INFO] Unable to find registered services; retrying

2021-07-17T05:09:18.503Z [INFO] Unable to find registered services; retrying
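
For Reason 2, a minimal sketch of an injected workload together with its Service might look like this. All names (`my-app`), the image, and the ports are hypothetical placeholders. With ACLs enabled, the ServiceAccount name is assumed to match the Service name, as the issue description mentions.

```yaml
# Hypothetical names, image, and ports throughout; adjust to your application.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app
---
apiVersion: v1
kind: Service
metadata:
  name: my-app              # same name as the ServiceAccount
spec:
  selector:
    app: my-app             # must match the pod labels below
  ports:
    - port: 80
      targetPort: 8080      # the container port the application listens on
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        consul.hashicorp.com/connect-inject: "true"
    spec:
      serviceAccountName: my-app
      containers:
        - name: my-app
          image: hashicorp/http-echo:latest
          args: ["-text=hello", "-listen=:8080"]
          ports:
            - containerPort: 8080
```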

halbornteam commented 3 years ago

@lkysow @lemonit-eric-mao Thanks for the response. We have a k8s service with the same name. However, no idea why it can't be registered or found through the injection.

ishustava commented 3 years ago

Hey @halbornteam

Could you provide the yaml configuration of your services that fail to be injected so that we can reproduce it?

halbornteam commented 3 years ago

I finally figured out that the issue is due to how we created the k8s Service. The difference is that we added targetPort, protocol, and name when creating the Service. Why does that make a difference? Interesting.

The below works

apiVersion: v1
kind: Service
metadata:
  name: {{ include "disputer.fullname" . }}
  labels:
{{ include "disputer.labels" . | indent 4 }}
spec:
  type: {{ .Values.service.type }}
  ports:
    - port: {{ .Values.service.port }}
  selector:
    app.kubernetes.io/name: {{ include "disputer.name" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app: {{ include "disputer.name" . }}

While the previous doesn't work

apiVersion: v1
kind: Service
metadata:
  name: {{ include "disputer.fullname" . }}
  labels:
{{ include "disputer.labels" . | indent 4 }}
spec:
  type: {{ .Values.service.type }}
  ports:
    - port: {{ .Values.service.port }}
       targetPort: http
      protocol: TCP
      name: http
  selector:
    app.kubernetes.io/name: {{ include "disputer.name" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app: {{ include "disputer.name" . }}

ndhanushkodi commented 3 years ago

Hey @halbornteam, we tried to reproduce using a K8s service without targetport, protocol, and name, and a K8s service with those things. In both cases, the service successfully came up.

Are you able to try adding targetport, protocol, and name to the static-server deployment and see whether you get the same issue as with your templated disputer service? If you don't see the same issue with static-server, can you provide the full configuration that doesn't work for the disputer service (i.e., the K8s Service, K8s Service Account, and K8s Deployment)? Thanks!

cc @sadjamz

whiskeysierra commented 3 years ago

- port: {{ .Values.service.port }}
   targetPort: http
  protocol: TCP

@halbornteam This looks like it's slightly incorrectly indented. Maybe that's the issue?
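
For comparison, a `ports` entry with every key aligned under `port` would look like the sketch below. This only illustrates the indentation point; the thread does not confirm that indentation was the root cause. Note also that a named `targetPort` such as `http` is resolved against a container port with that name in the pod spec, so the Deployment is assumed to declare one.

```yaml
# Fragment of a Service spec; all keys of the list item share the same indentation.
spec:
  ports:
    - port: {{ .Values.service.port }}
      targetPort: http    # assumes the pod declares a container port named "http"
      protocol: TCP
      name: http
```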

wallyhall commented 3 years ago

We're having the same symptom after a Helm upgrade to 1.10.

No pods which were previously running successfully with the sidecar now work - the pod inits all fail with the same error in the connect-inject-init container.

Have tried completely flattening and reinstalling Consul, to no avail. (Same error.)

lkysow commented 3 years ago

@wallyhall can you share the logs from the init containers, the logs from the connect inject deployment, and the output from kubectl describe service <name> please.

wallyhall commented 3 years ago

> @wallyhall can you share the logs from the init containers, the logs from the connect inject deployment, and the output from kubectl describe service <name> please.

Sorry for the late update. I resolved the issue very late last night (UK local time). Either I overlooked it, or the upgrade notes/changelog for Helm on k8s installations didn't make it clear: every service in the mesh now requires a Kubernetes Service specification (as of Consul Helm 0.32.0, which we upgraded past to get Consul 1.10 ... lots of version numbers!).

Adding those to all our services fixed the issue. (Fortunately I noticed the, imho "very subtle", pale blue note which has been added to the Consul/Helm documentation:

Note: As of consul-k8s v0.26.0 and Consul Helm v0.32.0, having a Kubernetes Service is required to run services on the Consul Service Mesh.

cite: https://www.consul.io/docs/k8s/connect#connecting-to-connect-enabled-services )

lkysow commented 3 years ago

Sorry to hear you didn't see that on the changelog. Where do you typically read the changelog?

david-yu commented 2 years ago

Closing. Sorry for your troubles, please file additional issues if there is additional followup needed.