hashicorp / consul-k8s

First-class support for Consul Service Mesh on Kubernetes
https://www.consul.io/docs/k8s
Mozilla Public License 2.0
670 stars · 324 forks

Consul connect inject and ElasticSearch in Kubernetes exit code 52 #1109

Closed: codex70 closed this issue 2 years ago

codex70 commented 2 years ago

Overview of the Issue

When setting up Elasticsearch via the Elastic Operator with Consul connect inject, it is not possible to connect to the Elasticsearch pod, and Kibana fails to start properly. Elasticsearch also cannot reach its own internal server for license requests.

All requests fail with curl exit code 52.

If I do not install Consul on the cluster, ElasticSearch starts and works as expected.

With Consul installed I get an empty response from the endpoint however I try to access the service. If I exec directly into the Elasticsearch pod "elastic-search-es-default-0" and curl the endpoint using localhost or the pod's IP address, it works. If I curl the service's endpoint or IP address, it does not.

If I exec directly into the operator pod "elastic-operator-0" or the Kibana pod, nothing works, not even the "elastic-search-es-default-0" pod's direct IP address.

This is the current setup:

NAME                               READY   STATUS    RESTARTS   AGE
pod/elastic-operator-0             2/2     Running   2          106m
pod/elastic-search-es-default-0    2/2     Running   0          84m

NAME                                      TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)          AGE
service/elastic-operator-webhook          ClusterIP      10.14.48.161    <none>            443/TCP          106m
service/elastic-search-es-default         ClusterIP      None            <none>            9200/TCP         84m
service/elastic-search-es-http            ClusterIP      10.14.54.15     <none>            9200:30004/TCP   84m
service/elastic-search-es-internal-http   ClusterIP      10.14.89.29     <none>            9200/TCP         84m
service/elastic-search-es-transport       ClusterIP      None            <none>            9300/TCP         84m
service/kibana-kb-http                    ClusterIP      10.14.120.175   <none>            5601/TCP         84m

From inside the "elastic-search-es-default-0" pod:

curl -k -u elastic:PASSWORD http://10.13.1.27:9200/
{
  "name" : "elastic-search-es-default-0",
  "cluster_name" : "elastic-search",
  "cluster_uuid" : "VnxncScAQ76Dn-yNCqqkVA",
  "version" : {
    "number" : "8.1.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "3700f7679f7d95e36da0b43762189bab189bc53a",
    "build_date" : "2022-03-03T14:20:00.690422633Z",
    "build_snapshot" : false,
    "lucene_version" : "9.0.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

curl -k -u elastic:PASSWORD http://10.14.54.15:9200/
curl: (52) Empty reply from server

All other requests, from anywhere else, respond with curl: (52) Empty reply from server.

Reproduction Steps

This is on a completely clean Kubernetes install, with configurations for Consul and Elasticsearch kept as close to vanilla as possible. Elasticsearch is installed using the Elastic Operator as per the instructions. I am using Helm templates configured as follows:

Consul config:

consul:
  global:
    name: consul
    datacenter: dc1
    image: hashicorp/consul:1.11.2
    imageEnvoy: envoyproxy/envoy:v1.20.1
    imageK8S: hashicorp/consul-k8s-control-plane:0.39.0
    metrics:
      enabled: true
      enableAgentMetrics: true
  server:
    replicas: 1
  ui:
    enabled: true
  connectInject:
    enabled: true
    default: true
  controller:
    enabled: true
  prometheus:
    enabled: false
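
With connectInject.default set to true as above, every new pod in the cluster gets a sidecar unless it explicitly opts out. A minimal sketch of the standard consul-k8s opt-out annotation (applying it to the operator pod is an assumption, not something the report confirms is needed):

```yaml
# Sketch: opting a workload out of injection when connectInject.default
# is true. This would go on the pod template of any workload that should
# stay off the mesh (e.g. the elastic-operator StatefulSet, as an assumption).
metadata:
  annotations:
    consul.hashicorp.com/connect-inject: "false"
```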

Elastic:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elastic-search
spec:
  version: {{ .Values.elastic.version }}
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
    - name: default
      count: 1
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
            storageClassName: oci-bv
      config:
        node.store.allow_mmap: false
      podTemplate:
        metadata:
          annotations:
            consul.hashicorp.com/connect-service: "elastic-search-es-http"
            consul.hashicorp.com/connect-inject: "true"
            consul.hashicorp.com/connect-service-port: "http"
        spec:
          automountServiceAccountToken: true
          serviceAccount: elastic-search-es-http
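
Since the Consul controller is enabled in the values above, service-to-service authorization can be expressed with a ServiceIntentions custom resource. A sketch allowing Kibana to reach the Elasticsearch HTTP service, assuming the service names registered by the connect-service annotations in these manifests (whether intentions are actually blocking traffic here is an assumption):

```yaml
# Sketch: allow kibana -> elastic-search-es-http through the mesh.
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: elastic-search-es-http
spec:
  destination:
    name: elastic-search-es-http
  sources:
    - name: kibana
      action: allow
```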

Kibana:

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana

spec:
  version: {{ .Values.elastic.version }}
  count: 2
  elasticsearchRef:
    name: elastic-search
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  podTemplate:
    metadata:
      annotations:
        consul.hashicorp.com/connect-service: "kibana"
        consul.hashicorp.com/connect-inject: "true"
        consul.hashicorp.com/connect-service-port: "http"
        consul.hashicorp.com/connect-service-upstreams: "elastic-search-es-http:9200"
    spec:
      automountServiceAccountToken: true
      serviceAccount: kibana
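
The connect-service-upstreams annotation above exposes elastic-search-es-http on localhost:9200 inside the Kibana pod, but ECK's elasticsearchRef points Kibana at the cluster service URL, which bypasses that local listener. A sketch of the alternative (an assumption, not tested here: dropping elasticsearchRef in favor of explicit settings so Kibana dials the declared upstream):

```yaml
# Sketch: point Kibana at the Connect upstream on localhost instead of
# relying on elasticsearchRef. Assumption: credentials and other settings
# that elasticsearchRef normally wires up would need to be supplied manually.
spec:
  config:
    elasticsearch.hosts: ["http://localhost:9200"]
```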

Note: I've tried this with connectInject.default set to both true and false, and the result is the same.

I'm also using the minimum security I can to get this working, with the idea being that I tighten up security rules once this is working.

Expected behavior

Systems install and work without connection issues.

Environment details

This cluster is in a cloud environment, but it is a completely clean, new cluster. Nginx is used as the ingress controller for front ends, although this could be changed if there is a better solution with Consul.

david-yu commented 2 years ago

Hi @codex70, this sounds like a duplicate based on the findings toward the end of https://github.com/hashicorp/consul-k8s/issues/1137. I'll close this issue, as the PR linked in that issue should address this one. Let us know if that is not the case!

codex70 commented 2 years ago

Hi @david-yu, yes, I think the issue was different, but the fix probably resolves both. Thanks as always for your help.