hashicorp / consul-k8s

First-class support for Consul Service Mesh on Kubernetes
https://www.consul.io/docs/k8s
Mozilla Public License 2.0

Connections bypass ACL security in multi-port #1606

Open darkn3rd opened 2 years ago

darkn3rd commented 2 years ago

Overview of the Issue

When using a multi-port service with ACLs and TLS verification enabled, traffic sent to the server through an upstream on localhost is protected, but traffic sent from outside directly to the service endpoint is not. I also tried from a container completely outside the service mesh, and I was able to connect to the service endpoint. Essentially, the security mechanism is bypassed completely when using multi-port, and it seems only explicit upstream access through localhost is secured. This defeats the purpose of using the service mesh in the first place.

If there is any solution or hack to mitigate this until the behavior is fixed, that would be great.

Reproduction Steps

I followed guides related from:

  1. Deploy consul with these values:
    global:
     name: consul
     enabled: true
     datacenter: dc1
     gossipEncryption:
       autoGenerate: true
     tls:
       enabled: true
       enableAutoEncrypt: true
       verify: true
     acls:
       manageSystemACLs: true
    server:
     replicas: 1
     securityContext:
       runAsNonRoot: false
       runAsUser: 0
    connectInject:
     enabled: true
    controller:
     enabled: true
  2. Deploy server:
    # server.yaml
    ---
    apiVersion: consul.hashicorp.com/v1alpha1
    kind: ServiceDefaults
    metadata:
     name: static-server
    spec:
     protocol: 'http'
    ---
    apiVersion: v1
    kind: Service
    metadata:
     name: web
    spec:
     selector:
       app: web
     ports:
       - protocol: TCP
         port: 80
         targetPort: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
     name: web-admin
    spec:
     selector:
       app: web
     ports:
       - protocol: TCP
         port: 80
         targetPort: 9090
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
     name: web
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
     name: web-admin
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
     name: web
    spec:
     replicas: 1
     selector:
       matchLabels:
         app: web
     template:
       metadata:
         name: web
         labels:
           app: web
         annotations:
           'consul.hashicorp.com/connect-inject': 'true'
           'consul.hashicorp.com/transparent-proxy': 'false'
           'consul.hashicorp.com/connect-service': 'web,web-admin'
           'consul.hashicorp.com/connect-service-port': '8080,9090'
       spec:
         containers:
           - name: web
             image: hashicorp/http-echo:latest
             args:
               - -text="hello world"
               - -listen=:8080
             ports:
               - containerPort: 8080
                 name: http
           - name: web-admin
             image: hashicorp/http-echo:latest
             args:
               - -text="hello world from 9090"
               - -listen=:9090
             ports:
               - containerPort: 9090
                 name: http
         serviceAccountName: web
  3. Deploy client:
    # client.yaml
    ---
    apiVersion: consul.hashicorp.com/v1alpha1
    kind: ServiceDefaults
    metadata:
     name: static-client
    spec:
     protocol: 'http'
    ---
    apiVersion: v1
    kind: Service
    metadata:
     # This name will be the service name in Consul.
     name: static-client
    spec:
     selector:
       app: static-client
     ports:
       - port: 80
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
     name: static-client
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
     name: static-client
    spec:
     replicas: 1
     selector:
       matchLabels:
         app: static-client
     template:
       metadata:
         name: static-client
         labels:
           app: static-client
         annotations:
           'consul.hashicorp.com/connect-inject': 'true'
           consul.hashicorp.com/connect-service-upstreams: "web:1234,web-admin:2234"
       spec:
         containers:
           - name: static-client
             image: curlimages/curl:latest
             # Just spin & wait forever, we'll use `kubectl exec` to demo
             command: ['/bin/sh', '-c', '--']
             args: ['while true; do sleep 30; done;']
         # If ACLs are enabled, the serviceAccountName must match the Consul service name.
         serviceAccountName: static-client
  4. Exec into client container:
    export NS=${NS:-"default"}
    POD=$(kubectl get pods --namespace $NS --selector app=static-client --output name)
    kubectl exec -ti --container "static-client" --namespace $NS ${POD} -- /bin/sh
  5. Connect through upstream (run from within client container):
    curl localhost:1234 # fails as expected
    # curl: (7) Failed to connect to localhost port 1234 after 0 ms: Connection refused
    curl localhost:2234 # fails as expected
    # curl: (7) Failed to connect to localhost port 2234 after 0 ms: Connection refused
  6. Connect through service endpoint:
    export NS=${NS:-"default"}
    curl web.$NS.svc.cluster.local # this SHOULD FAIL
    # "hello world"
    curl web-admin.$NS.svc.cluster.local # this SHOULD FAIL
    # "hello world from 9090"

Logs

n/a

Expected behavior

I expected that communication through the service endpoints, e.g. web.default.svc.cluster.local, would be blocked. However, it works fine despite ACLs being enabled.

Additional Context

This works fine in a single-port scenario.

ishustava commented 2 years ago

Hey @darkn3rd

This is a known limitation of the multi-port workaround: transparent proxy is not supported with it (https://developer.hashicorp.com/consul/docs/k8s/connect#caveats-for-multi-port-pods). Transparent proxy is the feature that ensures all traffic goes through the proxy so that you can't bypass the service mesh.

One way to ensure this without transparent proxy is to have your services only bind to localhost instead of 0.0.0.0 or the pod IP.
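
A minimal sketch of that workaround, applied to the containers in the server.yaml above. It assumes http-echo accepts a host:port value for -listen; only the listen addresses change, and the rest of the Deployment is omitted here.

```yaml
# Fragment of the web Deployment's pod spec (spec.template.spec).
# Assumption: -listen takes a host:port address, so 127.0.0.1 limits
# each app to the pod's loopback interface.
containers:
  - name: web
    image: hashicorp/http-echo:latest
    args:
      - -text="hello world"
      # Reachable only from inside the pod (e.g. by the sidecar proxy),
      # not via the pod IP or the Kubernetes Service.
      - -listen=127.0.0.1:8080
  - name: web-admin
    image: hashicorp/http-echo:latest
    args:
      - -text="hello world from 9090"
      - -listen=127.0.0.1:9090
```

With this change, the curl calls against web.$NS.svc.cluster.local and web-admin.$NS.svc.cluster.local from the reproduction steps should no longer reach the applications directly. Anything else that connects through the pod IP, such as kubelet HTTP probes, would likely stop working as well.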

darkn3rd commented 2 years ago

How, then, can an ingress controller be integrated with Consul if the service only listens on localhost? Are there solutions where the ingress could communicate through the service mesh?

ishustava commented 2 years ago

Yeah, we have some docs on that here: https://developer.hashicorp.com/consul/docs/k8s/connect/ingress-controllers. These docs are for when you are using transparent proxy, but in the case when tproxy is disabled, you just wouldn't need that additional configuration. We haven't tested it with multi-port though.

darkn3rd commented 2 years ago

Conceptually, unless I am misunderstanding how this would work, I am not sure how it can work without transparent proxy, because the ingress automatically uses the internal service endpoint, which will not be secured. There would need to be some advanced hacks to route traffic to localhost instead of the normal service path, e.g. dgraph-alpha.dgraph.svc.cluster.local.

ishustava commented 2 years ago

Yeah you're right! Ingress controllers would not work without tproxy. Sorry I was wrong in my above response.

In that case, a Consul API gateway would be a better choice for ingress than an ingress controller, because it can route to Consul services.

darkn3rd commented 2 years ago

@ishustava I looked at Consul API gateway as a possibility, but I have to pass for the moment because (1) it does not support gRPC, and (2) the current documentation is in Terraform/Kustomize and requires installing EKS; it would take some time to distill the material and extract what I need.

It would be nice if someone tested multi-port with an ingress controller that provides a way to configure sending traffic to localhost. The ingress-nginx controller may do this, but I have never had the need to go there. This certainly adds an extreme level of complexity.

The root cause of all these issues is that, out of the box, Consul Service Mesh should support the K8S API, that is, a list of ports per service, as this is not an uncommon use case. ACLs are pretty much pointless without transparent proxy, since the service mesh can be bypassed completely. :'(

mikemorris commented 2 years ago

"the current documentation is in Terraform/Kustomize and requires installing EKS"

If you select the "Local" tab instead of "HashiCorp Cloud Platform (HCP)" in https://learn.hashicorp.com/tutorials/consul/kubernetes-api-gateway, the instructions are written for Kind but should be applicable to any generic Kubernetes environment - no Terraform or EKS needed, and the built-in kubectl apply --kustomize is only used for initially installing the CRDs.

Sorry to hear Consul API Gateway doesn't meet your needs at the moment, but hope you'll consider it in the future as we continue development!

darkn3rd commented 2 years ago

Thank you. The Local tab is a lot easier to synthesize and derive a solution from. I will try this some time in the future when I get a chance.

Back to the original issue: it would be nice to have fully functional multi-port support, which would bring parity with the Kubernetes Service API, which supports a list of ports, whereas Consul service registration, if I understand this correctly, only supports one port. There shouldn't have to be, for example, four Envoy proxy sidecar containers for four ports, but rather a single Envoy sidecar supporting four ports. The security features provided by transparent proxy should be afforded to multi-port configurations: bypassing the service mesh under strict mTLS shouldn't even be possible, and if ACLs are enabled, they shouldn't be bypassable. Shifting the burden left to the application service itself (e.g. an access or allow list), or requiring other non-Consul solutions (such as firewalls or network policy) to make up for the missing functionality, especially in security, shouldn't be an acceptable baseline.
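
For reference, the kind of non-Consul stopgap mentioned above could look like the following NetworkPolicy sketch. It is illustrative only: the policy name is hypothetical, the ports 20000 and 20001 are an assumption about where the two sidecars' inbound listeners run in a multi-port pod, and it only takes effect on clusters whose CNI plugin enforces NetworkPolicy.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-mesh-only   # hypothetical name
spec:
  # Applies to the pods from server.yaml.
  podSelector:
    matchLabels:
      app: web
  policyTypes:
    - Ingress
  ingress:
    # Allow inbound traffic from any source, but only on the sidecar
    # listener ports. Direct connections to the application ports
    # 8080 and 9090 are dropped.
    - ports:
        # Assumption: sidecar inbound listeners for web and web-admin.
        - protocol: TCP
          port: 20000
        - protocol: TCP
          port: 20001
```

This blocks direct access at the network layer, but it does not add mTLS or intentions enforcement, so it only narrows the bypass described in this issue.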

On top of this, the current multi-port workaround, besides disabling a lot of the core functionality mentioned above and other things like metrics, adds a layer of complexity to an already complex solution. Additionally, in exchange for more complexity, a lack of core functionality (such as observability), and a security vulnerability, the extra sidecar proxy containers increase the footprint required to use the service mesh.

Many core Consul features that are essential for cloud native solutions are baked into Kubernetes itself, so outside the service mesh the value proposition of Consul itself is low. Consul service mesh with missing metrics, security, and other features for multi-port makes the solution non-competitive, at least in the scope of multi-port. This is a shame, given that many of the more advanced features available now and planned won't be realized if it doesn't provide basic functionality with multi-port.

So, in conclusion, I would make this a feature request: all traffic should be able to have strict mTLS enforced and ACLs applied, so that only permitted traffic gets to the application service. I would like to see this prioritized on the roadmap. In the interim, more documentation around ACLs with multi-port would be desirable, as well as noting these limitations early in the journey and reiterating them in overview and Getting Started docs. Testing solutions and further documentation on such integrations, internal and external, would also be nice; this would require collaboration with other OSS projects. There's not a lot of material in this area for multi-port; I don't know if others have reached a base level of success with multi-port to explore such areas yet.

I hope this is helpful. Thanks for the advice and the workarounds for the current limitation.

ishustava commented 2 years ago

Thanks so much for this feedback @darkn3rd !!

This is definitely something we're looking to improve. I don't have any specifics yet, but we're looking into having better support for multi-port in the near future, so I hope we'll have something that works better soon!

darkn3rd commented 1 year ago

I look forward to the roadmap and/or any docs on this, especially as this makes Consul a non-starter. I hope it can be addressed soon, rather than supporting more advanced features while a basic core feature supported in the K8S API (e.g. a list of ports per service) is not supported.