hashicorp / consul-k8s

First-class support for Consul Service Mesh on Kubernetes
https://www.consul.io/docs/k8s
Mozilla Public License 2.0

Documentation - Transparent proxy + terminating gateway #1486

Open mr-miles opened 2 years ago

mr-miles commented 2 years ago

Overview of the Issue

Thanks for all the great work! We are trying to get the transparent proxy going with mesh traffic only, and to get it to route successfully via a terminating gateway. In the end we got it working but there are a few things that are undocumented or maybe even bugs, so I am noting them here for documentation updates.

Reproduction Steps

  1. The terminating gateway will not work with transparent proxy unless DNS is enabled in the Helm chart. Otherwise DNS entries like <service>.virtual.consul do not resolve (where <service> is exposed on the terminating gateway), and that resolution is needed for the Envoy traffic filters to route properly. AFAICS this is specific to terminating gateway services, since these services do not have corresponding k8s resources / cluster IPs.

  2. If you add a pod annotation specifying the service upstreams, then no transparent proxy Envoy filters are created for those services. This is particularly confusing if you have specified ALL the services this way, since in this case the outbound listener on 15001 isn't created at all! I expected the transparent proxy setup to make the annotations redundant, rather than having to remove the same-datacenter ones.
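For anyone hitting point 1, a minimal sketch of the Helm values involved may help. The keys below are from the consul-k8s chart as I understand it; the file name values.yaml and the choice to enable transparent proxy by default are assumptions, and defaults differ between chart versions, so check the chart reference for the version you run.

```yaml
# values.yaml (assumed file name) - sketch only, not a complete install config
connectInject:
  enabled: true
  transparentProxy:
    # Assumption: tproxy is turned on for every injected pod
    defaultEnabled: true
dns:
  # Needed so that <service>.virtual.consul names resolve inside pods
  enabled: true
  enableRedirection: true
terminatingGateways:
  enabled: true
```

With DNS redirection on, lookups for .consul names from injected pods should be answered by Consul rather than failing in cluster DNS.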

Environment details

david-yu commented 2 years ago

Hi @mr-miles, could you provide details on how you are configuring the terminating gateway via CRD? Btw, with the new TProxy enhancements, the terminating gateway no longer requires you to register the service you are trying to dial with Consul; instead you would use some new configuration in ServiceDefaults: https://www.consul.io/docs/connect/config-entries/service-defaults#terminating-gateway-destination. I realize the docs are a little confusing and we probably need to update related terminating gateway docs elsewhere to show when to use this new config.

mr-miles commented 2 years ago

Thanks @david-yu for the quick response - I was not aware of that service-defaults usage. And yes, currently I'm registering the destinations as services.

So do the destination addresses have any interaction with the SNI setting on the terminating-gateway resource? And lastly, will the transparent proxy pick up calls to the given addresses (so my container continues to use "example.com" and it is all routed magically) or is a specific hostname needed (like example-https.virtual.consul)?

Keen to give it a go but currently stuck on 1.11.x because of https://github.com/hashicorp/consul/issues/14514 and the gRPC TLS issue (which I'm hoping is fixed in 1.13.2 soon)... (for clarity, we were using 1.13 in our investigation, but then found the certificate problems and had to pause/downgrade)

mr-miles commented 2 years ago

@david-yu - now on 1.13 and trying to use those TProxy enhancements to terminating gateway to set up a destination. Is there a full working example anywhere to look at? So far I have:

But the virtual IP of the destination is not appearing in my envoy config. Stumped about what else to try...

krzysztof-bronk commented 1 year ago

Hello,

I'm not sure if this is the right issue to add to, but I've tried following the official documentation https://developer.hashicorp.com/consul/docs/k8s/connect/terminating-gateways (almost) exactly and I can't quite get the terminating gateway to work, at least on version 1.0.1 of the chart (using Consul 1.14.1), which is pretty recent.

Whether transparent proxy is enabled or not, and whether I use service-defaults or explicit external service registration, the traffic does not seem to go through Terminating Gateway at all. Note that on this same setup service-to-service and Ingress Gateway communication work fine.

I'm specifically testing the example.com external site as in the tutorial.

[Six screenshots attached, taken 2022-12-05 between 15:43:56 and 15:48:02]

In both cases (connecting via sidecar localhost:1234 or using TP http://example.com) I do not get a valid or expected response.
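For readers comparing the two paths: the localhost:1234 route is the explicit-upstream one, normally set up with the connect-service-upstreams pod annotation. A minimal sketch, assuming a hypothetical client Deployment named static-client and a placeholder curl image; the service name example-https is taken from the Envoy config in this thread.

```yaml
# Hypothetical client workload; names and image are illustrative assumptions
apiVersion: apps/v1
kind: Deployment
metadata:
  name: static-client
spec:
  replicas: 1
  selector:
    matchLabels:
      app: static-client
  template:
    metadata:
      labels:
        app: static-client
      annotations:
        consul.hashicorp.com/connect-inject: "true"
        # Explicit upstream: the sidecar listens on localhost:1234 for example-https
        consul.hashicorp.com/connect-service-upstreams: "example-https:1234"
    spec:
      containers:
        - name: static-client
          image: curlimages/curl:latest
          command: ["/bin/sh", "-c", "sleep infinity"]
```

Per point 2 of the original report, a service listed in this annotation reportedly gets no transparent proxy filter chain, so it may be worth testing each path with the annotation present and absent.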

Has anyone managed to follow that example and also double-checked that if you add an explicit deny intention, or... scale terminating gateway to 0, you get a different result?

One possible difference: I'm installing and configuring everything in k8s "consul" namespace, not default, however afaik in Consul OSS this is ignored (in terms of functionality).

Some random observations:

     "version_info": "fa3226ce5123ba17b51452811f0d915451d59aade03f0662991f2fa260b787cc",
     "cluster": {
      "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
      "name": "example-https.default.dc1.internal.18ee9227-3131-c771-e3c8-1e3c19b60d3f.consul",
      "type": "EDS",
      "eds_cluster_config": {
       "eds_config": {
        "ads": {},
        "resource_api_version": "V3"
       }
      },
      "connect_timeout": "5s",
      "circuit_breakers": {},
      "outlier_detection": {},
      "transport_socket": {
       "name": "tls",
       "typed_config": {
        "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext",
        "common_tls_context": {
         "tls_params": {},
         "tls_certificates": [
          {
           "certificate_chain": {
            "inline_string": "[redacted]"
           },
           "private_key": {
            "inline_string": "[redacted]"
           }
          }
         ],
         "validation_context": {
          "trusted_ca": {
           "inline_string": "[redacted]"
          },
          "match_subject_alt_names": [
           {
            "exact": "spiffe://18ee9227-3131-c771-e3c8-1e3c19b60d3f.consul/ns/default/dc/dc1/svc/example-https"
           }
          ]
         }
        },
        "sni": "example-https.default.dc1.internal.18ee9227-3131-c771-e3c8-1e3c19b60d3f.consul"
       }
      },
      "common_lb_config": {
       "healthy_panic_threshold": {}
      },
      "alt_stat_name": "example-https.default.dc1.internal.18ee9227-3131-c771-e3c8-1e3c19b60d3f.consul"
     },
     "last_updated": "2022-12-05T14:58:58.447Z"
    }

and a filter:

        {
         "filter_chain_match": {
          "prefix_ranges": [
           {
            "address_prefix": "240.0.0.11",
            "prefix_len": 32
           }
          ]
         },
         "filters": [
          {
           "name": "envoy.filters.network.tcp_proxy",
           "typed_config": {
            "@type": "type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy",
            "stat_prefix": "upstream.example-https.default.default.dc1",
            "cluster": "example-https.default.dc1.internal.18ee9227-3131-c771-e3c8-1e3c19b60d3f.consul"
           }
          }
         ]
        }

I don't quite see the address of the Terminating Gateway anywhere. Should I?

blake commented 1 year ago

> when the intention is an explicit deny, envoy ends up with no setup for example-https. But you can still connect to example.com without issues (just like you can connect to any other external URL, as long as you have internet connectivity, transparent proxy does not seem to prevent that)

@krzysztof-bronk In order to block access to destinations that are not registered in Consul, you need to set TransparentProxy.MeshDestinationsOnly equal to true in the Mesh config entry.
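On Kubernetes that setting would be applied through the Mesh CRD. A minimal sketch, assuming the consul-k8s CRDs are installed; as far as I know the resource must be named mesh.

```yaml
apiVersion: consul.hashicorp.com/v1alpha1
kind: Mesh
metadata:
  # The Mesh config entry is a singleton and is expected to be named "mesh"
  name: mesh
spec:
  transparentProxy:
    # Block outbound traffic to destinations that Consul does not know about
    meshDestinationsOnly: true
```

With this in place, a request to an unregistered external host from a tproxy-enabled pod should fail instead of passing straight through.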

The address of the terminating gateway should be visible in localhost:19000/clusters or when querying the config dump endpoint with the include_eds query option. E.g., curl localhost:19000/config_dump\?include_eds

If you're using the service-defaults Destination feature, you should not be registering a dummy service into Consul's catalog. You just need to create the config entry with access to the intended destination, and then access the destination by the hostname(s) listed in the addresses array. For example, https://developer.hashicorp.com/consul/docs/connect/config-entries/service-defaults#terminating-gateway-destination.
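A sketch of what that could look like for the example.com case discussed in this thread. The names example-https, terminating-gateway, and static-client are assumptions (the gateway name must match the one defined in your Helm values), and the intention is only needed if your default policy is deny.

```yaml
# Destination definition: no catalog registration for example-https
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: example-https
spec:
  protocol: tcp
  destination:
    addresses:
      - "example.com"
    port: 443
---
# Attach the destination to the terminating gateway
apiVersion: consul.hashicorp.com/v1alpha1
kind: TerminatingGateway
metadata:
  name: terminating-gateway
spec:
  services:
    - name: example-https
---
# Allow the (hypothetical) client service to reach the destination
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: static-client-to-example-https
spec:
  destination:
    name: example-https
  sources:
    - name: static-client
      action: allow
```

The application would then keep calling https://example.com directly, with transparent proxy expected to route the connection via the gateway.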

Do you mind sharing the relevant configuration from the terminating gateway so that we can see if it is correctly being configured to route to this external destination?

inisitijitty commented 1 year ago

> If you're using the service-defaults Destination feature, you should not be registering a dummy service into Consul's catalog. You just need to create the config entry with access to the intended destination, and then access the destination by the hostname(s) listed in the addresses array. For example, https://developer.hashicorp.com/consul/docs/connect/config-entries/service-defaults#terminating-gateway-destination.

@blake So it's not possible to use the service-defaults name to reach the destination? Having this possibility would be great.

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: test-destination
spec:
  destination:
    addresses:
      - "test.com"
      - "test.org"
    port: 443

What I mean is using test-destination and not test.com or test.org. That would abstract/hide the real destination(s) and avoid making changes in the consumers if the destination address changes.

blake commented 1 year ago

> So it's not possible to use the service-defaults name to reach the destination?

@inisitijitty You can use the service name if you explicitly register the service into Consul's catalog. Instructions for doing this are documented on https://developer.hashicorp.com/consul/docs/k8s/connect/terminating-gateways#register-external-services-with-consul under the tab "Using Consul catalog." Applications in the mesh can then connect to the external service with tproxy using the Consul VIP DNS name (e.g., test-destination.virtual.consul) instead of needing to use an explicitly defined upstream.

See https://github.com/hashicorp/consul/issues/12116#issuecomment-1016166569 for more info on this.