hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Service Router filter uses incorrect metadata when a service has instances routed via terminating gateway #20734

Open mr-miles opened 6 months ago

mr-miles commented 6 months ago

Overview of the Issue

I have a service with some instances running directly within the service mesh and others registered externally. We use transparent proxy mode, so the externally registered instances are reachable via the terminating gateway. We have this arrangement because we are migrating services into the service mesh and updating them at the same time.

I want to use a service resolver to create one subset containing the externally registered instances and a second subset containing those that have been migrated into the mesh. I added metadata to the external registration and expected the filter

      "Service.Meta.external == \"true\""

to pick them out, but I consistently got no members in the Envoy cluster of dependent services.

However, I did get the right instances (a random guess, and I got lucky!) when I used this filter expression:

      "Service.Meta[\"k8s-service-name\"] == \"consul-terminating-gateway\""

However, although this worked on the source service, the subset on the terminating gateway itself was completely empty.

Combining both did yield the right result, but it is quite clunky and not very obvious:

      "Service.Meta[\"k8s-service-name\"] == \"consul-terminating-gateway\" or Service.Meta.external == \"true\""

For external endpoints, it appears that the endpoint metadata used by the filter is overwritten with the terminating gateway's metadata.

I believe the correct behaviour would be for the filter expression to use the metadata of the service instance itself, not that of the terminating gateway the instance is exposed through.
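To make the reported behaviour concrete, here is a small Python model of it. It is a hypothetical simplification, not Consul's code. It assumes that a downstream (source) proxy evaluates the subset filter against the gateway's own metadata for any gateway-fronted endpoint, while the terminating gateway itself evaluates it against the external instance's registered metadata. The instance names and metadata values are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical model of the behaviour reported in this issue; NOT Consul's code.
# Assumption: the gateway's own catalog entry carries this metadata.
GATEWAY_META = {"k8s-service-name": "consul-terminating-gateway"}


@dataclass
class Instance:
    name: str
    meta: dict[str, str]
    via_gateway: bool = False  # True if reached through the terminating gateway


def meta_seen_by_source(inst: Instance) -> dict[str, str]:
    # Assumed: a downstream proxy filters gateway-fronted endpoints on the gateway's metadata.
    return GATEWAY_META if inst.via_gateway else inst.meta


def meta_seen_by_gateway(inst: Instance) -> dict[str, str]:
    # Assumed: the terminating gateway filters on the instance's own registered metadata.
    return inst.meta


def subset(instances, seen, predicate) -> list[str]:
    """Names of the instances whose visible metadata satisfies the filter predicate."""
    return [i.name for i in instances if predicate(seen(i))]


def is_external(meta):  # Service.Meta.external == "true"
    return meta.get("external") == "true"


def is_tgw(meta):  # Service.Meta["k8s-service-name"] == "consul-terminating-gateway"
    return meta.get("k8s-service-name") == "consul-terminating-gateway"


def combined(meta):  # the two expressions joined with "or"
    return is_tgw(meta) or is_external(meta)


instances = [
    Instance("b-mesh", {"k8s-service-name": "service-b"}),
    Instance("b-external", {"external": "true"}, via_gateway=True),
]
# The gateway only fronts the externally registered instances.
at_gateway = [i for i in instances if i.via_gateway]

for label, pred in [("external", is_external), ("tgw", is_tgw), ("combined", combined)]:
    print(label,
          subset(instances, meta_seen_by_source, pred),
          subset(at_gateway, meta_seen_by_gateway, pred))
```

Under those assumptions, the `external` filter selects nothing at the source, the gateway-name filter selects nothing at the gateway, and only the combined expression picks out the external instance in both places, which mirrors what the report describes.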


Reproduction Steps

  1. Provision a Consul cluster with transparent proxy mode enabled and a terminating gateway instance.
  2. Create a service A with one instance, and intentions so that A can call service B. Give service A the metadata key/value "my-meta"="my-meta".
  3. Create a service B with one instance.
  4. Call the catalog API to register an external node.
  5. Call the catalog API to register an instance of service B against the node from step 4.
  6. Check that B appears in the terminating gateway and that the topology view shows A linking to B.
  7. Create a service resolver:

      apiVersion: consul.hashicorp.com/v1alpha1
      kind: ServiceResolver
      metadata:
        name: service-A
      spec:
        defaultSubset: external
        subsets:
          external:
            filter: "Service.Meta[\"k8s-service-name\"] == \"consul-terminating-gateway\""
            onlyPassing: true
          internal:
            filter: "Service.Meta[\"my-meta\"] == \"my-meta\""
            onlyPassing: true

  8. Connect to A and inspect its Envoy cluster configuration. Observe that the Envoy cluster for B.external has 2 members.
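Steps 4 and 5 can be sketched against Consul's catalog HTTP endpoint (`PUT /v1/catalog/register`), which accepts a node and a service in a single request. This is a minimal sketch assuming a local agent on the default HTTP port 8500; the node name, address, port, and service ID are hypothetical, and the `external` metadata key is the one the reporter's filter expects.

```python
import json
import urllib.request
from typing import Optional

# Assumed address of a local Consul agent (8500 is Consul's default HTTP port).
CONSUL_ADDR = "http://127.0.0.1:8500"


def build_registration(node: str, address: str, service: str,
                       service_id: str, port: int, meta: dict) -> dict:
    """Request body for PUT /v1/catalog/register: an external node (step 4)
    plus one instance of `service` on that node (step 5)."""
    return {
        "Node": node,
        "Address": address,
        "Service": {
            "ID": service_id,
            "Service": service,
            "Address": address,
            "Port": port,
            "Meta": meta,  # metadata the resolver subset filter is meant to match
        },
    }


def register(body: dict, addr: str = CONSUL_ADDR, token: Optional[str] = None) -> int:
    """Send the registration to the agent and return the HTTP status code."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["X-Consul-Token"] = token  # ACL token, if ACLs are enabled
    req = urllib.request.Request(
        f"{addr}/v1/catalog/register",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    body = build_registration("external-node-1", "10.0.0.10", "service-b",
                              "service-b-external", 8080, {"external": "true"})
    print(json.dumps(body, indent=2))
    # register(body)  # uncomment to send it to a running agent
```

Registering the node and service together keeps the example short; issuing the two calls separately, as in the steps, works the same way.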

Consul info for both Client and Server

EKS 1.28, Consul 1.17.3 (installed via Helm chart), transparent proxy enabled, Connect enabled.

vijayraghav-io commented 6 months ago

Just to add: this is currently a known limitation, documented at https://developer.hashicorp.com/consul/docs/connect/gateways/terminating-gateway

mr-miles commented 6 months ago

Thanks @vijayraghav-io. I had read that as saying you couldn't filter the service instances hosted via a terminating gateway, which is different. Also, I think it would help developers if this limitation were also mentioned on the service-router and service-resolver pages, since it is easy not to think of this as a terminating gateway issue and so never come across the limitation.

Regardless, it seems like it almost works. Is there any interest in a PR for this? If so, do you have any pointers on the implementation?

jm96441n commented 5 months ago

Hey @mr-miles, this isn't really something we can do with terminating gateways. When you specify metadata for a service instance in the service mesh, the metadata is copied onto the sidecar proxy instance, and that is what the service resolver lookup uses. A terminating gateway, however, is essentially a single proxy for multiple service instances, and a few issues come up with that:

  1. Key collisions on metadata: if two services declare the same key with different values, one of them will be overwritten.
  2. Metadata is an instance-level attribute, and the terminating gateway acts as a load balancer across all instances of the service it is fronting. Supporting this would require a metadata field on the terminating gateway itself for each entry in its services list, which isn't a change we're looking to make at this time.
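The key-collision point can be illustrated with a hypothetical sketch: if the metadata of every service fronted by one gateway were flattened into a single map on the gateway's catalog entry, later services would silently overwrite earlier ones. This is not how Consul stores gateway data; the service names and keys below are made up for illustration.

```python
from __future__ import annotations


def merge_onto_gateway(per_service_meta: dict[str, dict[str, str]]):
    """Hypothetical: flatten the metadata of every service a gateway fronts into
    one map, recording (key, service) pairs whose value overwrote a different one."""
    merged: dict[str, str] = {}
    collisions: list[tuple[str, str]] = []
    for service, meta in per_service_meta.items():
        for key, value in meta.items():
            if key in merged and merged[key] != value:
                collisions.append((key, service))
            merged[key] = value  # last writer wins
    return merged, collisions


fronted = {
    "service-b": {"external": "true", "tier": "legacy"},
    "service-c": {"external": "true", "tier": "migrated"},
}
gateway_meta, collisions = merge_onto_gateway(fronted)
print(gateway_meta, collisions)
```

Keeping the metadata keyed per service (the shape of `fronted`) would avoid the collision, but that amounts to the per-entry field on the gateway described in the second point.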

The fix for this can instead be in the service resolver documentation: add a note that if any instances are fronted by a terminating gateway, filtering is applied to the generated terminating gateway catalog entry and not to the ultimate non-mesh service instance.