Auth: Create Envoy clusters from ExtensionServices

jpeach commented 4 years ago

Please describe the problem you have

Refactor the services code so that it can be substantially shared between Routes and ExtensionServices. Add DDAG support for generating Envoy clusters that from ExtensionServices.

xref #432 xref #2643

jpeach commented 4 years ago

The ExtensionService type has a slice of Contour Services (not the same as Kubernetes Services) so that an operator can weight requests between multiple Kubernetes Services, giving them more flexibility to split traffic between Kuberenetes Services that implement the same GRPC endpoints. This is the same flexibility that HTTPProxy provides on for routes.

Internally, the Contour EndpointsTranslator implements a one-to-one mapping between Kubernetes Service endpoints and Envoy ClusterLoadAssignments (i.e. endpoint discovery service). The EndpointsTranslator operates independently of Contour ingress reconciliation. Ingresses (including HTTPProxies) consume the output of the EndpointsTranslator because internally, Contour generates a stable and predictable ClusterLoadAssignment from a Kubernetes Service name.

Note well that this means that the ClusterLoadAssignment generation is completely independent of the rest of the ingress reconciliation. Everything works in the end because the ClusterLoadAssignment names are consistent and globally unique, but there is no explicit coordination here.

A HTTPProxy Service generates a one or more DAG Services. A DAG Service generates both and Envoy Cluster (for the cluster discovery service) and a RouteAction (for the route configuration discovery service). Since the DAG Service represents Envoy configuration at multiple configuration layers, some fields are valid at only one of the layers.

So, to summarize:

A Contour Route balances across multiple Contour Services (at the HTTP layer)
A DAG Service maps to one Kubernetes Service
A DAG Service generates one Envoy Cluster
Each Envoy Cluster references one EDS ClusterLoadAssignment
Each ClusterLoadAssignment contains endpoints from exactly one Kubernetes Service

The Envoy ext_authz filter configuration accepts a Envoy Cluster name as the target for its requests. We have to program the Envoy Cluster for ext_authz filter from the ExtensionService API (since that's why it was designed). We don't have a HTTP Connection Manager, so there is no involvement from the route configuration discovery service. This means that to balance requests over multiple Kubernetes Services, we have to do so within an Envoy Cluster. This is possible, but it doesn't fit naturally within the Contour Service API, so we need to reconfigure the ExtensionService API.

diff --git apis/projectcontour/v1alpha1/extensionservice.go apis/projectcontour/v1alpha1/extensionservice.go
index 9b41543b..39249494 100644
--- apis/projectcontour/v1alpha1/extensionservice.go
+++ apis/projectcontour/v1alpha1/extensionservice.go
@@ -26,6 +26,31 @@ type ExtensionProtocolVersion string
 // SupportProtocolVersion2 requests the "v2" support protocol version.
 const SupportProtocolVersion2 ExtensionProtocolVersion = "v2"

+// ExtensionServiceTarget defines an Kubernetes Service to target with
+// extension service traffic.
+type ExtensionServiceTarget struct {
+       // Name is the name of Kubernetes service that will accept service
+       // traffic.
+       //
+       // +required
+       Name string `json:"name"`
+
+       // Port (defined as Integer) to proxy traffic to since a service can have multiple defined.
+       //
+       // +required
+       // +kubebuilder:validation:Minimum=1
+       // +kubebuilder:validation:Maximum=65536
+       // +kubebuilder:validation:ExclusiveMinimum=false
+       // +kubebuilder:validation:ExclusiveMaximum=true
+       Port int `json:"port"`
+
+       // Weight defines proportion of traffic to balance to the Kubernetes Service.
+       //
+       // +optional
+       // +kubebuilder:validation:Minimum=0
+       Weight int64 `json:"weight,omitempty"`
+}
+
 // ExtensionServiceSpec defines the desired state of an ExtensionService resource.
 type ExtensionServiceSpec struct {
        // Services specifies the set of Kubernetes Service resources that
@@ -33,7 +58,17 @@ type ExtensionServiceSpec struct {
        //
        // +required
        // +kubebuilder:validation:MinItems=1
-       Services []contourv1.Service `json:"services"`
+       Services []ExtensionServiceTarget `json:"services"`
+
+       // UpstreamValidation defines how to verify the backend service's certificate
+       // +optional
+       UpstreamValidation *contourv1.UpstreamValidation `json:"validation,omitempty"`
+
+       // Protocol may be used to specify (or override) the protocol used to reach this Service.
+       // Values may be tls, h2, h2c. If omitted, protocol-selection falls back on Service annotations.
+       // +kubebuilder:validation:Enum=h2;h2c;tls
+       // +optional
+       Protocol *string `json:"protocol,omitempty"`

        // The policy for load balancing GRPC service requests. Note
        // that the `Cookie` load balancing strategy cannot be used here.

This change has 2 benefits:

We don't have to reject unimplementable Contour Service fields.
We can allow the implementation to vary (it would be weird if Contour Services generated contextually different Envoy config)

However, we now encounter some substantial structural problems:

To generate a ClusterLoadAssignment that weights traffic across multiple Kubernetes Services, we need to know the Kuberenetes Service names and their relative weights in the EndpointsTranslator. There's no way to provide this information to the EndpointsTranslator.
The Envoy Cluster backing the ExtensionService needs to include the ClusterLoadAssignment name. The only way Contour knows that now is because it generates a stable name from a Kubernetes Service, but that doesn't apply here.

stevesloka commented 4 years ago

A DAG Service generates one Envoy Cluster

This should also be clarified that it's also hashed per route. If the same k8s service is used in two different routes, it will end up producing two clusters in Envoy with unique (but similar) names. This was due to weighting that could be different between each use of the service in the ingress document.

To generate a ClusterLoadAssignment that weights traffic across multiple Kubernetes Services, we need to know the Kuberenetes Service names and their relative weights in the EndpointsTranslator. There's no way to provide this information to the EndpointsTranslator.

I think we can do this since the correlation between CDS<>EDS is just a name. Thinking about this problem before, we need to create clusters in CDS based upon a route (or in this case an extension service usage). That unique name then can be used in endpointsTranslator to match the same name. This means that clusters in CDS are no longer mapped to k8s services (or ExtensionService)

The same work needs to be done inside Contour proper since currently if you had two services referenced in a proxy document and all the endpoints in one of the clusters goes unhealthy, Envoy will still route to the cluster. This applies even if the cluster has active health checking enabled.

We don't have to reject unimplementable Contour Service fields.

I like this! However, weren't there other fields that you wanted from contourv1.Service that were to be reused? Or are you looking to add those later when needed? One example was utilizing an externalServiceName and then needing to rewrite headers? I'm not a super big fan of that impl but just wanted to double check.

jpeach commented 4 years ago

A DAG Service generates one Envoy Cluster

This should also be clarified that it's also hashed per route. If the same k8s service is used in two different routes, it will end up producing two clusters in Envoy with unique (but similar) names. This was due to weighting that could be different between each use of the service in the ingress document.

To generate a ClusterLoadAssignment that weights traffic across multiple Kubernetes Services, we need to know the Kuberenetes Service names and their relative weights in the EndpointsTranslator. There's no way to provide this information to the EndpointsTranslator.

I think we can do this since the correlation between CDS<>EDS is just a name. Thinking about this problem before, we need to create clusters in CDS based upon a route (or in this case an extension service usage). That unique name then can be used in endpointsTranslator to match the same name. This means that clusters in CDS are no longer mapped to k8s services (or ExtensionService)

The same work needs to be done inside Contour proper since currently if you had two services referenced in a proxy document and all the endpoints in one of the clusters goes unhealthy, Envoy will still route to the cluster. This applies even if the cluster has active health checking enabled.

I don't think that's implementable. Contour Services bind the header transformations to the forwarding weights and destination. If you set the headers in the HTTP route action, then made a weighted forwarding decision in the cluster, that would break the API guarantees.

We don't have to reject unimplementable Contour Service fields.

I like this! However, weren't there other fields that you wanted from contourv1.Service that were to be reused? Or are you looking to add those later when needed? One example was utilizing an externalServiceName and then needing to rewrite headers? I'm not a super big fan of that impl but just wanted to double check.

Header rewrite only happens in route actions, so you can't do that. Forwarding traffic to external services will use external name in the service.

stevesloka commented 4 years ago

Contour uses the service_name for EDS. If a service is used more than once in a route document, many clusters in CDS will use the same cluster in EDS via the service_name.

I think what we need is a is a lookup in the CDS cache to understand what endpoints are referenced. Something like a map[service_name][]string{<endpoint_name>}, so when an endpoint changes, we can lookup what EDS name we should apply it to when reconciling.

Just thinking out loud on this right now, it's worth a discussion on the next community meeting I think.

{
     "version_info": "1",
     "cluster": {
      "@type": "type.googleapis.com/envoy.api.v2.Cluster",
      "name": "projectcontour-marketing/info/80/da39a3ee5e",  # <---- unique name
      "type": "EDS",
      "eds_cluster_config": {
       "eds_config": {
        "api_config_source": {
         "api_type": "GRPC",
         "grpc_services": [
          {
           "envoy_grpc": {
            "cluster_name": "contour"
           }
          }
         ]
        }
       },
       "service_name": "projectcontour-marketing/info"  # <---- EDS name
      },
      "connect_timeout": "0.250s",
      "common_lb_config": {
       "healthy_panic_threshold": {}
      },
      "alt_stat_name": "projectcontour-marketing_info_80"
     },
     "last_updated": "2020-07-30T21:38:02.324Z"
    },

jpeach commented 4 years ago

Contour uses the service_name for EDS. If a service is used more than once in a route document, many clusters in CDS will use the same cluster in EDS via the service_name.

I think what we need is a is a lookup in the CDS cache to understand what endpoints are referenced. Something like a map[service_name][]string{<endpoint_name>}, so when an endpoint changes, we can lookup what EDS name we should apply it to when reconciling.

Sorry, I don't think I'm following what you mean here. Is there a different way you could explain it?

As I see it, the fundamental problem here is that ClusterLoadAssignments are currently coded to emit endpoints from exactly 1 Kubernetes Service. We need a new concept to represent the weighted Kubernetes Service endpoints behind a single ClusterLoadAssignment, which means major changes to EndpointsTranslator.

stevesloka commented 4 years ago

As I see it, the fundamental problem here is that ClusterLoadAssignments are currently coded to emit endpoints from exactly 1 Kubernetes Service. We need a new concept to represent the weighted Kubernetes Service endpoints behind a single ClusterLoadAssignment, which means major changes to EndpointsTranslator.

Yes this is exactly what I am proposing/discussing.

Today:

Contour allows for an array of Services to be defined on a Route. Like discussed previously, each Kubernetes service in a route maps to a unique cluster in Envoy (CDS). Then all those unique Envoy clusters map to EDS via the service_name.

Future:

Contour no longer maintains a 1:1 mapping of route+service --> Envoy cluster --> Single k8s service endpoints in one EDS cluster.

We need to move to route+service(s) --> Envoy cluster --> Many k8s service endpoints in one EDS cluster using locality based weighting.

This would fix the problem described in this issue along with https://github.com/projectcontour/contour/issues/1119#issuecomment-512645606.

jpeach commented 4 years ago

This would fix the problem described in this issue along with #1119 (comment).

Ah it looks like Dave was moving towards this model in #1119, but we can't complete that change now because it would break the v1 API (e.g. header rewriting per service).

stevesloka commented 4 years ago

Healthchecks were part of the service but was moved up. Could do a similar pattern here (with an API version bump as mentioned).

projectcontour / contour

Auth: Create Envoy clusters from ExtensionServices #2713