The "includes" function in HTTPProxy causes bad life-cycle management

uablrek commented 4 years ago

The "includes" function in HTTPProxy causes bad life-cycle management

The "includes" can be used to delegate route definitions to sub-objects;

apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: kahttp-default
spec:
  virtualhost:
    fqdn: kahttp.com
    tls:
      secretName: contour-secret
  includes:
    - name: kahttp-default
    - name: kahttp-admin

In a larger site where route definitions are updates regularly this can be used to allow applications to provide their own route definition and add them to the virtual host when loaded and remove them when un-loaded.

The problem is that the "top" object containing the vhost and the "includes" array must be updated for these operations.

This update operation of the top object requires coordination between otherwise independent applications. To update the top object manually is not feasible in a larger sites.

Use Case

As an operator I want to be able to deploy a applications that adds a route definitions to a vhost with a simple install, e.g. with "helm". When an application is removed it's route definition shall be removed automatically.

As an application owner I want to use "secure backend" to protect my traffic.

Comparison with the Ingress object

K8s allows "Ingress" objects to specify the same vhost in multiple instances;

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kahttp-default
spec:
  tls:
  - hosts:
    - kahttp.com
    secretName: contour-secret
  rules:
  - host: kahttp.com
    http:
      paths:
      - path: /
        backend:
          serviceName: kahttp-ipv4
          servicePort: 80
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kahttp-admin
spec:
  tls:
  - hosts:
    - kahttp.com
    secretName: contour-secret
  rules:
  - host: kahttp.com
    http:
      paths:
      - path: /admin
        backend:
          serviceName: kahttp-admin
          servicePort: 80

Contour handles this nicely. An "admin" application can be deployed (and removed) independently and the route to "/admin" is updated automatically. The vhost can set with a value/parameter in a helm install.

But Contour supports backend encryption only for HTTPProxy.

Proposal

Allow to specify the relation in the sub-objects, example;

apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: kahttp-admin
spec:
  virtualhost:
    from:
    - name: kahttp-default
      namespace: default
  routes:
    - conditions:
      - prefix: /admin
      services:
        - name: kahttp-admin
          port: 443
          weight: 1000
          validation:
            caSecret: kahttp-admin-ca
            subjectName: kahttp-admin.com

The "from" field is an array to be compliant with the current implementation where multiple "top" objects can include the same sub-object.

There are of course misconfigurations that must be checked but I leave them for the moment because I can't think of anything unsolvable.

The Canary aspect

A spin-off I find really cool is the elegance of which canary testing can be made with this addition. To test a new kahttp-admin simply install a canary with something like;

apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: kahttp-admin-canary
spec:
  virtualhost:
    from:
    - name: kahttp-default
      namespace: default
  routes:
    - conditions:
      - prefix: /admin
      services:
        - name: kahttp-admin-canary
          port: 443
          weight: 100
          validation:
            caSecret: kahttp-admin-ca
            subjectName: kahttp-admin.com

The canary will grab ~10% of the traffic to "/admin". After a test period the canary can simply be removed and traffic goes back to normal. If the canary is ok the normal backend application can be updated.

uablrek commented 4 years ago

FYI when testing I was impressed that a http connection was "keep-alived" even when the route configuration was altered. Both for multiple Ingress and when updating the "includes:" array in HTTPProxy. So a client will not even lose it's TCP connection if http keep-alive is used.

stevesloka commented 4 years ago

Thanks for the detailed issue @uablrek! We've had some similar discussion in #2189 where folks would like the same type of self-service functionality.

We will have to have some discussions at one of our community meeting calls about this I think to see if there's a mode that Contour could have to enable this, or utilize OPA, or an Admission controller, etc.

The main reason HTTPProxy (and IngressRoute previously) was designed this way was to solve this problem of conflicts, but it does lessen the amount of self-service users might desire in certain environments.

jpeach commented 4 years ago

There are a number of ways to think about security policy and HTTPProxy can support many different policies. Fundamentally, vhosts are global to the cluster, and may even be global to an organization or exposed to the internet. So, we expect that many organizations will need controls around these. Because of this, the HTTPProxy requires that control over the proxy configuration be explicitly delegated from the root to the leaf objects. If we allowed the leaf to attach itself to the vhost, it would be harder to reason about and implement the organizational policy. Most likely, you would need some sort of admission control to ensure that only authorized routes attached themselves to the vhost.

When you create a vhost (i.e. a root HTTPProxy), you can delegate the definition of all the routes if you want to. Using your example above:

---
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: kahttp-default
  namespace: protected
spec:
  virtualhost:
    fqdn: kahttp.com
    tls:
      secretName: contour-secret
  includes:
    - name: kahttp-delegate
       namespace: application

---
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: kahttp-delegate
  namespace: application
spec:
  includes:
    - name: kahttp-default
    - name: kahttp-admin

Now, when you want to use includes to adjust the routes, you only need to edit the kahttp-delegate document, not the root HTTPProxy.

The problem is that the "top" object containing the vhost and the "includes" array must be updated for these operations.

You are right that there are some properties (specifically TLS and vhost policy) that can only me set on the root document. However, because includes can be nested to an arbitrary degree, it is possible to create a document tree where the top document rarely needs to be updated.

uablrek commented 4 years ago

But the problem is that you must alter any object not part of the application when the application is added or removed. To delegate so you don't have to alter the "top" object (defining the vhost) but instead alter the second (common) object in line (or the third,...) does not solve the fundamental problem at all.

For instance in your example if I want to remove the "admin" application, it is not sufficient to just remove it, you must also update the "kahttp-delegate" object which is just as bad as updating the "top" object.

youngnick commented 4 years ago

I'd like to explain what I think you're asking for, @uablrek, please correct me if I'm wrong. Note that I'm not saying that we will do this, just wanting to ensure that I understand what you're asking for.

In the following, I'm using 'vhost' to mean "An FQDN plus any associated TLS config".

Currently, HTTPProxy objects pull config from other HTTPProxy objects into the root document (essentially).

What you would like to be able to do is define a vhost somewhere, and then link to that vhost from an arbitrary number of other objects, the superset of which would define the config for that vhost.

So, the routes pull in the vhost instead of the other way around.

If I'm correct in my understanding, then the questions I have for you are:

What do you think we should do about conflicts? What happens if there are multiple objects that want to attach to a vhost, that have the same route spec?
How would it be possible for the person who owns the domain to stop people from adding routes to the domain in this case?

Both of those questions were major concerns when we designed the current version of HTTPProxy, so I'd really appreciate your thoughts on how they would work in your proposed solution.

uablrek commented 4 years ago

Not exactly. What I request is a way to add an applucation (e.g with helm) that takes traffic from an existing vhost (already used by other applications) without any further configuration steps. And equally important to be able to remove the application without any further configuration steps (no lingering config).

This is the requested function. I think it was a mistake to propose a solution. It might have stopped you from finding a better solution. An example is the "Ingress" object that have this functions since it allows the same vhost to be specified multiple times. This also works fine but there is no support for backend encryption. So another solution might be to allow backend encryption with a annotation in the Ingress object.

uablrek commented 4 years ago

I have not really found and conflicts that does not exist already with includes in a common object. But in general conflicting configs are rejected with some log message.

And many sub-object requesting the same path is not a conflict, it is a feature :smile: See the "Canary aspect" above.

uablrek commented 4 years ago

About malicious application attaching to the vhost I have no solution, I don't know enough about RBAC in K8s which should be the mechanism. But I think that requires rights to do just about anything and there are ways to attack the current HTTPProxy setup also with malicious configs.

youngnick commented 4 years ago

So, your current request is:

as an application owner, I want to be able to add routes that will be associated with an FQDN without having to update any other configuration.
any matching configuration should be transparently merged in to the resulting config across all matches.

Is that correct? I'd like to make sure I have the correct thing before we go any further.

davecheney commented 4 years ago

@uablrek Thank you for this clarification. I feel I owe you an explanation which might help explain our reluctance to engage with your request.

What I request is a way to add an applucation (e.g with helm) that takes traffic from an existing vhost (already used by other applications) without any further configuration steps.

The problem with this feature is while it enables some teams to self service, at the same time it enables other teams to accidentally (or maliciously in the case where the tenants of a cluster are not well trusted -- think CI) stomp on each other's vhost configurations.

I spoke about this a year ago, https://dave.cheney.net/paste/ingress-is-dead-long-live-ingressroute.pdf, sadly I don't think there is a recording.

Put simply, the design of HTTPProxy was directly informed by difficulties our customers had using the existing networking.k8s.io ingress v1beta1 object in real life. HTTPProxy takes as its foundation the notion of delegation that gives the owner, the parent, of a piece of vhost configuration the explicit choice of delegating control to some or all of the configuration to someone else. The model we have in our heads when we discuss this internally is DNS delegation; someone can stand up a DNS server for www.microsoft.com but unless they can convince the owner of .com to change the NS records, their dns server is non canonical.

I want to keep talking to make sure we understand your request, but I do need to make clear that the top down delegation model is inherent to the design of HTTPProxy and is unlikely to change at this point.

tsaarni commented 4 years ago

I guess this is likely something that has been discussed before, but if not - would this approach be feasible:

apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: example-root
spec:
  virtualhost:
    fqdn: example.domain.com
  includes:
  - name: *
    namespace: application-foo-prod

The administrator creates root resource example-root and includes all leafs from namespace application-foo-prod by using * glob.

This is a variant of delegating the definition of all routes which @jpeach mentioned but without the need to edit the includes list every time new HTTPProxy resources are added or removed.

davecheney commented 4 years ago

I don't think this is a good idea. What if someone deploys two copies of the same httpproxy document to that namespace. Prior to this change the one that is not included directly will be ignored, after this change there will be a duplicate route definition and the entire httpproxy document will be declared invalid.

tsaarni commented 4 years ago

True, it would become invalid, but I think similarly two copies of same root HTTPProxy documents will invalidate each other currently. Or explicitly included documents with duplicate routes. On the other hand, if the glob would include documents only from single namespace, problems would be fixable by the team who works in that namespace.

thprice commented 4 years ago

May I add my two cents to this discussion. I actually started out in the same position as @uablrek, looking for an easier way to create/update/delete individual applications. I do understand @davecheney & others' reluctance due to the fact that control over what may be exposed on the domain is delegated to application owners. I'm pretty new to Kubernetes but veteran in access mgmt & control, so please forgive me if I'm too naïve. I would retain the current mechanism where the "upper" layer selectively includes configuration blocks from applications providing the required control over the fqdn. However, I would also apply RBAC to delegate decisions about what sub-paths are available on an fqdn to some application teams (not all, of course). Yes, this would involve some trust and the possibility to "wreak havoc" but it would be explicitly given. The trust would kind of allow to dynamically push an "includes" block to the root HttpProxy object, possibly limited to a specific prefix. This way both kind of scenarios would be possible. In fact, the hierarchy of control could be specified for each cluster, individually. Or even each virtual host. When it comes to conflict resolution my first assumption would be that this is limited to the prefix (please correct me if I'm wrong). I would simply only allow updates to a prefix (or, rather, the relevant includes block) by the same role that created it. Unless an application team explicitly gives up control over their prefix, nobody else can manipulate it. By the way, even in today's setup, an application owner can easily block all traffic to any application - or at least that's what I experienced here. Assume I have a root HttpProxy with two includes for two application teams. If one team deletes its own httpproxy document the whole httpproxy chain becomes invalid and traffic is no longer forwarded, not even to the unchanged application. Yes, it can be easily fixed by removing the relevant include but that's not something the application team can do to reinstate its service. I'm very much interested in your thougts about all this.

youngnick commented 4 years ago

Sorry to take a long time to come back to you @thprice.

There's a few things to cover here. Firstly, we handle partial validity poorly at the moment. This is something I'm working towards fixing by adding Conditions to HTTPProxy (#2642), and then being able to tell you when your proxy is partially valid. This would cover the "two teams, one deletes their include and takes down the other team" case.

With respect to your broader ideas:

The usual mechanism for doing something like you describe is in Kubernetes to have a selector of some sort, either "everything of a type in a namespace", or "everything of a type that matches a set of labels", some combination thereof, or similar. In this case, we'd have to do something like adding a selector of some kind to the includes section.

That would mean that we would end up with a set of conditions (yes, I know, we used the wrong name, but we are stuck with it now), something like this:

...
includes:
  - conditions:
    - prefix: /teama
    namespaceSelector: teama
  - conditions:
    - prefix: /teamb
    namespaceSelector: teamb

That would include all the HTTPProxy objects in namespace teama under the /teama prefix. All fine and dandy for those two teams.

However, what happens when teama has two HTTPProxy objects that have their own identical conditions (either under routes or includes)? Currently, if you have two HTTPProxies that both specify /teama/blog as their prefix, then they would both be rejected by Contour, and neither would work.

If they're not both to be rejected, what can we do?

"First one in stays" is harder than it seems because Kubernetes is declarative, and "first" is poorly defined.
"Load Balance across both" is what Ingress currently does, which also seems surprising.

Sadly, you can't do sub-object RBAC inside Kubernetes, so we can't make it so that only some people can edit the teama include, and some can edit the teamb include.

What we're running up against here is that HTTPProxy was originally designed to cover the use case of there being a centralised infrastructure team that was okay with parcelling out specific, named HTTPProxies to separate teams, that they were free to either use or further subdivide.

Adding in more flexibility is not impossible, but requires careful thought and design.

RichiCoder1 commented 4 years ago

I'd point to Gloo as a good prior art when talking about delegating route configuration: https://docs.solo.io/gloo/latest/guides/traffic_management/destination_types/delegation/

It works pretty darn well in my experience, and IMHO provides the right balance of control between infra controllers and app operators.

For specifically the case of conflicts, they have a pretty sensible strategy outlines for how conflicts are handled:

If a RouteTableSelector matches multiple route tables and the route tables do not specify different weights, Gloo will sort the routes which belong to those tables to avoid short-circuiting (e.g. having a route with a /foo prefix matcher coming before a route with a /foo/bar one). The sorting occurs by descending specificity: routes with longer paths will come first, and in case of equal paths, precedence will be given to the route that defines the more restrictive matchers. The algorithm used for sorting the routes can be found here. In this scenario, Gloo will also alert the user by adding a warning to the status of the parent resource (the one that specifies the RouteTableSelector).

youngnick commented 4 years ago

Thanks @RichiCoder1. I agree that that conflict resolution behavior seems reasonable - it's basically merging the configs, with safeguards around ensuring that all the config actually takes effect.

The main other question there is what happens if there's a hard conflict - remember that HTTPProxy also allows inclusion with non-path matching conditions like headers, so it's possible for one to say "this header must be present" and another to say "this header must not be present", and we need a way to handle that case.

Again, I should be clear that we're not opposed to this in principle, a design needs to be written up.

RichiCoder1 commented 4 years ago

This and Ext Auth support are about the only blockers for us considering Contour, so I'll see if I can do a more formal write up with some options!

youngnick commented 2 years ago

An update on this issue.

HTTPProxy was designed for clusters where a cluster-admin wanted to enforce them being in the loop for changes made to the website. Bascially, its design is fundamentally incompatible with a self-service model without an extra controller being laid over the top to facilitate it.

This problem has been mostly solved in Gateway API with the changing of HTTPRoutes so that they choose the Gateway that they attach to, so I'm going to mark this as closed now - we anticipate having an early cut of Contour 1.20 before the end of the calendar year with Gateway v1alpha2 support available.

Please comment here or in one of our other channels (#contour on Kube Slack, community meetings, etc) if you would like to talk more.

stevesloka commented 2 years ago

We could throw the same logic at HTTPProxy to solve the conflict issues. Change how these resources are processed in a cluster and have immediate success with a very simple, streamlined API.

youngnick commented 2 years ago

If there's a way to solve the fundamental incompatibility I noted above, I'd love to see a write up or design doc about it!

tsaarni commented 2 years ago

I'd be interested. It still applies to our use case and I guess it may take some time for Gateway API CRDs to reach feature parity with the other functionality that we already have in HTTPProxy.

projectcontour / contour