Support TLS Termination mode for TLSRoutes

Rycieos commented 1 year ago

What would you like to be added:

Currently, the spec for TLSRoute Listeners only supports GatewayTLSConfig.TLSModeType = "Passthrough". Similar to HTTPRoute Listeners, mode "Terminate" should also be supported. Specifically, a Gateway spec of:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
spec:
  listeners:
  - allowedRoutes:
      kinds:
      - group: gateway.networking.k8s.io
        kind: TLSRoute
    protocol: TLS
    tls:
      mode: Terminate

should be valid and work as expected: the Gateway strips the TLS layer and passes the decrypted TCP traffic to the backend specified by each TLSRoute.

Why this is needed:

I have an application stack that speaks both HTTP as well as a nonstandard application protocol over TCP to backend servers. I want all traffic wrapped in TLS. To allow my developers to iterate quickly, I allow them to create new backend environments at will. To greatly simplify environment creation, I want all HTTPS and TCP traffic to be handled on the same FQDN, meaning the same IP address, meaning the same Gateway object. And to make this whole setup simple, I have set a wildcard DNS record on the Gateway IP as well as a matching wildcard certificate.

For example, I have a wildcard DNS A record for *.dev.example.com pointing at my Gateway's IP address, and a wildcard cert to match. Now a developer can create an HTTPRoute and a TLSRoute pointing to their application with the domain foobar.dev.example.com, and it just works. But without this suggested feature, the TLS traffic would not be terminated at the Gateway, meaning the application would need to terminate TLS itself and present a valid certificate for that domain.

Other options:

As pointed out by @skriss in https://github.com/projectcontour/contour/issues/5461#issuecomment-1587868430, there is apparently a workaround in creating a Listener per domain and using TCPRoutes instead (though I have not tested this). But this negates the benefit of the Gateway API for my use case, which is that I can create a single Gateway with a single DNS record and a single TLS certificate, and allow a developer to route any TLS traffic to any service and at the same time handle the TLS termination so the application does not need to.
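
For reference, the workaround might look roughly like the sketch below. It is untested, and it assumes an implementation that lets TCPRoutes attach to TLS Listeners in Terminate mode. The Gateway, GatewayClass, Secret, Listener, and Service names, as well as the port numbers, are hypothetical. Every additional domain would need its own Listener and TCPRoute pair, which is the overhead described above.

```yaml
# Sketch only: one Listener per domain, each terminating TLS and
# forwarding the decrypted stream through a TCPRoute.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: dev-gateway                # hypothetical name
spec:
  gatewayClassName: example        # hypothetical GatewayClass
  listeners:
  - name: foobar                   # one Listener per domain
    hostname: foobar.dev.example.com
    port: 8443                     # hypothetical port
    protocol: TLS
    tls:
      mode: Terminate
      certificateRefs:
      - name: wildcard-dev-cert    # hypothetical Secret with the wildcard certificate
    allowedRoutes:
      kinds:
      - group: gateway.networking.k8s.io
        kind: TCPRoute
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TCPRoute
metadata:
  name: foobar                     # hypothetical name
spec:
  parentRefs:
  - name: dev-gateway
    sectionName: foobar            # binds this route to the Listener above
  rules:
  - backendRefs:
    - name: foobar                 # hypothetical Service
      port: 3102
```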

Other notes:

Traefik's Gateway API implementation (while flawed and not spec compliant in many ways) does support TLSRoute TLS termination. I am using it currently for my above use case, and it works quite well.

candita commented 1 year ago

is this the same as TLS Use Case 3 in @youngnick's document: https://docs.google.com/document/d/17sctu2uMJtHmJTGtBi_awGB0YzoCLodtR6rUNmKMCs8/edit#heading=h.prttql3ho2g6?

Rycieos commented 1 year ago

> is this the same as TLS Use Case 3 in @youngnick's document

It would appear so, yes. I am not sure why it would use a TCPRoute vs a TLSRoute, or what the difference would even be, but I guess those are implementation details that I do not care about.

candita commented 1 year ago

> > is this the same as TLS Use Case 3 in @youngnick's document
>
> It would appear so, yes. I am not sure why it would use a TCPRoute vs a TLSRoute, or what the difference would even be, but I guess those are implementation details that I do not care about.

If so, then maybe you can already do this with TCPRoute (https://gateway-api.sigs.k8s.io/concepts/api-overview/#tcproute-and-udproute) and it may be a matter of working on moving it out of the Experimental Channel.

Rycieos commented 1 year ago

> maybe you can already do this with TCPRoute (https://gateway-api.sigs.k8s.io/concepts/api-overview/#tcproute-and-udproute)

That doc says:

> each TCPRoute really needs a different port on the listener (in general, anyway).

Which makes my use case impossible. A TCPRoute does not have a hostnames field like a TLSRoute does which would allow for matching and routing based on SNI. So I guess that use case 3 in that document isn't exactly a match, as I need this to work with TLSRoutes.

youngnick commented 1 year ago

Okay, thanks for the clarifications. Let me write out what seem to be the critical parts to me and see if we have this right:

- There should be one Listener that accepts TLS connections on some port (let's say 443 for argument purposes).
- That Listener will have both TLSRoutes and HTTPRoutes attached to it.
- The TLSRoutes and HTTPRoutes will all have a hostname set, and the hostname will be used to choose where particular traffic will flow, based on the SNI of the TLS handshake.
- For TLSRoutes, the TLS connection should be terminated, and the TCP traffic forwarded directly to the backend, unencrypted. (This is why we've talked about this as a TCPRoute previously, because it's forwarding a TCP stream).
- For HTTPRoutes, the TLS connection should be terminated and the HTTP traffic forwarded as directed by the HTTPRoute spec.

The critical part here is that the routing discriminator that allows the implementation to decide which Route should get the traffic is the hostname field in the HTTPRoute or TLSRoute, which must be matched against the SNI (and which will need to match the usual Listener->Route hostname rules as well).

If that's correct, then this is a bit new: I don't think we've really specced out rules for what happens when different types of Route are attached to the same Listener.

I'll hold off on further comments until @Rycieos can confirm if I've got the bones correct.

Rycieos commented 1 year ago

@youngnick that is all correct, except for:

> That Listener will have both TLSRoutes and HTTPRoutes attached to it.

That is one valid use case, but I can imagine two more:

1. (restating your use case): One Listener that accepts both TLSRoutes and HTTPRoutes, on the same port, but different hostnames.
2. Two Listeners, on different ports: one that accepts TLSRoutes and one that accepts HTTPRoutes, and can have the same hostnames.
3. One Listener that accepts both TLSRoutes and HTTPRoutes, on the same port, and can have the same hostnames.

I think all three use cases are valid, but 3) would be hard to implement. You would need a heuristic to identify HTTP layer 5 traffic to know if the incoming traffic is HTTP or something else. I think leaving it out of the spec somehow is valid.

But both of the other two use cases should be supported in my opinion. You are correct that the first use case is something new, and is probably not simple to implement. The second use case is much simpler, and is why I opened this issue, as it is only asking for TLSRoutes to work exactly the same as HTTPRoutes with regard to Listeners, bindings, and TLS termination.

My specific user story is for the second use case. I'll mention again that Traefik already has support for this use case in their TLSRoute implementation, I would assume because, like me, they misunderstood the spec and thought it was already required.
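
To make the second use case concrete, here is a rough sketch of the whole setup. It is not valid under the current spec, since the TLS Listener relies on the Terminate mode this issue asks for. The Gateway, GatewayClass, Secret, and Service names and the port numbers are assumptions made up for illustration.

```yaml
# Sketch of use case 2: two Listeners on different ports that share a
# wildcard hostname and certificate.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: dev-gateway                # hypothetical name
spec:
  gatewayClassName: example        # hypothetical GatewayClass
  listeners:
  - name: https
    hostname: "*.dev.example.com"
    port: 443
    protocol: HTTPS
    tls:
      mode: Terminate
      certificateRefs:
      - name: wildcard-dev-cert    # hypothetical Secret with the wildcard certificate
    allowedRoutes:
      kinds:
      - group: gateway.networking.k8s.io
        kind: HTTPRoute
  - name: tls
    hostname: "*.dev.example.com"
    port: 8443                     # hypothetical port for the TCP protocol
    protocol: TLS
    tls:
      mode: Terminate              # the mode this issue asks to allow for TLSRoutes
      certificateRefs:
      - name: wildcard-dev-cert
    allowedRoutes:
      kinds:
      - group: gateway.networking.k8s.io
        kind: TLSRoute
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: foobar-http                # hypothetical name
spec:
  parentRefs:
  - name: dev-gateway
    sectionName: https
  hostnames:
  - foobar.dev.example.com
  rules:
  - backendRefs:
    - name: foobar                 # hypothetical Service
      port: 8080                   # hypothetical HTTP port
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: foobar-tcp                 # hypothetical name
spec:
  parentRefs:
  - name: dev-gateway
    sectionName: tls
  hostnames:
  - foobar.dev.example.com         # matched against the SNI of the TLS handshake
  rules:
  - backendRefs:
    - name: foobar
      port: 3102
```

With this layout a developer only adds the two Routes per environment; the Gateway, DNS record, and certificate stay untouched.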

youngnick commented 1 year ago

Thanks for that @Rycieos, that's great clarification.

In use case 2, are there multiple TLSRoutes attached to the "forward TCP traffic" Listener? That would be the reason to use TLSRoute and not TCPRoute, I think.

If that's the case, that's definitely a reason to allow for this, but I think that we may need to add this as an Extended feature (meaning that not every implementation has to do it) - this would mean that Traefik (and any other implementations that want to) can optionally support the feature, and we'll have conformance tests that verify that things work as expected.

Assuming that we're in agreement about the above, I think we can keep the change tightly scoped to the second use case by saying something like "Using TLSRoutes with TLS mode Terminate is okay, as long as they're the only type of Route attached to that Listener" - that would allow us to rule out the mixed-Route use case, which seems like it would be hard.

That said, it's probably worth getting some feedback about that last point from other implementers - I think that @howardjohn is the most likely to have had to do something like this, but I'd like to hear from anyone about whether the mixed-Route-types use case is something we should consider supporting.

Rycieos commented 1 year ago

> In use case 2, are there multiple TLSRoutes attached to the "forward TCP traffic" Listener? That would be the reason to use TLSRoute and not TCPRoute, I think.

Correct. Here is an example: DNS A *.example.com. <GATEWAY_IPV4_LB_IP>

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
spec:
  listeners:
  - allowedRoutes:
      kinds:
      - group: gateway.networking.k8s.io
        kind: TLSRoute
    protocol: TLS
    tls:
      mode: Terminate
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
spec:
  hostnames:
  - foo.example.com
  rules:
  - backendRefs:
    - kind: Service
      name: foo
      port: 3102
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
spec:
  hostnames:
  - bar.example.com
  rules:
  - backendRefs:
    - kind: Service
      name: bar
      port: 3102

> I think that we may need to add this as an Extended feature (meaning that not every implementation has to do it)

That is fine with me, though I would argue that allowing support for TLSRoutes without this feature is pretty much worthless. If we could somehow make this feature a Core feature of the Extended TLSRoute resource, that would be my vote. Unless TLSRoutes were planned to be Core at some point, then it's probably fine.

> I think we can keep the change tightly scoped to the second use case

Again fine with me, as my use case fits inside this simple solution.

sunjayBhatia commented 1 year ago

/triage

sunjayBhatia commented 1 year ago

/needs-triage

sunjayBhatia commented 1 year ago

/label needs-triage

k8s-ci-robot commented 1 year ago

@sunjayBhatia: The label(s) /label needs-triage cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

sunjayBhatia commented 1 year ago

@shaneutt @robscott we're looking at this in Contour so would love if we got consensus upstream, could we add the needs-triage label etc. to get this on the triage radar?

robscott commented 1 year ago

🤔 I'm not actually sure how to do that with prow. I guess in upstream most issues start with needs-triage, can't find an obvious way in https://prow.k8s.io/command-help, but maybe there's another bot we can enable to match upstream behavior here. In any case, adding the label manually for now.

wuxingzhong commented 10 months ago

I have the same problem and I support @Rycieos . Is there a plan to support it?

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 6 months ago

/lifecycle rotten

Rycieos commented 6 months ago

This enhancement is still very much something I need (and am currently using).

/remove-lifecycle rotten

shaneutt commented 6 months ago

Hi @Rycieos :wave:

I see you changed the lifecycle on this one. It might be worth putting this on the agenda for an upcoming sync, because I have yet to see anyone looking to step forward and implement this, and talking through it could help give it some gas :thinking:

Rycieos commented 6 months ago

Thanks @shaneutt. I agree, this hasn't had much traction. I added it to the meeting agenda; hopefully I can make the next meeting to discuss it.

k8s-triage-robot commented 3 months ago

/lifecycle stale

k8s-triage-robot commented 2 months ago

/lifecycle rotten

youngnick commented 2 months ago

/remove-lifecycle rotten

I think we'll still need to handle this type of use case, even though we haven't touched it for a while.

shaneutt commented 2 months ago

/cc @mlavacca @candita

kubernetes-sigs / gateway-api

Support TLS Termination mode for TLSRoutes #2111