Support Ctrl, Data-Routing and Data-User certificates

davidhadas commented 1 year ago

In this PR we add support for separate certificates to:

Control (Autoscaler etc.)
Data Routing (Ingress GW, Activator)
Data User (Queue etc.)

Also added some name-fixing and Deprecation notices on deprecated public items

davidhadas commented 1 year ago

/cc @evankanderson

codecov[bot] commented 1 year ago

Codecov Report

Merging #270 (6a95d59) into main (beb0666) will increase coverage by 0.63%. The diff coverage is 60.60%.

@@            Coverage Diff             @@
##             main     #270      +/-   ##
==========================================
+ Coverage   72.89%   73.52%   +0.63%     
==========================================
  Files          23       23              
  Lines        1380     1394      +14     
==========================================
+ Hits         1006     1025      +19     
+ Misses        317      313       -4     
+ Partials       57       56       -1

Impacted Files	Coverage Δ
pkg/certificates/reconciler/controller.go	`92.85% <ø> (ø)`
pkg/network/tls.go	`0.00% <0.00%> (ø)`
pkg/certificates/reconciler/certificates.go	`69.49% <70.37%> (-0.04%)`	:arrow_down:
pkg/reconciler/tls_dialer_factory.go	`85.36% <100.00%> (ø)`

... and 3 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

davidhadas commented 1 year ago

/retest

davidhadas commented 1 year ago

/retest

davidhadas commented 1 year ago

cc @ReToCode cc @nak3

ReToCode commented 1 year ago

AFAIK the certificates generated here should be internal to Knative (for now, we have not finished the discussion about certs that are visible/usable to user workloads). You are introducing a lot of new wording (pipeline, edge, data-pipeline-routing...) and concepts (It enables maintaining and distinguishing between multiple data plane pipelines (e.g. of multiple tenants)).

It should not say who is using it, it just indicates that the entity using it is an edge that can be a sender or a receiver of data. It may or may not be a Kservice. It certainly can be a microservice deployed outside of Knative that sends data!

How can it be a non Knative service? The certificates created here are not part of our public API, they are not meant to be used by user-workloads (at least not in the current state).

It can maybe also be any event source, but we will look into that when we analyze eventing...

Shouldn't we do that first and have a consistent story for Eventing + Serving?

davidhadas commented 1 year ago

@ReToCode Please refer to https://docs.google.com/document/d/1XE7UzgQlVVtAb7ULSqOyKCaIHtm8zMF35ainp1JmwyY/edit?resourcekey=0-e1MXRxQaalq-EHW46ZCCWg#heading=h.n8a530nnrb

We have identified three types of certificates needed:

A certificate identifying a Control entity
A certificate identifying an edge entity sending or receiving
A certificate identifying a routing entity that is trusted to route data between edges

@skonto AFAIK (have not tested), if an admin replaces the ca secret, the control protocol will use the ca he created - This may be part of a solution for admins to inject their own ca for serving, but others may have a broader view on that.

ReToCode commented 1 year ago

Please refer to https://docs.google.com/document/d/1XE7UzgQlVVtAb7ULSqOyKCaIHtm8zMF35ainp1JmwyY/edit?resourcekey=0-e1MXRxQaalq-EHW46ZCCWg#heading=h.n8a530nnrb

I do know that document, it has a lot of open discussion items and it seems that we have not yet agreed on a way to provide the certs that are visible/or used to/by user-workload. E.g. we brought up the discussion to have an abstraction for KInternalCertificates for a user to plug in different implementations. E.g. for OpenShift we needs Service Signer, another setup would probably want to use Hashicorp-Vault and so on. We still can provide our own implementation (control-protocol) but then we need to define a clear API for those "visible/used by user-workload" certificates.

We have identified three types of certificates needed: A certificate identifying a Control entity A certificate identifying an edge entity sending or receiving A certificate identifying a routing entity that is trusted to route data between edges

Three certificates makes sense to me, but the document lacks to explain, why we need a sequence-number in the SAN for the data-plane, where it is coming from, what the use-cases are behind it. It mixes a bit with our ongoing discussions about native mTLS and multi-tenancy.

davidhadas commented 1 year ago

@ReToCode As for other certificate solutions that work instead of control-protocol, I think we can define a clear API, but I am not familiar with one that we already have right now. Such API should support a complete solution with mTLS etc.

Three certificates makes sense to me, but the document lacks to explain, why we need a sequence-number > in the SAN for the data-plane, where it is coming from, what the use-cases are behind it. It mixes a bit with > our ongoing discussions about native mTLS and multi-tenancy.

The document indicates "This approach will not prevent a future extension where Knative Serving will deploy multiple shared ingress components (for example one per tenant, or for any other purpose). If multiple ingress components are used, they will each be deployed in a separate namespace and use their own certificate with its own dedicated SAN." - The routing-id is an implementation to enable multiple ingresses, each with dedicated san. For now, we use only one routing-id ("0").

At the same time, this PR is designed to prepare the certificates needed for both TLS and mTLS.

davidhadas commented 1 year ago

/retest

davidhadas commented 1 year ago

/retest

davidhadas commented 1 year ago

/retest

davidhadas commented 1 year ago

/retest

davidhadas commented 1 year ago

/retest

ReToCode commented 1 year ago

The document indicates "This approach will not prevent a future extension where Knative Serving will deploy multiple shared ingress components (for example one per tenant, or for any other purpose). If multiple ingress components are used, they will each be deployed in a separate namespace and use their own certificate with its own dedicated SAN." - The routing-id is an implementation to enable multiple ingresses, each with dedicated san. For now, we use only one routing-id ("0").

At the same time, this PR is designed to prepare the certificates needed for both TLS and mTLS.

Summarising my thought:

I'm fine with the actual code changes, so LGTM on that
I'm also fine with having those changes in control-protocol for now and extract out an abstraction later
I feel like this PR tries to do too much, having multiple SANs for different components is valid at this point, but I feel we do not have properly discussed the whole mTLS or tenancy story.
I feel like the recent changes to the Serving FT were not yet properly discussed and agreed on
I think the mTLS story needs a proper design first (with Eventing, Serving aligned and a good idea how customer workloads chime into it)

davidhadas commented 1 year ago

Summarising my thought:

I'm fine with the actual code changes, so LGTM on that

Does this mean we can lgtm and approve this PR?

I'm also fine with having those changes in control-protocol for now and extract out an abstraction later

This abstraction suppose to be spelled out in https://docs.google.com/document/d/1XE7UzgQlVVtAb7ULSqOyKCaIHtm8zMF35ainp1JmwyY/edit?resourcekey=0-e1MXRxQaalq-EHW46ZCCWg#heading=h.n8a530nnrb

Please help us understand in the doc what is not spelled out.

I feel like this PR tries to do too much, having multiple SANs for different components is valid at this point, but I feel we do not have properly discussed the whole mTLS or tenancy story.

This PR is not adding mTLS, it is just preparing the certificates to make sure each entity can identify itself using the right SAN. This is needed for TLS as well as for mTLS.

mTLS is described in the Doc with diagrams, please help us understand in the doc what is not spelled out.

We did not define a tenancy story yet, and it is not a priority compared to other work we have. The doc specifies that we ensure that if ever wish to maintain more than one instance of routing we can do that. This is implemented in this PR. Besides ensuring that we have a SAN naming mechanism that is able to implement more than one routing instance, there is no any tenancy-related work in this PR.

I feel like the recent changes to the Serving FT were not yet properly discussed and agreed on

Let's then continue to discuss them in meetings, in the doc and on slack.

I think the mTLS story needs a proper design first (with Eventing, Serving aligned and a good idea how customer workloads chime into it)

mTLS is not related to Eventing. It is first and foremost a Serving mechanism to ensure all data is sent via the serving routing and not incoming from obscure sources. This has been the design in Serving for a long time, but it is not yet secured in any way - it will become secured with mTLS.

knative-prow[bot] commented 1 year ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidhadas, evankanderson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/knative-sandbox/control-protocol/blob/main/OWNERS)~~ [evankanderson] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment

knative-extensions / control-protocol

Support Ctrl, Data-Routing and Data-User certificates #270

Codecov Report