kubernetes / kubeadm

Aggregator for issues filed against kubeadm
Apache License 2.0
3.76k stars 715 forks source link

CA rotation: controller-manager needs a separate ca.crt file #1350

Open anguslees opened 5 years ago

anguslees commented 5 years ago

What happened?

I tried to (manually) rotate my cluster's CA key over the weekend. I discovered that /etc/kubernetes/pki/ca.crt can actually include multiple CA keys, and this is key (hah!) to rotating the CA key.

kube-controller-manager however, can only accept a single key in the file pointed to by --cluster-signing-cert-file, since this is the key used to sign things, and not to verify things (so having multiple keys doesn't make sense). kube-controller-manager exits immediately (with a helpful error) if --cluster-signing-cert-file includes multiple keys.

I think pointing kube-controller-manager --cluster-signing-cert-file to ca.crt works for the simple (single key) case, but is incorrect in general, since it prevents ca.crt file from being used to rotate keys. I think the correct path is to either:

What you expected to happen?

Able to append a new CA cert to /e/k/pki/ca.crt and have both CA certs accepted by controller jobs without other impact.

How to reproduce it (as minimally and precisely as possible)?

Append an additional cert to /e/k/pki/ca.crt and restart kube-controller-manager pod

neolit123 commented 5 years ago

cc @liztio @fabriziopandini

fabriziopandini commented 5 years ago

@anguslees thanks for this issue and for the detailed explanation! Certification rotation IMO is a topic that deserve more attention in general, and I think that we should re-iterate at sig level on two points that up to know have no clear prioritization:

  1. certificate rotation enhancements ??? (kubelet, kubeconfig files etc.)
  2. document how certificate rotation works (automatic or DIY)

What discussed in this issue is clearly part of 1; I personally prefer the idea to change the kube-controller-manager, because this will ease the pain on users, but I'm open to reconsider this in the context of the discussion above...

timothysc commented 5 years ago

This is a broader topic and we should loop in @kubernetes/sig-auth-misc on approach + fixes.

liggitt commented 5 years ago

For the immediate question, the signing cert/key given to the controller manager should be distinct from the trust bundle. Coupling the two and having kube-controller-manager treat the first cert in the bundle specially seems ill-advised, since order in trust bundles is not typically significant.

rojkov commented 5 years ago

Signing certs and trust bundles are clearly different things indeed.

So, I tried to separate one from the other in https://github.com/rojkov/kubernetes/commit/afed24fecbfa32d7c939fc86a9f0361e1677824c. Cluster creation (and node join) seems to work. The algorithm for cert renewal is such that new CA certs are added to the bundle and get removed only after their expiration date. I'm not sure how to test upgrade scenarios though.

anguslees commented 5 years ago

Signing certs and trust bundles are clearly different things indeed.

Agreed. To be clear though, the signing happens with a private key, already called out separately (--cluster-signing-key-file). We just need some way to find the corresponding public key (cert). I agree the two are conceptually separate and that the signing cert might not also be in the trust bundle somewhere, but this would be odd in this situation.

(I think we should just use a separate file, as @liggitt suggests and @rojkov's change implements. Just pointing out that we could modify controller-manager to try to find the matching public key in the trust bundle if we really wanted to avoid a separate file)

astrieanna commented 5 years ago

@rojkov it looks like you have already made some good changes for this issue. Are you planning to submit a PR, or interested in help testing your change?

We're interested in this feature as well and would consider working on a similar implementation and PR otherwise. Sounds like the current recommendation is to have a separate trust bundle file and keep the ca.crt file containing the signing credentials.

rojkov commented 5 years ago

@astrieanna I'm consumed by another project ATM, but I'll try to submit a PR by the end of this week.

rojkov commented 5 years ago

@astrieanna it seems a simple rebase is not enough now after @fabriziopandini refactored the cert renewal code. If you need to fix this issue urgently please consider submitting your own PR.

I'll get back to k8s in 2 weeks hopefully.

neolit123 commented 5 years ago

hi, this topic needs a design proposal. deadline for KEPs is done for 1.16 and thus this has to be move to the next cycle.

fabriziopandini commented 5 years ago

As per kubeadm office hours discussion, we are considering certificate rotation in scope of the kubeadm operator https://github.com/kubernetes/kubeadm/issues/1698

neolit123 commented 5 years ago

this is too late for 1.17. /milestone v1.18 also if someone has the time please create a proposal, ideally in google doc and let's discuss it. /help

k8s-ci-robot commented 5 years ago

@neolit123: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/kubernetes/kubeadm/issues/1350): >this is too late for 1.17. >/milestone v1.18 >also if someone has the time please create a proposal, ideally in google doc and let's discuss it. >/help > Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

neolit123 commented 4 years ago

/lifecycle frozen

MikeSpreitzer commented 4 years ago

/remove-lifecycle frozen

I created a PR to work on this, https://github.com/kubernetes/kubernetes/pull/94169 I noticed that this includes no changes to kube-controller-manager, only kubeadm. So I am confused. I thought the discussion here was rooted in a confusion in kube-controller-manager.

neolit123 commented 4 years ago

@MikeSpreitzer like i've mentioned here, ideally before proceeding with changes in kubeadm i'd like to hear from SIG Auth what is the plan for supporting bundles for some of the flags in the KCM.

this can be done is a k/k ticket that tracks this work for sig-auth and makes it more visible.

if the response is "no, some of the flags will never support bundles" we can think about what can be done on the kubeadm side.

neolit123 commented 4 years ago

i'm fine keeping this ticket open until then, and until we clarify the wording about CA rotation as requested here: https://github.com/kubernetes/website/issues/23323

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

anguslees commented 3 years ago

/remove-lifecycle stale

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

neolit123 commented 3 years ago

/remove-lifecycle stale

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

SataQiu commented 3 years ago

/remove-lifecycle stale

Swetad90 commented 3 years ago

For this particular reason (ie change the pointing of the client-ca and cluster-signing-cert to different CA), I'm not able to automate the CA rotation. I thought, I can use a config file that can change the arguments kube-controller-manager accepts, but per the official doc https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/ doesn't show if KCM can accept config files.

I use command line arguments to start a KCM docker container, so changing arguments to point to different certs is not possible while rotating the CA.

neolit123 commented 3 years ago

this is the officially approved guide for rotating CA: https://kubernetes.io/docs/tasks/tls/manual-rotation-of-ca-certificates/

it includes notes about client-ca-file and cluster-signing-cert-file.

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

fabriziopandini commented 2 years ago

/unassign

chenk008 commented 10 months ago

this is the officially approved guide for rotating CA: https://kubernetes.io/docs/tasks/tls/manual-rotation-of-ca-certificates/

it includes notes about client-ca-file and cluster-signing-cert-file.

May I ask why client-ca-file cannot be CA bundles? I think it is only related to KCM https auth.

neolit123 commented 10 months ago

this is the officially approved guide for rotating CA: https://kubernetes.io/docs/tasks/tls/manual-rotation-of-ca-certificates/ it includes notes about client-ca-file and cluster-signing-cert-file.

May I ask why client-ca-file cannot be CA bundles? I think it is only related to KCM https auth.

if you have tested it and it works with bundles, you can PR the same page with a fix. one would expect for it to work with bundles, because it about CAs for a client... it's also documented as such here: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/

similar Q asked here: https://github.com/kubernetes/website/issues/31882

anguslees commented 9 months ago

Oh, I see the doc text points back to this issue as justification for the kube-controller-manager exception ("Issue 1350 for kubeadm tracks an bug with the kube-controller-manager being unable to accept a CA bundle.")

To be clear: This (#1350) is a bug report against kubeadm, not kube-controller-manager. kube-controller-manager always accepted CA bundles in all the right places just fine, but the way kubeadm configured kube-controller-manager was incorrect and conflated CA bundle and CA signing cert. If you don't use kubeadm, or are willing to deviate from the kubeadm-generated config files, then there was never an issue here.

et304383 commented 4 months ago

Oh, I see the doc text points back to this issue as justification for the kube-controller-manager exception ("Issue 1350 for kubeadm tracks an bug with the kube-controller-manager being unable to accept a CA bundle.")

To be clear: This (#1350) is a bug report against kubeadm, not kube-controller-manager. kube-controller-manager always accepted CA bundles in all the right places just fine, but the way kubeadm configured kube-controller-manager was incorrect and conflated CA bundle and CA signing cert. If you don't use kubeadm, or are willing to deviate from the kubeadm-generated config files, then there was never an issue here.

Can you please elaborate? If one used kubeadm previously, how does one move away from a kubeadm generated config file? IE - how would you propose updating in place using a CA bundle?