kubernetes / kubeadm

Aggregator for issues filed against kubeadm
Apache License 2.0

generate-csr should support worker nodes #2413

Closed robbiemcmichael closed 3 years ago

robbiemcmichael commented 3 years ago

Summary

This is a summary of the feature request after some design discussion in the comments below.

External CA mode could be better supported for worker nodes by changing the generate-csr command to support the following modes:

kubeadm certs generate-csr all             # generate all CSRs and kubeconfig files
kubeadm certs generate-csr kubelet(.conf)  # generate just the kubelet CSR and the kubelet.conf file

The name for the sub-command could be either kubelet or kubelet.conf. To me it seems that kubelet is more consistent with the existing kubeadm init phase kubeconfig kubelet command.

Changing the original generate-csr command to require the all sub-command would be a breaking change. This is on hold unless there is further interest from others in this feature.

Original request

This is a feature request that kubeadm certs generate-csr (or a new similar command) should support worker nodes. Currently this command seems to be built with the assumption that it's being executed on a control plane node.

Goal

I'm trying to implement a declarative join process for worker nodes when using an external CA. By this I mean that I cannot sensibly generate a bootstrap token from the control plane nodes and get it to the worker node, so instead I'm using file based discovery with an out-of-band process for signing certificates.

The workflow I would like to achieve is the following:

  1. Worker node boots with a kubeadm config file containing just the JoinConfiguration and KubeletConfiguration (I believe these are all that we should need for a worker node)
  2. kubeadm certs generate-csr is used to generate the private key and CSR for kubelet.conf
  3. An out-of-band process updates kubelet.conf to embed the private key and certificate after signing it with the external CA (e.g. this can be done with the Vault PKI API)
  4. Run kubeadm join phase kubelet-start which uses file based discovery to connect to the Kubernetes API and fetch ClusterConfiguration from a configmap
  5. Worker node has joined the cluster without using a bootstrap token
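If all the pieces above worked end to end, the sequence might look roughly like this (`worker.yaml` and the signing helper are placeholders for my own config file and out-of-band process):

```shell
# 1-2. Generate the private keys and CSRs from the worker's kubeadm config
kubeadm certs generate-csr --config worker.yaml \
  --cert-dir /etc/kubernetes/pki --kubeconfig-dir /etc/kubernetes

# 3. Out-of-band: have the external CA (e.g. Vault) sign the kubelet CSR,
#    then embed the resulting certificate and key in kubelet.conf
sign-with-external-ca /etc/kubernetes/kubelet.conf.csr   # placeholder

# 4. Join via file-based discovery; ClusterConfiguration is fetched from
#    the kubeadm-config ConfigMap
kubeadm join phase kubelet-start --config worker.yaml
```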

Problem

This very nearly works already; I've just hit a few hiccups:

  1. I can't run kubeadm certs generate-csr without the kubeadm config file having an InitConfiguration or ClusterConfiguration. These appear to be control plane node configuration files to me, so they shouldn't need to be included on worker nodes just to be able to generate CSRs.
  2. I end up with CSRs for all of the control plane components. Not a problem per se, just a bit unnecessary so it would be nice to be able to generate the CSR just for kubelet.conf.
  3. I need to manually override the server in kubelet.conf to point it at the API server address for the control plane since by default the kubeconfig files contain the IP of the local node. This makes sense for control plane nodes so that components only connect to the local API server, but obviously it doesn't work for a worker node where the API server isn't running.

Right now it's easier just to avoid generating the CSR and manually generate the certificate with the correct fields, but I would like to be able to generate the CSR for a worker node so I can remove this code and use the same process that I use for control plane nodes.

To address the issues above, I'd like to add a way to generate the CSR which doesn't require any control plane configuration files, only generates the CSR for kubelet.conf and generates the kubelet.conf with the .clusters[0].cluster.server field pointing to the control plane API server.
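As a sketch of the third fix, the server override amounts to editing one field. Modelling the kubeconfig as a plain dict (as a YAML loader would return it), a hypothetical helper could look like this; the endpoint and IP values are examples only:

```python
def override_server(kubeconfig: dict, endpoint: str) -> dict:
    """Set .clusters[0].cluster.server, mirroring the manual fix described above."""
    kubeconfig["clusters"][0]["cluster"]["server"] = endpoint
    return kubeconfig

# Example kubelet.conf as generated today: server points at the local node's IP
kubelet_conf = {
    "apiVersion": "v1",
    "kind": "Config",
    "clusters": [
        {
            "name": "kubernetes",
            "cluster": {
                "server": "https://10.0.0.128:6443",  # local node IP (wrong for workers)
                "certificate-authority-data": "...",
            },
        }
    ],
}

override_server(kubelet_conf, "https://cluster.local:6443")
print(kubelet_conf["clusters"][0]["cluster"]["server"])  # → https://cluster.local:6443
```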


Please let me know if you think there are any better ways of accomplishing the same goals. I'm also happy to do the implementation if this idea gains any support. It's probably only worth doing if you think it would be useful for anyone else since I already have a workaround.

neolit123 commented 3 years ago

This is a feature request that kubeadm certs generate-csr (or a new similar command) should support worker nodes. Currently this command seems to be built with the assumption that it's being executed on a control plane node.

yes, basically kubeadm certificates are managed on the init node and that's what certificate / CSR commands are targeting. the UX on joining nodes is oriented towards using a simple configuration / keys / tokens. this is by design and it satisfies the majority of use cases.

Worker node boots with a kubeadm config file containing just the JoinConfiguration and KubeletConfiguration (I believe these are all that we should need for a worker node)

note, currently we don't support KubeletConfiguration on join but we might be adding it soon. https://github.com/kubernetes/kubeadm/issues/1682

The workflow I would like to achieve is the following:

can you explain why you need a CSR for the kubelet client certificate? (EDIT: ok i saw that you just want consistency with the rest of your setup)

i think you can simplify this if you have a look at our e2e test for external CA: https://github.com/kubernetes/kubeadm/blob/05f9b96cf83d73ed2177815487eba90517b66708/kinder/pkg/cluster/manager/actions/setup-external-ca.go#L28 https://github.com/kubernetes/kubeadm/blob/master/kinder/ci/workflows/external-ca-tasks.yaml

what happens is:

I can't run kubeadm certs generate-csr without the kubeadm config file having an InitConfiguration or ClusterConfiguration. These appear to be control plane node configuration files to me, so they shouldn't need to be included on worker nodes just to be able to generate CSRs.

this is by design as mentioned above.

I end up with CSRs for all of the control plane components. Not a problem per se, just a bit unnecessary so it would be nice to be able to generate the CSR just for kubelet.conf.

this is something that we can optimize to be similar to the certs command which allows asking for specific certs.

I need to manually override the server in kubelet.conf to point it at the API server address for the control plane since by default the kubeconfig files contain the IP of the local node. This makes sense for control plane nodes so that components only connect to the local API server, but obviously it doesn't work for a worker node where the API server isn't running.

in our external CA e2e we just do kubeadm init phase kubeconfig kubelet --control-plane-endpoint=%s, but we don't generate via CSRs. i think generate-csr would allow you to configure this using the ClusterConfiguration?

Please let me know if you think there are any better ways of accomplishing the same goals. I'm also happy to do the implementation if this idea gains any support. It's probably only worth doing if you think it would be useful for anyone else since I already have a workaround.

given you are the first user requesting such a change, and given there are a number of workarounds to support external CA / joining without tokens, i would hold until there is more demand.

neolit123 commented 3 years ago

cc @fabriziopandini @wallrj

robbiemcmichael commented 3 years ago

Thanks for the detailed reply @neolit123.

  • you can generate a kubelet.conf directly on the joining nodes using: kubeadm init phase kubeconfig kubelet --control-plane-endpoint=%s (init does require InitConfiguration / ClusterConfiguration or you can use some of the flags like --control-plane-endpoint) or generate them externally but pass InitConfiguration.NodeRegistrationOptions.Name to be explicit about the node name in the cert.

I just gave this a shot and using the flags does seem to resolve the need for a config file. However, I encountered an error because it tries to load the API server certificate which doesn't exist on a worker node:

$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"6b1d87acf3c8253c123756b9e61dac642678305f", GitTreeState:"clean", BuildDate:"2021-03-18T01:08:27Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}

$ ls /root/pki/
ca.crt

$ kubeadm init phase kubeconfig kubelet --cert-dir=/root/pki --control-plane-endpoint=cluster.local --kubernetes-version=v1.20.5
invalid or incomplete external CA: failure loading certificate for API server: failed to load certificate: couldn't load the certificate file /root/pki/apiserver.crt: open /root/pki/apiserver.crt: no such file or directory
To see the stack trace of this error execute with --v=5 or higher

Stack trace:

$ kubeadm init phase kubeconfig kubelet --cert-dir=/root/pki --control-plane-endpoint=cluster.local --kubernetes-version=v1.20.5 --v=5
I0319 01:29:08.648316  310508 initconfiguration.go:104] detected and using CRI socket: /run/containerd/containerd.sock
I0319 01:29:08.648790  310508 interface.go:400] Looking for default routes with IPv4 addresses
I0319 01:29:08.648818  310508 interface.go:405] Default route transits interface "vlan0"
I0319 01:29:08.649770  310508 interface.go:208] Interface vlan0 is up
I0319 01:29:08.649920  310508 interface.go:256] Interface "vlan0" has 1 addresses :[10.0.0.128/27].
I0319 01:29:08.649985  310508 interface.go:223] Checking addr  10.0.0.128/27.
I0319 01:29:08.650008  310508 interface.go:230] IP found 10.0.0.128
I0319 01:29:08.650033  310508 interface.go:262] Found valid IPv4 address 10.0.0.128 for interface "vlan0".
I0319 01:29:08.650055  310508 interface.go:411] Found active IP 10.0.0.128 
I0319 01:29:08.659002  310508 certs.go:474] validating certificate period for CA certificate
I0319 01:29:08.659323  310508 certs.go:474] validating certificate period for API server certificate
open /root/pki/apiserver.crt: no such file or directory
couldn't load the certificate file /root/pki/apiserver.crt
k8s.io/kubernetes/cmd/kubeadm/app/util/pkiutil.TryLoadCertFromDisk
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/util/pkiutil/pki_helpers.go:249
k8s.io/kubernetes/cmd/kubeadm/app/util/pkiutil.TryLoadCertAndKeyFromDisk
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/util/pkiutil/pki_helpers.go:230
k8s.io/kubernetes/cmd/kubeadm/app/phases/certs.validateSignedCertWithCA
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/certs/certs.go:422
k8s.io/kubernetes/cmd/kubeadm/app/phases/certs.validateSignedCert
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/certs/certs.go:416
k8s.io/kubernetes/cmd/kubeadm/app/phases/certs.UsingExternalCA
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/certs/certs.go:342
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newInitData
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:373
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func3
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:195
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).InitData
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:183
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:203
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).BindToCommand.func1.1
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:347
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:850
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:958
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:895
k8s.io/kubernetes/cmd/kubeadm/app.Run
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
    _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
    /usr/local/go/src/runtime/proc.go:204
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1374
failed to load certificate
k8s.io/kubernetes/cmd/kubeadm/app/util/pkiutil.TryLoadCertAndKeyFromDisk
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/util/pkiutil/pki_helpers.go:232
k8s.io/kubernetes/cmd/kubeadm/app/phases/certs.validateSignedCertWithCA
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/certs/certs.go:422
k8s.io/kubernetes/cmd/kubeadm/app/phases/certs.validateSignedCert
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/certs/certs.go:416
k8s.io/kubernetes/cmd/kubeadm/app/phases/certs.UsingExternalCA
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/certs/certs.go:342
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newInitData
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:373
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func3
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:195
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).InitData
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:183
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:203
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).BindToCommand.func1.1
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:347
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:850
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:958
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:895
k8s.io/kubernetes/cmd/kubeadm/app.Run
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
    _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
    /usr/local/go/src/runtime/proc.go:204
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1374
failure loading certificate for API server
k8s.io/kubernetes/cmd/kubeadm/app/phases/certs.validateSignedCertWithCA
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/certs/certs.go:424
k8s.io/kubernetes/cmd/kubeadm/app/phases/certs.validateSignedCert
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/certs/certs.go:416
k8s.io/kubernetes/cmd/kubeadm/app/phases/certs.UsingExternalCA
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/certs/certs.go:342
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newInitData
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:373
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func3
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:195
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).InitData
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:183
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:203
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).BindToCommand.func1.1
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:347
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:850
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:958
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:895
k8s.io/kubernetes/cmd/kubeadm/app.Run
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
    _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
    /usr/local/go/src/runtime/proc.go:204
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1374
invalid or incomplete external CA
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newInitData
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:378
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func3
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:195
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).InitData
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:183
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:203
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).BindToCommand.func1.1
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:347
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:850
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:958
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:895
k8s.io/kubernetes/cmd/kubeadm/app.Run
    /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
    _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
    /usr/local/go/src/runtime/proc.go:204
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1374

Quick links to what I think are the relevant lines from the stack trace:

https://github.com/kubernetes/kubernetes/blob/6b1d87acf3c8253c123756b9e61dac642678305f/cmd/kubeadm/app/phases/certs/certs.go#L342

https://github.com/kubernetes/kubernetes/blob/6b1d87acf3c8253c123756b9e61dac642678305f/cmd/kubeadm/app/cmd/init.go#L373

It looks like this command currently also only works on a control plane node. I'll have a go today at building a custom version with some of these checks removed to see if I can find any issues running this on a worker node.

neolit123 commented 3 years ago

it shouldn't happen if you have both the ca.key and ca.crt on the worker. then again... having the ca.key on worker nodes even if temporarily may be a bad idea depending on how they are provisioned.

you could instead create the kubelet.conf for workers on a trusted node (e.g. the one that does "init") and not move the ca.key to workers.

robbiemcmichael commented 3 years ago

We're using external CA mode so ca.key isn't present on the control plane nodes either.

Are you open to changes that would allow kubeadm init phase kubeconfig kubelet --control-plane-endpoint=%s to be executed on a worker node as long as it doesn't remove any of the validations for control plane nodes?

neolit123 commented 3 years ago

We're using external CA mode so ca.key isn't present on the control plane nodes either.

the kubeadm binary can be executed on out-of-cluster machines where the CA key is present and the kubelet.conf files for different nodes to join can be prepared there.

Are you open to changes that would allow kubeadm init phase kubeconfig kubelet --control-plane-endpoint=%s to be executed on a worker node as long as it doesn't remove any of the validations for control plane nodes?

signing a client certificate (e.g. like the one embedded in kubelet.conf) requires the CA key, so even if we remove the validation when this command is run on worker nodes, there is no private key to sign the client pair.


something i forgot to mention is that once a kubelet bootstraps to the cluster the kubelet.conf must be updated to point to the rotatable cert / key:

client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
client-key: /var/lib/kubelet/pki/kubelet-client-current.pem

kubeadm init phase kubelet-finalize can be used for that and then the kubelet must be restarted again.
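for v1.20 that would be something like this (the sub-phase name is from the current release and may change in the future):

```shell
# re-point kubelet.conf at the rotatable cert/key written by the kubelet
kubeadm init phase kubelet-finalize experimental-cert-rotation
systemctl restart kubelet
```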

robbiemcmichael commented 3 years ago

We're using external CA mode so ca.key isn't present on the control plane nodes either.

the kubeadm binary can be executed on out-of-cluster machines where the CA key is present and the kubelet.conf files for different nodes to join can be prepared there.

We're using the PKI engine in Vault which doesn't allow the private key to be exported (except at generation, but we intentionally don't export it at that point for security reasons) so unfortunately we can't do this either.

Are you open to changes that would allow kubeadm init phase kubeconfig kubelet --control-plane-endpoint=%s to be executed on a worker node as long as it doesn't remove any of the validations for control plane nodes?

signing a client certificate and key (e.g. like those present in the kubelet.conf) requires a CA key, so even if we remove the validation when this command is run on worker nodes, there is no private key to sign the client pair.

Our nodes have the ability to issue client certificates for their own identity, so we can get a client certificate even though the nodes don't have the CA key to be able to sign the certificates themselves.

The process would go like this. All steps are executed on the node and work without an external orchestrator; the only thing needed is a secret which allows the node to authenticate with Vault:

  1. Use kubeadm to generate kubelet.conf without embedding the client certificate as it doesn't exist yet (ideally kubeadm could generate the CSR here as it does with kubeadm certs generate-csr)
  2. Use the Vault API to issue a client certificate for the node's own identity and then embed this certificate in kubelet.conf
  3. Run kubeadm join to join the cluster
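For step 2, a rough sketch with the Vault CLI (the mount path and role name are specific to our setup, and the role is configured to set the organization to system:nodes):

```shell
# Issue a client certificate for this node's own identity
vault write -format=json pki/issue/kubelet-client \
  "common_name=system:node:$(hostname)" > issued.json
# .data.certificate and .data.private_key then get embedded in kubelet.conf
```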

I'm gathering from your latest comment that kubeadm init phase kubeconfig kubelet shouldn't be used unless the CA private key is present in the execution environment.

I think the cleanest solution might be going back to my original idea of extending kubeadm certs generate-csr to be able to generate just a kubelet.conf file and the associated CSR so that it can be used for worker nodes. This command for generating the kubelet.conf file should succeed without a kubeadm config file as long as the --control-plane-endpoint flag is specified.

One problem is that I'm not sure how this could be supported in a backwards compatible way since changing the command to generate-csr all to make room for generate-csr kubelet is a breaking change.

neolit123 commented 3 years ago

We're using the PKI engine in Vault which doesn't allow the private key to be exported (except at generation, but we intentionally don't export it at that point for security reasons) so unfortunately we can't do this either.

understood, there are a number of ways to keep the CA not present on the nodes.

I'm gathering from your latest comment that kubeadm init phase kubeconfig kubelet shouldn't be used unless the CA private key is present in the execution environment.

that is true. it is an internal phase used for setting up the primary node's kubelet only if the CA cert/key are present on the node. if the CA is not present there, the user must sign the client cert/key for kubelet.conf using an external process.

I think the cleanest solution might be going back to my original idea of extending kubeadm certs generate-csr to be able to generate just a kubelet.conf file and the associated CSR so that it can be used for worker nodes.

going back to the CSR idea seems better. generating just the CSR for the kubelet.conf is tricky as you mentioned since we need to trick Cobra (our CLI library) into allowing:

kubeadm certs generate-csr # generate all CSRs
kubeadm certs generate-csr kubelet.conf # generate just the kubelet.conf

IIRC, this was not possible and we had to add kubeadm certs generate-csr all, which is a breaking change as it now requires all.

This command for generating the kubelet.conf file should succeed without a kubeadm config file as long as the --control-plane-endpoint flag is specified.

as i've mentioned, the CSR command is designed for CP nodes and therefore it will continue requiring a ClusterConfiguration under the hood. we would like to avoid adding flags for options that are already present in the config, such as control-plane-endpoint. this applies to all kubeadm sub-commands.

the minimal config you need to pass to generate-csr on workers is:

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
controlPlaneEndpoint: some-dns-name-or-ip
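then on the worker (the file name here is just an example):

```shell
kubeadm certs generate-csr --config cluster-config.yaml \
  --cert-dir /root/pki --kubeconfig-dir /root
```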
robbiemcmichael commented 3 years ago

Thanks for all the advice @neolit123. Since this feature request would require a breaking change to the generate-csr command, I think it might be best to park this until there are enough people asking for the feature to justify the breaking change. I'll continue with my workaround for the time being.

neolit123 commented 3 years ago

let's close this and create a new issue with exact details if users request this breaking change in the future:

kubeadm certs generate-csr # generate all CSRs
kubeadm certs generate-csr kubelet.conf # generate just the kubelet.conf

IIRC, this was not possible and we had to add kubeadm certs generate-csr all, which is a breaking change as it now requires all.

gopaltirupur commented 1 year ago

Is it possible for somebody to link the new ticket created to track this? Thanks