kubernetes-retired / cluster-api-bootstrap-provider-kubeadm

LEGACY REPO. NEW CODE IS https://github.com/kubernetes-sigs/cluster-api/tree/master/bootstrap/kubeadm
Apache License 2.0

Should not error if worker node does not have JoinConfiguration #213

Closed chuckha closed 4 years ago

chuckha commented 4 years ago

/kind bug

https://github.com/kubernetes-sigs/cluster-api-bootstrap-provider-kubeadm/blob/cc2b1e48885f2f7bda4b8a3048b1c33367322a0f/controllers/kubeadmconfig_controller.go#L258-L260

What steps did you take and what happened: After the control plane is initialized and a worker gets created but that worker does not have a JoinConfiguration the reconciler returns an error.

What did you expect to happen: I expected a nil join configuration to be perfectly fine in this case. The reconciler should simply set it to an initialized JoinConfiguration. If we detect that a Cluster or Init configuration is set, we should log a line saying "the worker will ignore a set Cluster/Init configuration" and proceed with the join. Right now we are ignoring Cluster/Init without notifying the user, which may lead to confusion.

detiber commented 4 years ago

If we detect that a Cluster or Init configuration is set then we should log a line saying "the worker will ignore a set Cluster/Init configuration" and proceed with the join. Right now we are ignoring Cluster/Init without notifying the user which may lead to confusion.

Should we do this though? Maybe we should error if a Cluster or Init config is provided rather than ignoring?

detiber commented 4 years ago

I expected a nil join configuration to be perfectly fine in this case. The reconciler should simply set it to an initialized JoinConfiguration.

Yes, please

detiber commented 4 years ago

I'm also wondering if we should default Cluster/Init configuration if one is not provided for the initial control plane instance?

chuckha commented 4 years ago

If we detect that a Cluster or Init configuration is set then we should log a line saying "the worker will ignore a set Cluster/Init configuration" and proceed with the join. Right now we are ignoring Cluster/Init without notifying the user which may lead to confusion.

Should we do this though? Maybe we should error if a Cluster or Init config is provided rather than ignoring?

Yeah, I think this was designed to allow the user to be quite lazy with their configuration. I don't have strong feelings between emitting a log and erroring outright.

@fabriziopandini any thoughts here?

fabriziopandini commented 4 years ago

@chuckha @detiber

I expected a nil join configuration to be perfectly fine in this case. The reconciler should simply set it to an initialized JoinConfiguration.

I'm ok with this (see also this issue https://github.com/kubernetes-sigs/cluster-api-bootstrap-provider-kubeadm/issues/112)

I'm also wondering if we should default Cluster/Init configuration if one is not provided for the initial control plane instance?

If we are going to generate a default JoinConfiguration, I don't see strong reasons for generating default Cluster/Init configuration as well

If we detect that a Cluster or Init configuration is set then we should log a line saying "the worker will ignore a set Cluster/Init configuration" and proceed with the join. Right now we are ignoring Cluster/Init without notifying the user which may lead to confusion.

Should we do this though? Maybe we should error if a Cluster or Init config is provided rather than ignoring?

I'm personally in favor of strict validation, but lazy gives you the possibility to use the same configuration for all your control plane nodes vs having one configuration for the first control plane and another configuration for the additional control plane nodes. As a side note, also in kubeadm we reverted to lazy because people were using a single file with all the objects for all the nodes.

OK with adding a warning when we detect lazy behavior.

PS. if we get to an agreement, I would like to work on this 😉

chuckha commented 4 years ago

That's really good to know.

I'm ok with allowing all control planes to be defined with Init and Cluster configurations, mostly because that's what Kubeadm does.

However, if they are not the initial control plane then we should generate the JoinConfiguration for them.

Sorry @fabriziopandini I'm a little confused about this line:

If we are going to generate a default JoinConfiguration, I don't see strong reasons for generating default Cluster/Init configuration as well

Are you in favor of generating default Cluster/Init configuration if it does not exist for the first found control plane node?

detiber commented 4 years ago

I'm personally in favor of strict validation, but lazy gives you the possibility to use the same configuration for all your control plane nodes vs having one configuration for the first control plane and another configuration for the additional control plane nodes. As a side note, also in kubeadm we reverted to lazy because people were using a single file with all the objects for all the nodes.

My major worry here is not that someone provides the same configuration to all Machines, but rather that they provide a different configuration to an additional Machine and expect that it would mutate the config of the existing Cluster.

chuckha commented 4 years ago

@detiber that is a good point. Different configurations would not be good.

Without adding aggregation/ensuring that the configs are the same, the only thing we can really do here is fail on secondary control planes with Init/Config. I'm ok with this approach.

fabriziopandini commented 4 years ago

So, if I got this right,

detiber commented 4 years ago

@fabriziopandini that and:

That's my understanding, at least.

fabriziopandini commented 4 years ago

/lifecycle active

chuckha commented 4 years ago

/assign @fabriziopandini

accepting commented 4 years ago

I have a suggestion here.

What do you think of this suggestion, @fabriziopandini @chuckha @detiber?

fabriziopandini commented 4 years ago

@accepting please correct me if I'm wrong, but I'm not sure the approach you are proposing works in use cases different from yours, where all the KubeadmConfig objects have Init/Cluster/Join configuration.

For instance, if the user provides:

a) a KubeadmConfig object specifically configured for init
b) a KubeadmConfig object specifically configured for joining a second control plane

If b is processed first:

then, when a is processed:

This is not correct, because CABPK has to respect the user-provided Init/Cluster configuration in a and the JoinConfiguration in b.

chuckha commented 4 years ago

I think this comes down to how control planes are configured.

If we give the same InitConfiguration, ClusterConfiguration, and JoinConfiguration to every machine and mark some as control planes, then the configmap lock prevents multiple kubeadm init calls. The first machine to acquire the lock runs kubeadm init; in that case, once we have the lock, we set the JoinConfiguration to nil and move on. On the next reconcile loop, the next control plane machine that succeeds performs a control plane join; in that case we set the Init and Cluster configuration to nil and move on.

However, as @fabriziopandini points out, if we give all configurations to the control plane and specifically configure one config as a join but leave the Init and Cluster configuration in place, there will be a problem.

I wonder what makes sense here? I hadn't accounted for the first use case, but that is a very SIG-cluster-lifecycle way to use Cluster API: Machines are treated as cattle. I'd almost say that if a user wants to specifically create a machine that is the bootstrap control plane, they should mark it as such by setting the JoinConfiguration to nil.

@fabriziopandini do you think it's acceptable that if a user wants a specific machine to be a primary control plane they must configure it as such by setting JoinConfiguration to nil, and if they want a specific instance to be a secondary control plane they must configure it as such by setting the Init and Cluster configurations to nil?

ncdc commented 4 years ago

Are there use cases where it makes sense to distinguish between control plane node 1 vs all other control plane nodes?

vincepri commented 4 years ago

Just chiming in a little here. I'd prefer to keep things explicit and avoid making users have to understand the way controllers work internally. A possible solution could be:

chuckha commented 4 years ago

@ncdc I haven't heard of any but I also don't have many points of data.

I'm in favor of being lenient and allowing the "copy configs across every machine" pattern, since that is actually easier to implement and would help @accepting out, but I want to question users a bit more to see how people are actually using this.

chuckha commented 4 years ago

Let's not change behavior right now; instead, let's discuss the behavior going forward so we can come to an agreement before implementing this change.