kubernetes / kubeadm

Aggregator for issues filed against kubeadm
Apache License 2.0

Invalid etcd config generated when using DNS discovery approach #1598

Closed LucaLanziani closed 5 years ago

LucaLanziani commented 5 years ago

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version:

&version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:41:54Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

What happened?

When using kubeadm init phase etcd local --config=/root/kubeadmcfg.yaml with the config file containing the following extra argument:

    extraArgs:
      discovery-srv: "example.com"

an invalid configuration is generated and etcd fails to start with the following message:

multiple discovery or bootstrap flags are set. Choose one of "initial-cluster", "discovery" or "discovery-srv"

etcd code that tests for this scenario

This is due to the fact that kubeadm unconditionally adds the initial-cluster argument when generating the config.

What you expected to happen?

kubeadm might check whether discovery-srv is present in the extraArgs and, if so, skip adding the initial-cluster argument.

How to reproduce it (as minimally and precisely as possible)?

cat <<EOF >"etcdconfig.yaml"
apiVersion: "kubeadm.k8s.io/v1beta1"
kind: ClusterConfiguration
etcd:
  local:
    extraArgs:
      discovery-srv: "example.com"
EOF

kubeadm init phase etcd local --config=./etcdconfig.yaml

Check etcd logs.

Anything else we need to know?

I would be more than happy to contribute with a PR if you think the "expected behavior" is correct.

SataQiu commented 5 years ago

Thanks for reporting this @LucaLanziani :+1: kubeadm does not currently support discovery-srv or discovery mode. This is likely because of https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/phases/etcd/local.go#L212: the code does not handle potentially conflicting etcd parameters, and local etcd completes cluster discovery based on initial-cluster. What are your usage scenarios for discovery-srv mode? That said, in my view, if you can add support for discovery-srv, you are welcome to submit a PR! /cc @neolit123 @rosti

LucaLanziani commented 5 years ago

Hello @saad-ali :wave: Right now I'm solving this with a hacky workaround: setting initial-cluster to null in the extraArgs.
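For reference, applying that workaround to the repro config from above would look roughly like this (a sketch only; the flag's full spelling is initial-cluster, and example.com is the placeholder domain from the original report):

```yaml
apiVersion: "kubeadm.k8s.io/v1beta1"
kind: ClusterConfiguration
etcd:
  local:
    extraArgs:
      discovery-srv: "example.com"
      # Overriding the kubeadm-generated default with an empty value
      # keeps etcd from seeing two competing bootstrap flags.
      initial-cluster: ""
```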

What I'm trying to do is migrate the Monzo implementation of etcd to kubeadm. The etcd nodes are in multiple autoscaling groups, and for discovery they use an SRV DNS record that lists all the etcd peers; the process is described in the etcd documentation here: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/clustering.md#dns-discovery.
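The DNS discovery flow relies on SRV records under the cluster domain. As a hypothetical illustration, for discovery-srv pointing at example.com with TLS-enabled peers, the zone would contain entries along these lines (host names, TTLs, and ports are made up for the example; the record names follow the etcd clustering docs):

```
_etcd-server-ssl._tcp.example.com. 300 IN SRV 0 0 2380 etcd-0.example.com.
_etcd-server-ssl._tcp.example.com. 300 IN SRV 0 0 2380 etcd-1.example.com.
_etcd-server-ssl._tcp.example.com. 300 IN SRV 0 0 2380 etcd-2.example.com.
```

Each etcd node queries these records at startup to find its peers, which is why no static peer list is needed in the config.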

I'll give the PR a try and we might discuss implications on the PR itself if ok with everyone.

neolit123 commented 5 years ago

@LucaLanziani thanks for the issue report. for local etcd, kubeadm tries to provide a common use case that works for most users. arguably discovery-srv enables a use case that is less common.

Right now I'm solving this with a hacky workaround, setting initial-cluster to null in the extraArgs.

this actually seems like a pretty viable workaround for now.

in terms of etcd config UX improvements in kubeadm, it's important to note that we plan config changes at the beginning of the k8s release cycle. so it's best to not send PRs for this yet.

thanks /assign @fabriziopandini @rosti

fabriziopandini commented 5 years ago

@LucaLanziani Happy to hear of new kubeadm users from Italy! Before expressing an opinion, I would like to better understand your use case. If I got this right, you are using a Terraform recipe to create an etcd cluster on AWS, is that right? And if so, why are you using local etcd instead of external etcd when bootstrapping K8s with kubeadm?

LucaLanziani commented 5 years ago

@neolit123 thanks a lot for the feedback. I agree that starting an etcd cluster without a discovery mechanism seems to be the more common case.

My idea was to add the discovery-srv behaviour by wrapping these lines https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/phases/etcd/local.go#L198-L209 in something like:

_, discovery := extraArgs["discovery"]
_, discoverySrv := extraArgs["discovery-srv"]
if !(discovery || discoverySrv) {
    // existing code that sets initial-cluster et al.
}

and this would not impact the current behaviour in any way, imho.

Regarding:

Right now I'm solving this with a hacky workaround, setting initial-cluster to null in the extraArgs.

It's definitely working, but it generates the following in the manifest:

  - command:
    - etcd
    - --initial-cluster=

That is rather confusing for anyone reading the manifest.

More than happy to wait, after all I have it working now.

LucaLanziani commented 5 years ago

Hello @fabriziopandini . I'm starting now on k8s in general and I'm quite a noob, I'll do my best not to make Italy look bad here ;).

Ok, as I pre-announced above, I'm still a noob, so let me try to see if I got your question right.

you are using an terraform recipe to create an etcd cluster on aws

That's right.

The idea is to use an external etcd. What I'm doing now is spinning up 3 AWS instances and starting just etcd there using:

kubeadm init phase etcd local --config=./etcd_local_config.yaml

The next step is to start 3 new AWS instances behind a load balancer to run the rest of the master node components, and in that case I'll set up the etcd part as external with:

kubeadm init --config=/etcd_external_endpoints.yaml

I hope this isn't too confusing.

rosti commented 5 years ago

@LucaLanziani if I understand correctly, you want to have your control plane and etcd cluster on different machines. kubeadm is designed for Kubernetes cluster deployment, and etcd is not its specialty. It can deploy "local" or "stacked" etcd clusters in an opinionated way (designed to have an instance on each control plane node, no discovery, ...). I would recommend using other tools to set up an external etcd cluster and then coming back to kubeadm for the Kubernetes cluster itself. One such tool that I am aware of, and which has a similar UX to kubeadm, is etcdadm.

However, if you continue using kubeadm's phases for etcd cluster creation, you should keep the following things in mind:

Many things are hard wired, so you should probably keep as close to our opinionated design as possible.

I don't think that you need the discovery srv mode, as kubeadm announces the new etcd nodes to the existing cluster.

etcd is deployed as a static pod and needs kubelet to be running.

fabriziopandini commented 5 years ago

@LucaLanziani I think it will help to divide your problem into two parts: 1) create an external etcd cluster, 2) use kubeadm to create a K8s cluster that uses the external etcd cluster.

For 1) you can use terraform/Monzo or whatever solution you prefer. If you want to use kubeadm also for this task, you can follow instructions @ https://kubernetes.io/docs/setup/independent/setup-ha-etcd-with-kubeadm/ , but please be aware of all the considerations explained by @rosti

For 2) you have to provide kubeadm the exact list of etcd endpoints using the external configuration object; you can find an example @ https://kubernetes.io/docs/setup/independent/high-availability/#set-up-the-first-control-plane-node

However, if you are starting with K8s, you can also consider letting kubeadm manage etcd for you, using the so-called stacked etcd mode. More info is available here https://kubernetes.io/docs/setup/independent/ha-topology/

LucaLanziani commented 5 years ago

Hi @rosti you are correct, I have a set of instances running etcd and another set running:

kube-scheduler
kube-apiserver
kube-controller-manager

I understand what you are saying regarding kubeadm, and I also understand the reasons behind the decision to keep it simple and opinionated.

It's also true that it can load config from an external file using --config. This leaves room for more complex configurations without changing a single line of kubeadm and, imho, removes the opinionated factor from the equation; that's the only reason I opened the issue.

As for your suggestion:

Thanks, I'll evaluate alternative tools for sure, even if currently everything is working fine.

Many things are hard wired, so you should probably keep as close to our opinionated design as possible.

That was exactly my intention, and I've changed nothing except the discovery part.

I don't think that you need the discovery srv mode, as kubeadm announces the new etcd nodes to the existing cluster.

This is true when you are adding nodes, but being able to start 3 nodes at the same time without having to hardcode the list of nodes looks pretty nice to me.

etcd is deployed as a static pod and needs kubelet to be running.

:+1:


Just to clarify my position and not send the wrong message:

LucaLanziani commented 5 years ago

Thanks a lot @fabriziopandini for the detailed explanation.

I'll re-read those pages for sure and try to get the best out of them.

  1. create an external etcd cluster

Yes, I have etcd up and running at the moment, including the discovery part. I'll play with it for a little while and then decide whether a static configuration could be a better option.

  2. use kubeadm to create a K8s cluster that uses the external etcd cluster

Yes, I'm setting up the master nodes at the moment, and in this case I've already generated the external config part with:

etcd:
  external:
    endpoints:
      - https://peer-0..:2379
      - https://peer-1...:2379
      - https://peer-2...:2379

The stacked etcd mode was the first thing my team and I considered, but we then decided on the external setup for multiple reasons ;).

neolit123 commented 5 years ago

discovery-srv is not supported for stacked etcd, so we are closing this. please reopen if the issue is still viable.

dawidmalina commented 4 years ago

Currently we are also trying to validate how we can migrate our custom scripts bootstrapping Kubernetes + etcd to a pure kubeadm approach. We've ended up with the same issue as @LucaLanziani, as we are using discovery-srv for etcd discovery. We are using 3 autoscaling groups (one per master node, each in a single AZ), so we don't need any static configuration and nodes can be recreated anytime (the etcd nodes are together with the control plane); the etcd volumes are, of course, external.

I can't believe that this suggestion was declined, especially since the proposed solution was really neutral and didn't force any opinionated setup. Even more, if the etcd configuration allows us to provide extraArgs, it should support discovery-srv, as this is a valid etcd argument.

@neolit123 can you please consider this once again and maybe allow us to use a more advanced approach, while still keeping the simple common one as the default?

Btw, etcdadm also does not support discovery-srv :( but even if it did, I would still prefer to keep a single tool for all control plane components.

neolit123 commented 4 years ago

@dawidmalina

there is already a workaround for that as noted above:

Right now I'm solving this with a hacky workaround, setting initial-cluster to null in the extraArgs.

did you try that?

dawidmalina commented 4 years ago

Hmm, I didn't. Looks like I missed that one. I will check, thx 👍

dawidmalina commented 4 years ago

@neolit123 Works like a charm :) Thank you