kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0
15.99k stars · 4.65k forks

kops can't create cluster with etcd v2 #6858

Closed dee-kryvenko closed 5 years ago

dee-kryvenko commented 5 years ago

1. What kops version are you running? The command kops version, will display this information.

1.11.x

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

1.10.x

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

export KOPS_FEATURE_FLAGS=SpecOverrideFlag
kops create cluster bla bla --networking calico --override 'cluster.spec.etcdClusters[*].version=2.2.1' --override 'cluster.spec.etcdClusters[*].provider=legacy'

5. What happened after the commands executed?

It fails with

spec.networking.Calico.MajorVersion: Invalid value: "v3": Unable to use v3 when ETCD version for main cluster is 2.2.1

6. What did you expect to happen?

I expected that I could also pin calico version like that:

--override 'cluster.spec.networking.calico.majorVersion=v2'

But it fails with:

unhandled field: "cluster.spec.networking.calico.majorVersion=v2"

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

N/A

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

N/A

9. Anything else do we need to know?

This is a major blocker for idempotent kops use cases. Because etcd-manager is still in beta and there is no seamless way to upgrade existing clusters from etcd v2 to v3, users are stuck on kops/k8s 1.10, which lacks many important features such as new AWS instance generations, SpotInst support, pod priority classes, and much more.

dee-kryvenko commented 5 years ago

cc @tmjd

tmjd commented 5 years ago

Have you tried specifying an empty string for the majorVersion override? Looking at the code I think that should do the trick. If that does not, then I think you should be able to create the cluster configuration without specifying --networking calico and then go edit the configuration and change the networking section to have calico: {}. I know it is not pretty but hopefully that will work for you. If either of those works please consider submitting a PR to add a note to the docs with that info. Thanks.
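The second workaround suggested above might look like this after running `kops edit cluster` (a sketch of only the relevant portion of the cluster spec; surrounding fields omitted):

```yaml
# Hypothetical excerpt of the cluster manifest: an empty calico block
# enables Calico networking without pinning a major version.
spec:
  networking:
    calico: {}
```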

dee-kryvenko commented 5 years ago

Empty string does not work; already tried that too. The second option, I guess, would work... we're going to try that out. Any chance we can get it actually fixed? I really have no idea why v3 is hardcoded there for create cluster, but I assume something breaks if it isn't?

dee-kryvenko commented 5 years ago

I was referring to https://github.com/kubernetes/kops/blob/master/cmd/kops/create_cluster.go#L1004. Basically, if the string is empty, it always assumes v3 for create cluster. There's no way to set v2 in the current implementation. Is there a reason behind it?
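The defaulting behavior being described can be illustrated with a minimal Go sketch. This is an illustration of the reported behavior, not the actual kops source; the function name is hypothetical:

```go
package main

import "fmt"

// defaultCalicoMajorVersion is a hypothetical stand-in for the logic at
// cmd/kops/create_cluster.go#L1004 as described above: on the create-cluster
// path, an unset Calico major version falls through to a hardcoded "v3",
// so there is no way to end up on v2 at creation time.
func defaultCalicoMajorVersion(requested string) string {
	if requested == "" {
		return "v3" // hardcoded default; an empty override does not yield v2
	}
	return requested
}

func main() {
	fmt.Println(defaultCalicoMajorVersion(""))   // the reported behavior
	fmt.Println(defaultCalicoMajorVersion("v2")) // the override users cannot reach
}
```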

fejta-bot commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 5 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

tmjd commented 5 years ago

What is the problem with using a more recent version of etcd for the cluster? Does kops not support etcd v3 with k8s 1.10/1.11? I thought it is possible to use a newer version of etcd without using etcd-manager, but I could be wrong. You mention there is no idempotent way to create an etcd v2 cluster, but can you call kops create cluster .... multiple times on the same cluster? After the cluster is created, updates should work since the configuration is already set, and the empty calico configuration should work fine to remain on v2.

dee-kryvenko commented 5 years ago

There is no problem using etcd3; the problem is that existing etcd2 clusters can't upgrade from 2 -> 3. There was a separate ticket for this, and I believe last time I checked, etcd-manager was meant to be the solution for it.

Now, the problem with idempotency in kops is right there on the surface. Idempotency by definition means: for the same input, consistently and repeatably produce the same output, no matter how many times you run it. This is very important for IaC and its testing, and it is broken in kops. I would expect kops to produce the same k8s setup for the same input. However, as of now it will create new clusters using etcd3 but keep etcd2 on existing clusters, given the same cluster spec as input. This is a good example of broken idempotency.

Consider fully PullRequest-driven change models - people don't run kops from laptops, they have automation. The way I was working around this: I use kops with the anticipated cluster spec in my pipelines, making kops think it's a new cluster every time (by pointing it at a fake, empty kops state). After it produces the cluster config, I save it and apply it to the existing cluster, this time using the real state. Hence a way to explicitly set etcd v2 is required - kops will assume etcd v3 every time since it thinks it's a new cluster, and since the upgrade procedure is broken, it will destroy all the data in etcd.
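The pipeline workaround described above might be sketched roughly like this (the state store URLs and cluster name are placeholders, not real values):

```shell
# Hypothetical sketch of the fake-state / real-state workaround.

# 1. Point kops at an empty "fake" state store so it plans as a new cluster,
#    and capture the generated config without applying anything.
export KOPS_STATE_STORE=s3://fake-empty-kops-state
kops create cluster my.example.com --networking calico --dry-run -o yaml > cluster.yaml

# 2. Apply the generated spec to the existing cluster via the real state store.
export KOPS_STATE_STORE=s3://real-kops-state
kops replace -f cluster.yaml
kops update cluster my.example.com --yes
```

The failure mode described follows from step 1: on the fake-state pass, kops always plans a "new" cluster, so it always picks etcd v3, which the existing etcd v2 cluster cannot absorb.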

fejta-bot commented 5 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 5 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes/kops/issues/6858#issuecomment-538689056):

>Rotten issues close after 30d of inactivity.
>Reopen the issue with `/reopen`.
>Mark the issue as fresh with `/remove-lifecycle rotten`.
>
>Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
max-lobur commented 5 years ago

/reopen

k8s-ci-robot commented 5 years ago

@max-lobur: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/kubernetes/kops/issues/6858#issuecomment-550317671):

>/reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
max-lobur commented 5 years ago

Fell into this when upgrading k8s 1.11 to 1.12 with recent kops. Previously the calico version was unset, and it assumed v3 by default :( What a surprise! I use etcd3, but a calico upgrade is risky in production clusters, so I want to avoid it. I would rather migrate to a new cluster than try it on a live one. Please, please provide a way to lock calico to v2.