kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

Kops YAML validation with STATE_STORE #2919

Open chrislovecnm opened 7 years ago

chrislovecnm commented 7 years ago

A fun edge case that I ran into on master.

  1. create a cluster YAML with configBase set, e.g. s3://mystate/mycluster.com

Example:

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2017-04-29T01:57:10Z
  name: us-east-1.aws.k8spro.com
spec:
  api:
    dns: {}
  authorization:
    alwaysAllow: {}
  channel: alpha
  cloudProvider: aws
  configBase: s3://aws.k8spro.com/us-east-1.aws.k8spro.com
  2. set your KOPS_STATE_STORE to s3://aws.k8spro.com/test/us-east-1.aws.k8spro.com
  3. run kops create -f mycluster.yaml, create the public SSH key, and update the cluster
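
Roughly, these steps translate into the commands below (the ssh-key and update invocations are illustrative guesses at step 3, not copied from the report):

export KOPS_STATE_STORE=s3://aws.k8spro.com/test/us-east-1.aws.k8spro.com
kops create -f mycluster.yaml

# step 3 continued: register the admin SSH key and bring the cluster up
# (illustrative invocations; adjust the key path if yours differs)
kops create secret --name us-east-1.aws.k8spro.com sshpublickey admin -i ~/.ssh/id_rsa.pub
kops update cluster us-east-1.aws.k8spro.com --yes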

You will get the cluster spec in s3://aws.k8spro.com/us-east-1.aws.k8spro.com/test/us-east-1.aws.k8spro.com and everything else in s3://aws.k8spro.com/us-east-1.aws.k8spro.com/us-east-1.aws.k8spro.com, when everything should live under the location that configBase defines. In other words, KOPS_STATE_STORE overrides part of the cluster create, but kops never checks that it agrees with what is defined in the YAML.

Questions

  1. So what should be the functionality?
  2. Which is the source of truth? KOPS_STATE_STORE or configBase?

Actions

We need to fail on the create when the KOPS_STATE_STORE differs from the configBase.
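
Until kops enforces that itself, a pre-flight guard along these lines could catch the mismatch before the create (a rough sketch only; it assumes the manifest is mycluster.yaml and that configBase appears on a single line):

#!/bin/sh
# Hypothetical pre-flight guard, not a kops feature: refuse to run the create
# when the manifest's configBase does not sit under KOPS_STATE_STORE.
config_base=$(awk '$1 == "configBase:" {print $2}' mycluster.yaml)

case "$config_base" in
  "$KOPS_STATE_STORE"|"$KOPS_STATE_STORE"/*)
    kops create -f mycluster.yaml
    ;;
  *)
    echo "refusing to create: configBase ($config_base) is outside KOPS_STATE_STORE ($KOPS_STATE_STORE)" >&2
    exit 1
    ;;
esac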

fejta-bot commented 6 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta. /lifecycle stale

fejta-bot commented 6 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten /remove-lifecycle stale

fejta-bot commented 6 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

chrislovecnm commented 6 years ago

/lifecycle frozen /remove-lifecycle frozen

chrislovecnm commented 6 years ago

/remove-lifecycle rotten

chrislovecnm commented 6 years ago

/lifecycle frozen

dzoeteman commented 5 years ago

I've run into this too. I would like to pick this up, but it first needs to be clear what the correct course of action is in this case. I think your idea of simply failing if the two differ isn't a bad option, though seeing either KOPS_STATE_STORE or configBase applied consistently would be best, I think.

In my mind, KOPS_STATE_STORE should only be used for things like get clusters and for setting the default configBase; once configBase is set, kops should stick to it (e.g. in create and update). Maybe a warning (not a failure) should still be logged in the output if they differ.

If instead it's chosen that KOPS_STATE_STORE should be leading, then I'd wonder about the usefulness of configBase.
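
To illustrate the split I have in mind (the commands are real kops commands; the behaviour in the comments is the proposal, not what kops does today):

# KOPS_STATE_STORE as the discovery/default location only
export KOPS_STATE_STORE=s3://aws.k8spro.com
kops get clusters                   # proposal: state store drives listing/discovery

# once a manifest sets spec.configBase, that value wins
kops create -f mycluster.yaml       # proposal: honour configBase from the manifest,
                                    # warn (rather than fail) if it differs from KOPS_STATE_STORE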

johngmyers commented 4 years ago

ConfigBase cannot determine where the cluster spec itself is stored, because the cluster spec is what contains ConfigBase. The documentation for ConfigBase states that it may differ from the location where the cluster spec itself is stored.

So I believe this issue is invalid.