kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

Kops keeps removing SecurityGroupRules from explicitly self-managed security groups #6333

Closed. Overbryd closed this issue 4 years ago.

Overbryd commented 5 years ago

1. What kops version are you running? The command kops version will display this information.

$ kops version
Version 1.11.0

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.7", GitCommit:"0c38c362511b20a098d7cd855f1314dad92c2780", GitTreeState:"clean", BuildDate:"2018-08-20T10:09:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.6", GitCommit:"b1d75deca493a24a2f87eb1efde1a569e52fc8d9", GitTreeState:"clean", BuildDate:"2018-12-16T04:30:10Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

kops update cluster --name=<redacted> --state=s3://<redacted> --lifecycle-overrides SecurityGroup=ExistsAndWarnIfChanges,SecurityGroupRule=ExistsAndWarnIfChanges

5. What happened after the commands executed?

Kops is removing existing SecurityGroupRules from the security group of the ELB for the Kubernetes API, although those security groups are explicitly managed by us (in a separate Terraform state).

6. What did you expect to happen?

Kops should leave the SecurityGroupRules as they are; we manage them. Every update to the cluster tampers with the security groups, making the ELB for the Kubernetes API unreachable until we terraform plan & apply our rules again.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2018-04-19T12:09:20Z
  name: <redacted>
spec:
  additionalPolicies:
    node: |
      [
        {"Effect":"Allow","Action":["autoscaling:DescribeAutoScalingGroups","autoscaling:DescribeAutoScalingInstances","autoscaling:DescribeLaunchConfigurations","autoscaling:DescribeTags","autoscaling:SetDesiredCapacity","autoscaling:TerminateInstanceInAutoScalingGroup"],"Resource":"*"},
        {"Effect":"Allow","Action":"s3:*","Resource":"*"}
      ]
  api:
    loadBalancer:
      securityGroupOverride: sg-<redacted>
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://<redacted>/<redacted>
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-<redacted>-1a
      name: a
    - instanceGroup: master-<redacted>-1b
      name: b
    - instanceGroup: master-<redacted>-1c
      name: c
    name: main
  - etcdMembers:
    - instanceGroup: master-<redacted>-1a
      name: a
    - instanceGroup: master-<redacted>-1b
      name: b
    - instanceGroup: master-<redacted>-1c
      name: c
    name: events
  hooks:
  - before:
    - kubelet.service
    manifest: |
      [Unit]
      Description=Download AWS Authenticator configs from S3
      [Service]
      Type=oneshot
      ExecStart=/bin/mkdir -p /srv/kubernetes/aws-iam-authenticator
      ExecStart=/usr/local/bin/aws s3 cp --recursive s3://<redacted>/<redacted>/addons/authenticator /srv/kubernetes/aws-iam-authenticator/
    name: kops-hook-authenticator-config.service
    roles:
    - Master
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    authenticationTokenWebhookConfigFile: /srv/kubernetes/aws-iam-authenticator/kubeconfig.yaml
    runtimeConfig:
      autoscaling/v2beta1: "true"
  kubeControllerManager:
    horizontalPodAutoscalerUseRestClients: true
  kubernetesVersion: 1.11.6
  masterInternalName: api.internal.<redacted>
  masterPublicName: api.<redacted>
  networkCIDR: 172.20.0.0/16
  networking:
    weave:
      mtu: 8912
  nonMasqueradeCIDR: 100.64.0.0/10
  subnets:
  - cidr: 172.20.32.0/19
    name: <redacted>-1a
    type: Private
    zone: <redacted>-1a
  - cidr: 172.20.64.0/19
    name: <redacted>-1b
    type: Private
    zone: <redacted>-1b
  - cidr: 172.20.96.0/19
    name: <redacted>-1c
    type: Private
    zone: <redacted>-1c
  - cidr: 172.20.0.0/22
    name: utility-<redacted>-1a
    type: Utility
    zone: <redacted>-1a
  - cidr: 172.20.4.0/22
    name: utility-<redacted>-1b
    type: Utility
    zone: <redacted>-1b
  - cidr: 172.20.8.0/22
    name: utility-<redacted>-1c
    type: Utility
    zone: <redacted>-1c
  topology:
    dns:
      type: Public
    masters: private
    nodes: private

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-04-19T12:09:20Z
  labels:
    kops.k8s.io/cluster: <redacted>
  name: master-<redacted>-1a
spec:
  image: kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-08-17
  machineType: c5.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-<redacted>-1a
  role: Master
  securityGroupOverride: sg-<redacted>
  subnets:
  - <redacted>-1a

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-04-19T12:09:20Z
  labels:
    kops.k8s.io/cluster: <redacted>
  name: master-<redacted>-1b
spec:
  image: kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-08-17
  machineType: c5.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-<redacted>-1b
  role: Master
  securityGroupOverride: sg-<redacted>
  subnets:
  - <redacted>-1b

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-04-19T12:09:20Z
  labels:
    kops.k8s.io/cluster: <redacted>
  name: master-<redacted>-1c
spec:
  image: kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-08-17
  machineType: c5.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-<redacted>-1c
  role: Master
  securityGroupOverride: sg-<redacted>
  subnets:
  - <redacted>-1c

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-04-19T12:09:21Z
  labels:
    kops.k8s.io/cluster: <redacted>
  name: nodes
spec:
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: ""
    kubernetes.io/cluster/<redacted>: ""
  image: kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-08-17
  machineType: m5.xlarge
  maxSize: 12
  minSize: 4
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  securityGroupOverride: sg-<redacted>
  subnets:
  - <redacted>-1a
  - <redacted>-1b
  - <redacted>-1c

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

I created a redacted log of the above command. https://gist.github.com/Overbryd/a42c1c5995280930fb63477da81243f7

9. Anything else we need to know?

ZachEddy commented 5 years ago

I ran into this issue, too. It looks like #5744 introduced the securityGroupOverride feature, so looking through that pull request might be a good starting point for a bugfix.

bzuelke commented 5 years ago

I agree with this. The doc here https://github.com/kubernetes/kops/blob/master/docs/security_groups.md says to use --lifecycle-overrides, which should keep kops from touching the SGs or SG rules:

kops update cluster ${CLUSTER_NAME} --yes --lifecycle-overrides SecurityGroup=ExistsAndWarnIfChanges,SecurityGroupRule=ExistsAndWarnIfChanges

I think that when the override is used, kops should account for it and leave the security groups alone altogether, without any extra syntax needed on every update.

tomekit commented 5 years ago

Facing this problem as well. I don't want kops to leave the default API and SSH access open to everyone (0.0.0.0/0). When I modify the security groups to my own values after cluster creation, everything is fine until the next kops update is executed, which overrides my values with the default 0.0.0.0/0. However, when kops update is run with --lifecycle-overrides SecurityGroup=ExistsAndWarnIfChanges,SecurityGroupRule=ExistsAndWarnIfChanges, those rules actually get removed. I would expect kops to simply ignore security group updates, since the groups are now managed by me.

Lampino commented 5 years ago

@tomekit Exactly the same problem on my side. Do you have any idea for a workaround regarding this issue?

mgarren commented 5 years ago

Ok, I wanted to bump this because it's also blowing away the ingress and egress rules on a security group when I specify the override. I am definitely no Go expert, but my suspicion is that the issue is somewhere around here: https://github.com/kubernetes/kops/blob/a8b0e1b2745b431edae3ba3c105ae221a3b373da/upup/pkg/fi/cloudup/awstasks/securitygroup.go#L85 . It feels like, if that lifecycle-overrides flag is set for security groups, that should be set to an empty list?

I think I can explain my 443 rules getting removed by a combination of the above and: https://github.com/kubernetes/kops/blob/b2d90fd2c0e466a48ff5ebbfca472564f21b21a7/pkg/model/awsmodel/api_loadbalancer.go#L175

fejta-bot commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

bogdanaioanei commented 5 years ago

/remove-lifecycle stale

dobesv commented 4 years ago

By the way, if all you want to do is change which IP addresses have SSH & API access, you can set that using kops edit cluster under kubernetesApiAccess and sshAccess. Set them to some more specific IP address(es).
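
For example, a minimal sketch of the relevant part of the cluster spec (the CIDRs below are placeholders; substitute your own ranges):

spec:
  kubernetesApiAccess:
  - 203.0.113.0/24
  sshAccess:
  - 203.0.113.0/24

Both fields take a list of CIDRs, so the next kops update should apply those instead of the default 0.0.0.0/0.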

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

shreben commented 4 years ago

/remove-lifecycle stale

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 4 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 4 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes/kops/issues/6333#issuecomment-695779252):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

bzuelke commented 4 years ago

/reopen

k8s-ci-robot commented 4 years ago

@bzuelke: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/kubernetes/kops/issues/6333#issuecomment-696142442):

> /reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

bzuelke commented 4 years ago

No, this is something that should be looked at.

rifelpet commented 4 years ago

/reopen

k8s-ci-robot commented 4 years ago

@rifelpet: Reopened this issue.

In response to [this](https://github.com/kubernetes/kops/issues/6333#issuecomment-696209585):

> /reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

fejta-bot commented 4 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 4 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes/kops/issues/6333#issuecomment-713689003):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

tomekit commented 4 years ago

@Lampino I noticed your question too late. I didn't solve the problem of my SG rules being overridden, however I did find the configuration options for both SSH and API access, so it's now actually kops managing these rules:

sshAccess:
  - <ip.ip.ip.ip>/32

kubernetesApiAccess:
  - <ip.ip.ip.ip>/32