Closed: tavisma closed this issue 5 years ago
I made a much simpler cluster without all our custom bits and was able to reproduce this problem again:
aws s3api create-bucket --region "us-west-2" --create-bucket-configuration LocationConstraint="us-west-2" --bucket "<REDACTED>" --acl "private"
aws s3api put-bucket-versioning --region "us-west-2" --bucket "<REDACTED>" --versioning-configuration Status=Enabled
aws s3api put-bucket-encryption --region "us-west-2" --bucket "<REDACTED>" --server-side-encryption-configuration '{ "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": "<REDACTED>"}}]}'
export NAME=<REDACTED>
export KOPS_STATE_STORE=s3://<REDACTED>
kops create cluster --cloud=aws --cloud-labels='<REDACTED>' --channel=alpha --kubernetes-version=1.10.6 --node-count=1 --zones=${NODE_AZS} --dns-zone=<REDACTED> --node-size=m5.xlarge --master-size=m5.large --master-count=1 --networking=weave --topology=private --authorization=RBAC --associate-public-ip=false --admin-access=${BASTION_TRUSTED_IPS} --ssh-access=${INTERNAL_TRUSTED_IPS} --api-loadbalancer-type=internal --master-volume-size=128 --master-security-groups=${BASTION_SECURITY_GROUP} --node-volume-size=128 --node-security-groups=${BASTION_SECURITY_GROUP} --encrypt-etcd-storage --image=595879546273/CoreOS-stable-1800.4.0-hvm --vpc=$VPC_ID --name=${NAME} --network-cidr=${NETWORK_CIDR} --subnets=${PRIVATE_SUBNETS} --utility-subnets=${PUBLIC_SUBNETS} --dry-run -oyaml > cluster.yaml
kops create -f cluster.yaml
kops update cluster ${NAME} --target=terraform --out=. --yes
terraform apply
After all this, SSHing into a master and running systemctl status kops-configuration.service
shows that nodeup was unable to download cluster.spec from the S3 state bucket because the instance role has no access to the KMS key (manually granting access to the key allows everything to start up properly).
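For reference, a minimal sketch of the manual workaround, assuming the default role names kops creates (masters.<cluster-name> and nodes.<cluster-name>) and placeholder account/key IDs; substitute your own values:

# Grant the kops masters role use of the KMS key that encrypts the state bucket.
# <KMS_KEY_ID>, <ACCOUNT_ID> and <CLUSTER_NAME> are placeholders, not values from this issue.
aws kms create-grant \
  --region us-west-2 \
  --key-id "<KMS_KEY_ID>" \
  --grantee-principal "arn:aws:iam::<ACCOUNT_ID>:role/masters.<CLUSTER_NAME>" \
  --operations Decrypt DescribeKey Encrypt GenerateDataKey
# Repeat with role/nodes.<CLUSTER_NAME> so the nodes can read the state store as well.

This is only one way to grant access; editing the key policy or attaching an IAM policy to the roles (as in the comments below) works too.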
Hi, I had the same issue. I fixed it by modifying the masters policy.
{
  "Sid": "kopsK8sKMSEncrypted",
  "Effect": "Allow",
  "Action": [
    "kms:CreateGrant",
    "kms:Decrypt",
    "kms:DescribeKey",
    "kms:Encrypt",
    "kms:GenerateDataKey*",
    "kms:ReEncrypt*"
  ],
  "Resource": [
    "arn:aws:kms:eu-central-1:XXXXXX:key/f75fbbe1-YYY-YYYY-YYYY-ZZZZZZZZ"
  ]
},
Nodes need the policy update too
For those looking for a quick fix to this issue, using @lukyanetsv's policy in your cluster configuration as follows will work. Ensure that you update the ARN for your KMS key:
spec:
  additionalPolicies:
    master: |
      [
        {
          "Sid": "kopsK8sKMSEncrypted",
          "Effect": "Allow",
          "Action": [
            "kms:CreateGrant",
            "kms:Decrypt",
            "kms:DescribeKey",
            "kms:Encrypt",
            "kms:GenerateDataKey*",
            "kms:ReEncrypt*"
          ],
          "Resource": [
            "arn:aws:kms:us-east-1:123456789012:key/ee174004-c3b2-4123-9a80-c82f3c70df9d"
          ]
        }
      ]
    node: |
      [
        {
          "Sid": "kopsK8sKMSEncrypted",
          "Effect": "Allow",
          "Action": [
            "kms:CreateGrant",
            "kms:Decrypt",
            "kms:DescribeKey",
            "kms:Encrypt",
            "kms:GenerateDataKey*",
            "kms:ReEncrypt*"
          ],
          "Resource": [
            "arn:aws:kms:us-east-1:123456789012:key/ee174004-c3b2-4123-9a80-c82f3c70df9d"
          ]
        }
      ]
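Once the spec has been updated (for example via kops edit cluster), roll the change out. Roughly, and reusing the same Terraform target as the reproduction above:

kops edit cluster ${NAME}
kops update cluster ${NAME} --target=terraform --out=. --yes
terraform apply

For clusters managed directly by kops (no Terraform), kops update cluster ${NAME} --yes is the equivalent step.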
thanks @lukyanetsv and @waldher -- your fix worked perfectly.
+1 for this issue
Will there be a permanent fix for it?
Hit by this as well
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Just wanted to make sure this issue is still on the radar. Is there a way to avoid this going forward? The workaround definitely works (thank you @waldher & @lukyanetsv), but seems a bit clunky.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
This issue is still occurring.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
I just hit this issue. It doesn't seem like this should be closed (even though there is a workaround).
It would be great if there were a --kms-key-arn or similar flag that would create the above workaround in the cluster spec for the user.
We are also encountering this - I will try to submit a PR this week
1. What kops version are you running? The command kops version will display this information.
Version 1.10.0-beta.1 (git-dc9154528)

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.6", GitCommit:"a21fdbd78dde8f5447f5f6c331f7eb6f80bd684e", GitTreeState:"clean", BuildDate:"2018-07-26T10:17:47Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

3. What cloud provider are you using?
aws

4. What commands did you run? What is the simplest way to reproduce this issue?
aws s3api create-bucket \
  --region "us-west-2" \
  --create-bucket-configuration LocationConstraint="us-west-2" \
  --bucket "<REDACTED>" \
  --acl "private"

aws s3api put-bucket-versioning \
  --region "us-west-2" \
  --bucket "<REDACTED>" \
  --versioning-configuration Status=Enabled

aws s3api put-bucket-encryption \
  --region "us-west-2" \
  --bucket "<REDACTED>" \
  --server-side-encryption-configuration '{ "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": "arn:aws:kms:<REDACTED>"}}]}'
5. What happened after the commands executed?
kops-configuration.service was unable to access the state files stored in the S3 bucket:
systemctl status kops-configuration.service
● kops-configuration.service - Run kops bootstrap (nodeup)
   Loaded: loaded (/etc/systemd/system/kops-configuration.service; disabled; vendor preset: disabled)
   Active: activating (start) since Fri 2018-07-27 00:07:33 UTC; 3min 49s ago
     Docs: https://github.com/kubernetes/kops
 Main PID: 881 (nodeup)
    Tasks: 6 (limit: 32767)
   Memory: 274.8M
   CGroup: /system.slice/kops-configuration.service
           └─881 /var/cache/kubernetes-install/nodeup --conf=/var/cache/kubernetes-install/kube_env.yaml --v=8

Jul 27 00:11:12 ip-10-65-129-161.ec2.internal nodeup[881]: I0727 00:11:12.540408 881 assetstore.go:313] added asset "ptp" for &{"/var/cache/nodeup/extracted/sha1:REDACTEDhtt>
Jul 27 00:11:12 ip-10-65-129-161.ec2.internal nodeup[881]: I0727 00:11:12.540429 881 assetstore.go:313] added asset "sample" for &{"/var/cache/nodeup/extracted/sha1:REDACTED>
Jul 27 00:11:12 ip-10-65-129-161.ec2.internal nodeup[881]: I0727 00:11:12.540448 881 assetstore.go:313] added asset "tuning" for &{"/var/cache/nodeup/extracted/sha1:REDACTED_>
Jul 27 00:11:12 ip-10-65-129-161.ec2.internal nodeup[881]: I0727 00:11:12.540468 881 assetstore.go:313] added asset "vlan" for &{"/var/cache/nodeup/extracted/sha1:REDACTED_ht>
Jul 27 00:11:12 ip-10-65-129-161.ec2.internal nodeup[881]: I0727 00:11:12.541551 881 files.go:100] Hash matched for "/var/cache/nodeup/sha1:REDACTED_https___kubeupv2_s3_amazo>
Jul 27 00:11:12 ip-10-65-129-161.ec2.internal nodeup[881]: I0727 00:11:12.541573 881 assetstore.go:203] added asset "utils.tar.gz" for &{"/var/cache/nodeup/sha1:REDACTED>
Jul 27 00:11:12 ip-10-65-129-161.ec2.internal nodeup[881]: I0727 00:11:12.541663 881 assetstore.go:313] added asset "socat" for &{"/var/cache/nodeup/extracted/sha1:REDACTED>
Jul 27 00:11:12 ip-10-65-129-161.ec2.internal nodeup[881]: I0727 00:11:12.541694 881 s3fs.go:216] Reading file "s3:///cluster.spec"
Jul 27 00:11:12 ip-10-65-129-161.ec2.internal nodeup[881]: W0727 00:11:12.961693 881 main.go:142] got error running nodeup (will retry in 30s): error loading Cluster "
Jul 27 00:11:12 ip-10-65-129-161.ec2.internal nodeup[881]: status code: 403, request id:
Manually granting the IAM roles created by kops access to the KMS key used to encrypt the S3 bucket allows kops-configuration.service to start and the cluster to boot.

6. What did you expect to happen?
It seems that when encryption is enabled on the S3 bucket used for KOPS_STATE_STORE, the nodes are not given access to the encryption key used by the bucket. I didn't encounter any problem with kops 1.9.1. The cluster was created using '--target=terraform'.
7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else do we need to know?