kubernetes-sigs / cluster-api-provider-aws

Kubernetes Cluster API Provider AWS provides consistent deployment and day 2 operations of "self-managed" and EKS Kubernetes clusters on AWS.
http://cluster-api-aws.sigs.k8s.io/
Apache License 2.0

SecretsManager actions in node IAM policy vs. CloudFormation template #4339

Closed carstenkoester closed 7 months ago

carstenkoester commented 1 year ago

/kind bug

What steps did you take and what happened:

In essence, I attempted to bootstrap a workload cluster without using clusterawsadm bootstrap iam create-cloudformation-stack. Instead, I took the output of clusterawsadm bootstrap iam print-policy --document [...] (which appears to be identical to the policies published as artifacts, e.g. here) and created the IAM policies, roles, and instance profiles through other automation.

When spinning up a workload cluster, the workload cluster was unable to fetch cloud-init userdata because no policy in the instance role permitted a GetSecretValue operation.
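To illustrate the failure mode, here is a minimal Python sketch of an allow-only policy check. It is a simplification, not how IAM evaluates policies in full: it ignores Deny statements, conditions, NotAction/NotResource, and permission boundaries. The helper names (`wildcard_match`, `is_allowed`), the secret ARN, and the account ID are hypothetical; the two policy dicts are abbreviated from the command outputs shown later in this issue.

```python
import re


def wildcard_match(pattern, value, ignore_case=False):
    """Match an IAM-style pattern: '*' is any run of characters, '?' is one character."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(regex, value, flags) is not None


def as_list(value):
    """Policy fields may hold a single item or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def is_allowed(policy, action, resource):
    """Return True if any Allow statement covers both the action and the resource.

    Simplified on purpose: Deny statements and conditions are not evaluated.
    """
    for statement in as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = as_list(statement.get("Action", []))
        resources = as_list(statement.get("Resource", []))
        # Action names are matched case-insensitively, resource ARNs case-sensitively.
        action_ok = any(wildcard_match(p, action, ignore_case=True) for p in actions)
        resource_ok = any(wildcard_match(p, resource) for p in resources)
        if action_ok and resource_ok:
            return True
    return False


# Abbreviated from the print-policy output (no Secrets Manager statement).
printed_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["ec2:DescribeInstances"], "Resource": ["*"]},
    ],
}

# Abbreviated from the CloudFormation template (includes the Secrets Manager statement).
template_policy = {
    "Version": "2012-10-17",
    "Statement": printed_policy["Statement"] + [
        {
            "Effect": "Allow",
            "Action": ["secretsmanager:DeleteSecret", "secretsmanager:GetSecretValue"],
            "Resource": ["arn:*:secretsmanager:*:*:secret:aws.cluster.x-k8s.io/*"],
        },
    ],
}

# Hypothetical secret ARN, for illustration only.
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:111122223333:secret:aws.cluster.x-k8s.io/example"

print(is_allowed(printed_policy, "secretsmanager:GetSecretValue", SECRET_ARN))   # False
print(is_allowed(template_policy, "secretsmanager:GetSecretValue", SECRET_ARN))  # True
```

For a real check against a live role, the IAM policy simulator is the more reliable option, since it applies the full evaluation logic.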

While troubleshooting this, I compared the CloudFormation template (output of clusterawsadm bootstrap iam print-cloudformation-template) with the output of clusterawsadm bootstrap iam print-policy --document AWSIAMManagedPolicyCloudProviderNodes, and noticed that the CloudFormation template grants the GetSecretValue permission (and others), while the output of clusterawsadm bootstrap iam print-policy (and the previously linked artifact) does not.
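The comparison described above can be sketched in Python as a set difference over the allowed actions of the two policy documents. This is a rough sketch under stated assumptions: the function names `collect_actions` and `missing_actions` are hypothetical, the inline dicts are abbreviated stand-ins for the two command outputs, and only the Action field of Allow statements is compared (resources and conditions are ignored).

```python
def collect_actions(policy):
    """Collect every action named in the Allow statements of a policy document."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        # A policy may carry a single statement object instead of a list.
        statements = [statements]
    actions = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        value = statement.get("Action", [])
        actions.update([value] if isinstance(value, str) else value)
    return actions


def missing_actions(reference, candidate):
    """Return the actions allowed by `reference` but absent from `candidate`, sorted."""
    return sorted(collect_actions(reference) - collect_actions(candidate))


# Abbreviated stand-in for the print-policy output.
printed = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["ec2:DescribeInstances"], "Resource": ["*"]},
    ],
}

# Abbreviated stand-in for the policy embedded in the CloudFormation template.
template = {
    "Statement": [
        {"Action": ["ec2:DescribeInstances"], "Effect": "Allow", "Resource": ["*"]},
        {
            "Action": ["secretsmanager:DeleteSecret", "secretsmanager:GetSecretValue"],
            "Effect": "Allow",
            "Resource": ["arn:*:secretsmanager:*:*:secret:aws.cluster.x-k8s.io/*"],
        },
        {"Action": ["ssm:UpdateInstanceInformation"], "Effect": "Allow", "Resource": ["*"]},
    ],
}

for action in missing_actions(reference=template, candidate=printed):
    print(action)
```

In practice the two dicts would be loaded with `json.load` from files holding the saved command outputs.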

What did you expect to happen:

I expected to be able to deploy a workload cluster using the IAM policies returned by clusterawsadm bootstrap iam print-policy --document [...].

Note: I'm not looking for help troubleshooting the actual userdata issue that occurred; it has already been narrowed down to the missing IAM policy statement, and I resolved it by using the CloudFormation template as a reference and updating the policy. This bug report is merely to document that AWSIAMManagedPolicyCloudProviderNodes differs between the CF stack and the JSON artifact, and to clarify whether this is by design or whether the JSON artifact should be updated.

Anything else you would like to add:

Policy as returned by clusterawsadm bootstrap iam print-policy --document AWSIAMManagedPolicyCloudProviderNodes:

$ clusterawsadm bootstrap iam print-policy --document AWSIAMManagedPolicyCloudProviderNodes
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:AssignIpv6Addresses",
        "ec2:DescribeInstances",
        "ec2:DescribeRegions",
        "ec2:CreateTags",
        "ec2:DescribeTags",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeInstanceTypes",
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:GetRepositoryPolicy",
        "ecr:DescribeRepositories",
        "ecr:ListImages",
        "ecr:BatchGetImage"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

Policy as embedded in the CF template:

$ clusterawsadm bootstrap iam print-cloudformation-template \
    | yq -e '.Resources.AWSIAMManagedPolicyCloudProviderNodes.Properties.PolicyDocument' -o json
{
  "Statement": [
    {
      "Action": [
        "ec2:AssignIpv6Addresses",
        "ec2:DescribeInstances",
        "ec2:DescribeRegions",
        "ec2:CreateTags",
        "ec2:DescribeTags",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeInstanceTypes",
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:GetRepositoryPolicy",
        "ecr:DescribeRepositories",
        "ecr:ListImages",
        "ecr:BatchGetImage"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ]
    },
    {
      "Action": [
        "secretsmanager:DeleteSecret",
        "secretsmanager:GetSecretValue"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:*:secretsmanager:*:*:secret:aws.cluster.x-k8s.io/*"
      ]
    },
    {
      "Action": [
        "ssm:UpdateInstanceInformation",
        "ssmmessages:CreateControlChannel",
        "ssmmessages:CreateDataChannel",
        "ssmmessages:OpenControlChannel",
        "ssmmessages:OpenDataChannel",
        "s3:GetEncryptionConfiguration"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ]
    }
  ],
  "Version": "2012-10-17T00:00:00Z"
}

AWSIAMManagedPolicyCloudProviderControlPlane and AWSIAMManagedPolicyControllers look the same in both places. I did not compare or use other policies.
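As a workaround along the lines of the one mentioned above (using the CloudFormation template as a reference and updating the policy), the following Python sketch appends to the printed policy any statement that appears only in the template. The names `statement_key` and `merge_missing_statements` are hypothetical, and the sample dicts are abbreviated. The merged document keeps the printed policy's Version string of "2012-10-17"; the timestamp form in the yq output above is presumably an artifact of the YAML-to-JSON conversion, which is an assumption on my part.

```python
import copy
import json


def statement_key(statement):
    """Build a comparison key that ignores key order and list order within a statement."""
    normalized = {
        key: sorted(value) if isinstance(value, list) else value
        for key, value in statement.items()
    }
    return json.dumps(normalized, sort_keys=True)


def merge_missing_statements(candidate, reference):
    """Return a copy of `candidate` with statements found only in `reference` appended."""
    merged = copy.deepcopy(candidate)
    seen = {statement_key(s) for s in merged["Statement"]}
    for statement in reference["Statement"]:
        key = statement_key(statement)
        if key not in seen:
            merged["Statement"].append(copy.deepcopy(statement))
            seen.add(key)
    return merged


# Abbreviated stand-in for the print-policy output.
printed_doc = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["ec2:DescribeInstances"], "Resource": ["*"]},
    ],
}

# Abbreviated stand-in for the policy document from the CloudFormation template.
template_doc = {
    "Statement": [
        {"Action": ["ec2:DescribeInstances"], "Effect": "Allow", "Resource": ["*"]},
        {
            "Action": ["secretsmanager:DeleteSecret", "secretsmanager:GetSecretValue"],
            "Effect": "Allow",
            "Resource": ["arn:*:secretsmanager:*:*:secret:aws.cluster.x-k8s.io/*"],
        },
    ],
    "Version": "2012-10-17T00:00:00Z",
}

merged_doc = merge_missing_statements(printed_doc, template_doc)
print(json.dumps(merged_doc, indent=2))
```

The result should be reviewed before it is attached to a role, since the sketch compares whole statements and does not try to reconcile statements that overlap only partially.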

Environment:

k8s-ci-robot commented 1 year ago

This issue is currently awaiting triage.

If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

k8s-triage-robot commented 9 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 8 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

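This bot triages issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage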

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 7 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/4339#issuecomment-2015460836):

>The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
>This bot triages issues according to the following rules:
>- After 90d of inactivity, `lifecycle/stale` is applied
>- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
>- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
>You can:
>- Reopen this issue with `/reopen`
>- Mark this issue as fresh with `/remove-lifecycle rotten`
>- Offer to help out with [Issue Triage][1]
>
>Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
>/close not-planned
>
>[1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.