hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

Resource to manage one statement within a policy, rather than the policy itself. #26306

Open gtmtech opened 1 year ago

gtmtech commented 1 year ago

Description

The Terraform AWS provider is well geared to managing complete policies such as IAM policies and S3 bucket policies. You can currently manage both by using the aws_iam_policy_document data source (it applies equally well to S3 bucket policies as to IAM policies).

In the case of IAM policies, you have a further advantage in that AWS allows you to define "inline policies" - as many as you like, with a total size limitation across all of them. These inline policies can act as "snippets", with the amalgamation of all of them constituting the overall policy (ignoring managed policy attachments for now). That's nice when doing multi-account work: it allows you either to manage the whole IAM policy of the role/user in one place, or to actively construct it from lots of parts - which lets you delegate the act of assuming into a new account to the terraform run provisioning that new account. E.g.:

# module for creating new account

# the following done in the new account:

resource "aws_iam_role" "devops" {
    name = "devops"
    provider = aws.newaccount
    ...
}

# the following done in the central account:

data "aws_iam_policy_document" "can-assume-devops" {
    statement {
        sid = "CanAssumeDevopsIn${var.account_id}Account"
        effect = "Allow"
        actions = [ "sts:AssumeRole" ]
        resources = [ aws_iam_role.devops.arn ]
    }
}

resource "aws_iam_role_policy" "can-assume-devops" {
    role = "central-devops"
    policy = data.aws_iam_policy_document.can-assume-devops.json
    provider = aws.centralaccount
}    

This is a different paradigm of creating units of work - in this one, if I take these resources and stamp out a new account "dev2", it will not only create the devops role in dev2, but also make sure that "central-devops" in the central account can assume my new devops role - all in one run. Everything is encapsulated in one place, so onboarding more accounts is easier and involves single runs.

However, S3 bucket policies don't have quite the same features: you cannot attach more than one policy to an S3 bucket, and therefore at present you have to manage the whole policy in one go.

This becomes problematic at scale. Suppose I have a bucket policy that has to reference each account in the org:

{
    "Effect": "Allow",
    "Action": "s3:PutObject",
    "Resources": [
        "arn:aws:s3:::mybucket/AWSLogs/123456789011/*",
        "arn:aws:s3:::mybucket/AWSLogs/123456789012/*",
        "arn:aws:s3:::mybucket/AWSLogs/123456789013/*",
        "arn:aws:s3:::mybucket/AWSLogs/123456789014/*",
        "arn:aws:s3:::mybucket/AWSLogs/123456789015/*",
        ...
    ]
   ...
}

This kind of setup is required for several AWS services - such as CloudTrail, AWS Config, VPC Flow Logs, etc.
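
Today the usual way to generate such a list is from a centrally maintained variable - a minimal sketch (the variable name, bucket name, and service principal are illustrative):

variable "account_ids" {
    type    = list(string)
    default = [ "123456789011", "123456789012" ]
}

data "aws_iam_policy_document" "logs" {
    statement {
        effect  = "Allow"
        actions = [ "s3:PutObject" ]
        principals {
            type        = "Service"
            identifiers = [ "cloudtrail.amazonaws.com" ]
        }
        # one ARN per account, generated from the central list
        resources = [
            for id in var.account_ids : "arn:aws:s3:::mybucket/AWSLogs/${id}/*"
        ]
    }
}

But the account list itself still has to be maintained wherever the bucket is managed.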

Every time I add a new account (which enterprises do quite a lot), I have to go round lots of bucket policies adding that account to the list of approved writers for each of these services. This decouples the management of the bucket from the management of the account, which is problematic because these things are naturally coupled within AWS anyway - you often can't add the policy to the bucket ahead of time (before the thing doing the writing exists). In another of these problematic places (assume-role policies on IAM objects), you often can't add another role as a principal unless that role exists. My point is that they are naturally coupled, but in terraform you've had to uncouple them into different terraform runs, meaning you now have a strict order in your terraform runs.

To me, it would make more sense to allow the same paradigm that IAM inline policies allow - managing parts of the full policy, rather than the policy as a whole. I think this is probably why AWS invented the "Sid" in any S3 policy / IAM policy / assume-role policy: the Sid allows you to identify particular statements uniquely, and therefore I would hope that terraform could allow you to manage a particular statement as a terraform resource.

Following that through, I would see being able to write something akin to the following:

resource "aws_s3_bucket_policy_statement" {
    bucket = "my-central-bucket"
    statement_id = "AlloWriteFor${var.account_id}Account"
    policy = data.aws_policy_document.this.json
}

data "aws_policy_document" "this" {
    actions = [ "s3:PutObject" ]
    effect = "Allow"
    principals {
        type = "Service"
        identifiers = [ "xxxx.amazonaws.com" ]
    }
    resources = [
        format("arn:aws:s3:::mybucket:AWSLogs/%s/*", var.account_id)
    ]
}

Then I could have a particular statement in an S3 bucket policy, or in an assume-role policy, or even in an IAM inline policy, managed as a unique terraform resource, rather than the whole policy. That would allow me to construct the whole policy from several different parts.

The logic would be easy - terraform would just have to download the existing policy, check for a statement ID that matched, replace the statement if it found one, and append a new one if it didn't.

What's not so easy is the locking on the policy - terraform would have to implement a standard locking mechanism so that multiple parallel terraform runs wouldn't overwrite each other between the (a) download and (b) write phases. It would have to cope with the possibility that it is not the only thing trying to update the whole policy at once... But terraform already supports locking of the state file, e.g. with DynamoDB, so the same mechanism could be made to work here.

Anyway, it would be super useful to us for something like this to exist. Thanks!

New or Affected Resource(s)

aws_s3_bucket_policy_statement (new)

gdavison commented 1 year ago

I think that the source_policy_documents and override_policy_documents parameters on aws_iam_policy_document would help in your situation.
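
For example, here's a minimal sketch (the names, account IDs, and service principal are illustrative) of merging per-account statement snippets into one bucket policy:

data "aws_iam_policy_document" "dev_snippet" {
    statement {
        sid       = "AllowWriteForDevAccount"
        effect    = "Allow"
        actions   = [ "s3:PutObject" ]
        resources = [ "arn:aws:s3:::mybucket/AWSLogs/111111111111/*" ]
        principals {
            type        = "Service"
            identifiers = [ "cloudtrail.amazonaws.com" ]
        }
    }
}

data "aws_iam_policy_document" "prod_snippet" {
    statement {
        sid       = "AllowWriteForProdAccount"
        effect    = "Allow"
        actions   = [ "s3:PutObject" ]
        resources = [ "arn:aws:s3:::mybucket/AWSLogs/222222222222/*" ]
        principals {
            type        = "Service"
            identifiers = [ "cloudtrail.amazonaws.com" ]
        }
    }
}

data "aws_iam_policy_document" "combined" {
    source_policy_documents = [ data.aws_iam_policy_document.dev_snippet.json ]

    # a statement here whose sid matches a source statement replaces it;
    # statements with new sids are simply added to the merged document
    override_policy_documents = [ data.aws_iam_policy_document.prod_snippet.json ]
}

resource "aws_s3_bucket_policy" "central" {
    bucket = "my-central-bucket"
    policy = data.aws_iam_policy_document.combined.json
}

Note that this composes the full policy within a single configuration - all of the snippets have to be visible to one terraform run.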

If this doesn't address what you're looking for, let us know.

The documentation can be updated to better highlight this functionality.

gtmtech commented 1 year ago

@gdavison I don't believe it fixes the use case.

If you have a terraform run for a particular AWS account, you need that terraform run to do whatever it needs in that account, but then go off to another central bucket account and update the policy there, by managing a particular Sid.

You cannot manage the whole policy, as then it would be managed in multiple terraform runs (one per aws account). And a resource should not be entirely managed in more than one place.

E.g. suppose you have a dev, and a prod account. And you have a central bucket which needs to allow both the dev and the prod account to write to it.

Ideally, when setting up the "dev" account, I want it to do stuff in the dev account, and then go over to the central bucket and add the dev Sid to the bucket policy that allows e.g. CloudTrail, bucket access logging, or VPC Flow Logs to write to that central bucket.

When I run the same thing on the "prod" account, I would want it to do the same operations on the central bucket but for the prod account.

Different terraform runs, different Sids, but the same bucket policy.

jjshoe commented 1 year ago

> E.g. suppose you have a dev, and a prod account. And you have a central bucket which needs to allow both the dev and the prod account to write to it.

Though not a terraform-based fix, I think this will ease your pain a bit: https://aws.amazon.com/about-aws/whats-new/2018/05/principal-org-id/

You can set a condition on the bucket policy allowing any AWS principal to write, as long as it has the same org ID as your org.
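
A minimal sketch of that condition (the org ID, bucket name, and statement details are illustrative):

data "aws_iam_policy_document" "org_write" {
    statement {
        sid       = "AllowWriteFromAnywhereInOrg"
        effect    = "Allow"
        actions   = [ "s3:PutObject" ]
        resources = [ "arn:aws:s3:::mybucket/AWSLogs/*" ]
        principals {
            type        = "AWS"
            identifiers = [ "*" ]
        }
        # restrict the wildcard principal to members of your organization
        condition {
            test     = "StringEquals"
            variable = "aws:PrincipalOrgID"
            values   = [ "o-xxxxxxxxxx" ]
        }
    }
}

With this, the central bucket policy no longer needs editing when a new account joins the org.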

gdavison commented 1 year ago

Thanks for the clarification, @gtmtech. If I understand correctly, you'd like to essentially be able to modify a specific SID in a policy document, potentially from different terraform apply runs.

Modifying a specific SID is possible. However, locking with parallel runs of terraform apply raises problems. We can currently lock a given state file so that only one run at a time can access it. But if your "prod" and "dev" accounts use separate state files, they would not be able to lock each other out, and near-simultaneous runs could corrupt the shared policy.

In the specific situation of adding accounts for your organization, @jjshoe's suggestion might work.

Another idea might be to have the central bucket managed in a third configuration, and either inject the required values using outputs from "prod" and "dev" configurations, or use the terraform_remote_state data source to get the values needed.
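
For instance, a minimal sketch of the terraform_remote_state approach (the backend settings and output name are illustrative - it assumes the "dev" configuration exposes an "account_id" output):

data "terraform_remote_state" "dev" {
    backend = "s3"
    config = {
        bucket = "my-tf-state-bucket"
        key    = "dev/terraform.tfstate"
        region = "eu-west-1"
    }
}

# the central bucket policy can then reference:
#   data.terraform_remote_state.dev.outputs.account_id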

gtmtech commented 1 year ago

Thanks for coming back to me -

The principal org ID only works in some use cases - e.g. it's good for roles assuming other roles across the org.

But I'm not sure it can be made to work very well where bucket policies need, e.g., the typical /AWSLogs/<ACCOUNTID>/ format in their resource ARNs - which I think applies to things like VPC Flow Logs, CloudTrail (individual accounts), S3 access log forwarders, etc.

I'd be interested to see if it can work for all those use cases, but it feels like there are still a bunch of cases where the aws docs tell us that fine-grained, account-ID-style policies need to exist - and if different terraform runs are responsible for different accounts, hmmm...

Regarding the locking problem - and, in fact, more worryingly, the eventual-consistency problem of two different runs updating almost simultaneously - I think you're right that corruption would occur, and there is no way around that.

It sounds like updating individual Sids is not the way forward, but at the same time I've yet to find out what is. I will think some more on your terraform_remote_state suggestion above.

Thanks for all the inputs.

gtmtech commented 1 year ago

(I will also dig deeper into @jjshoe's suggestion too.)