org-formation / aws-resource-providers

A community driven repository where you can find AWS Resource Type Providers for different purposes (including org-formation ones).
MIT License

Issues in a new region #97

Open NickDarvey opened 2 years ago

NickDarvey commented 2 years ago

I'm trying to set up our organization to work in af-south-1, a relatively new region, but org-formation times out and fails while registering NoDefaultVpcRp:

ERROR: Workload NoDefaultVpcRp in 1234/af-south-1 updated failed. reason: Account seems stuck initializing. (1234 = MyOldAccount)
ERROR: Workload NoDefaultVpcRp in 5678/af-south-1 updated failed. reason: Account seems stuck initializing. (5678 = MyOtherAccount)
...

I see that af-south-1 is not a 'known' region, but do any of these community resource providers depend on knowing the regions? It didn't seem like NoDefaultVpcRp does. Do you have any tips for investigating this further?

OlafConijn commented 2 years ago

The cli should use the list of known regions to log a warning (in case someone misspelled a region) - no more. I'll add the region regardless, thanks for pointing this out.

As AWS services get bootstrapped when a new account is created, a number of errors are thrown during the bootstrapping process. There is a retry and wait period that should handle this, but apparently it did not.

https://github.com/org-formation/org-formation-cli/blob/d55936a472a9f11d3c126e965ed1e4a3204a4e61/src/util/aws-util.ts#L387-L390
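The linked code is not reproduced here. As a rough illustration of what a retry-and-wait loop of this kind can look like, here is a minimal sketch. The function name, the `CodedError` type, the attempt count, the wait time, and the choice of `InvalidClientTokenId` as the only retryable code are all assumptions made for the example, not the org-formation-cli implementation.

```typescript
// Hypothetical sketch: names and defaults are illustrative, not the org-formation-cli API.
type CodedError = Error & { code?: string };

// Error codes treated as "account may still be initializing" in this sketch (an assumption).
const RETRYABLE_CODES: string[] = ["InvalidClientTokenId"];

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWhileInitializing<T>(
  operation: () => Promise<T>,
  maxAttempts = 5,
  waitMs = 1000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      const code = (err as CodedError).code;
      const retryable = code !== undefined && RETRYABLE_CODES.indexOf(code) !== -1;
      // Anything that is not a known "still initializing" error is rethrown unchanged.
      if (!retryable) throw err;
      // Give up once the attempt budget is used, with a message like the one in the log above.
      if (attempt >= maxAttempts) {
        throw new Error("Account seems stuck initializing. (last error code: " + code + ")");
      }
      await sleep(waitMs);
    }
  }
}
```

A loop like this reports "stuck initializing" whenever the retryable error never goes away, whatever its real cause, so the underlying error code is worth surfacing when diagnosing a failure.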

What I would be interested in is to see whether there are CloudFormation stacks that have failed or are stuck updating.

It might also be that retrying this once more later on could solve the issue. Maybe there was a glitch on the AWS side of things?

thanks

NickDarvey commented 2 years ago

I waited ~18 hours and tried again with the same result, but that aws-util.ts code gave me a hint: InvalidClientTokenId.

STS tokens from the global endpoint don't work in af-south-1 by default which I can demonstrate with something simple:

~\source\repos\example
❯ aws sts get-caller-identity --profile Operator-Workspace --region us-east-1
{
    "UserId": "XYZ:nick@example.com",
    "Account": "1234",
    "Arn": "arn:aws:sts::1234:assumed-role/AWSReservedSSO_Operator_1234/nick@example.com"
}

~\source\repos\example
❯ aws sts get-caller-identity --profile Operator-Workspace --region af-south-1

An error occurred (InvalidClientTokenId) when calling the GetCallerIdentity operation: The security token included in the request is invalid

Following the guidance from that AWS doc and making tokens from the global STS endpoint valid in all regions meant I could now run:

~\source\repos\example
❯ aws sts get-caller-identity --profile Operator-Workspace --region af-south-1
{
    "UserId": "XYZ:nick@example.com",
    "Account": "1234",
    "Arn": "arn:aws:sts::1234:assumed-role/AWSReservedSSO_Operator_1234/nick@example.com"
}

and deploy my org-formation:

INFO: Workload NoDefaultVpcRp in 1234/af-south-1 updated successful. (1234 = ManagementAccount)

Thanks for the hint @OlafConijn!

NickDarvey commented 2 years ago

After applying the workaround described in https://github.com/org-formation/org-formation-cli/issues/292, I am now running into:

ERROR: Workload NoDefaultVpcRp in 1234/af-south-1 updated failed. reason: User: arn:aws:sts::1234:assumed-role/OrganizationFormationBuildAccessRole/OrganizationFormationBuild is not authorized to perform: cloudformation:UpdateStack on resource: arn:aws:cloudformation:af-south-1:1234:stack/community-organizations-nodefaultvpc-resource-role/* with an explicit deny in a service control policy (1234 = WorkspaceAccount)
User: arn:aws:sts::1234:assumed-role/OrganizationFormationBuildAccessRole/OrganizationFormationBuild is not authorized to perform: cloudformation:UpdateStack on resource: arn:aws:cloudformation:af-south-1:1234:stack/community-organizations-nodefaultvpc-resource-role/* with an explicit deny in a service control policy
AccessDenied: User: arn:aws:sts::1234:assumed-role/OrganizationFormationBuildAccessRole/OrganizationFormationBuild is not authorized to perform: cloudformation:UpdateStack on resource: arn:aws:cloudformation:af-south-1:1234:stack/community-organizations-nodefaultvpc-resource-role/* with an explicit deny in a service control policy
    at Request.extractError (node_modules\aws-sdk@2.949.0\node_modules\aws-sdk\lib\protocol\query.js:50:29)
    at Request.callListeners (node_modules\aws-sdk@2.949.0\node_modules\aws-sdk\lib\sequential_executor.js:106:20)
    at Request.emit (node_modules\aws-sdk@2.949.0\node_modules\aws-sdk\lib\sequential_executor.js:78:10)
    at Request.emit (node_modules\aws-sdk@2.949.0\node_modules\aws-sdk\lib\request.js:688:14)
    at Request.transition (node_modules\aws-sdk@2.949.0\node_modules\aws-sdk\lib\request.js:22:10)
    at AcceptorStateMachine.runTo (node_modules\aws-sdk@2.949.0\node_modules\aws-sdk\lib\state_machine.js:14:12)
    at node_modules\aws-sdk@2.949.0\node_modules\aws-sdk\lib\state_machine.js:26:10
    at Request.<anonymous> (node_modules\aws-sdk@2.949.0\node_modules\aws-sdk\lib\request.js:38:9)
    at Request.<anonymous> (node_modules\aws-sdk@2.949.0\node_modules\aws-sdk\lib\request.js:690:12)
    at Request.callListeners (node_modules\aws-sdk@2.949.0\node_modules\aws-sdk\lib\sequential_executor.js:116:18)

This occurs for every account except for my root/management account.

Looking at the SCPs in the AWS Organization I can see two: DenyLargeEC2Instances and DenyUnsupportedRegions. The latter has the contents:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "ap-southeast-1",
            "ap-southeast-2",
            "us-east-1"
          ]
        }
      },
      "Resource": "*",
      "Effect": "Deny",
      "NotAction": [
        "acm:*",
        "budgets:*",
        "chatbot:*",
        "cloudfront:*",
        "iam:*",
        "sts:*",
        "kms:*",
        "route53:*",
        "route53domains:*",
        "route53resolver:*",
        "organizations:*",
        "support:*",
        "waf:*",
        "wafv2:*"
      ],
      "Sid": "DenyUnsupportedRegions"
    }
  ]
}

Notably it does not deny cloudformation:UpdateStack. (It doesn't contain af-south-1 yet because the SCPs are deployed after the 'types' in org-formation-reference.)

Do you have any tips for diagnosing this?
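One thing that may help when reading this policy: a Deny statement that uses NotAction applies to every action that is not in the list, whenever its condition holds. The sketch below models just that one rule for the DenyUnsupportedRegions statement above. The `DenyStatement` shape and the `matchesPattern` and `isDenied` helpers are hypothetical and cover only a trailing `*` wildcard and this one condition; this is not how AWS evaluates policies in general.

```typescript
// Minimal model of the single statement shown above; field names are hypothetical.
interface DenyStatement {
  effect: "Deny";
  notAction: string[];      // actions exempt from the deny, e.g. "iam:*"
  allowedRegions: string[]; // values of the StringNotEquals aws:RequestedRegion condition
}

// True when `action` matches a pattern such as "iam:*".
// Only a trailing "*" wildcard is handled in this sketch.
function matchesPattern(action: string, pattern: string): boolean {
  const a = action.toLowerCase();
  const p = pattern.toLowerCase();
  if (p.charAt(p.length - 1) === "*") {
    return a.indexOf(p.slice(0, -1)) === 0;
  }
  return a === p;
}

// Deny + NotAction: the deny applies to every action NOT in the list,
// provided the condition holds (requested region is not an allowed one).
function isDenied(stmt: DenyStatement, action: string, region: string): boolean {
  const exempt = stmt.notAction.some((p) => matchesPattern(action, p));
  const conditionHolds = stmt.allowedRegions.indexOf(region) === -1;
  return !exempt && conditionHolds;
}

const denyUnsupportedRegions: DenyStatement = {
  effect: "Deny",
  notAction: [
    "acm:*", "budgets:*", "chatbot:*", "cloudfront:*", "iam:*", "sts:*", "kms:*",
    "route53:*", "route53domains:*", "route53resolver:*", "organizations:*",
    "support:*", "waf:*", "wafv2:*",
  ],
  allowedRegions: ["ap-southeast-1", "ap-southeast-2", "us-east-1"],
};

console.log(isDenied(denyUnsupportedRegions, "cloudformation:UpdateStack", "af-south-1")); // true
console.log(isDenied(denyUnsupportedRegions, "cloudformation:UpdateStack", "us-east-1"));  // false
console.log(isDenied(denyUnsupportedRegions, "iam:CreateRole", "af-south-1"));             // false
```

Under this reading, cloudformation:UpdateStack in af-south-1 is denied because it is neither in the NotAction list nor in an allowed region. That would match the "explicit deny in a service control policy" in the error.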

OlafConijn commented 2 years ago

Right, I think the order of these tasks needs to be changed in the reference, indeed. What you can do to work around this is:

Looking forward to hearing whether that got you unstuck. I think this is a great gotcha; I will make sure it gets fixed in the reference project.

NickDarvey commented 2 years ago

Success!

INFO: Executing: register-type NoDefaultVpcRp.
INFO: Workload NoDefaultVpcRp in 1234/af-south-1 updated successful. (1234 = Account1)
INFO: Workload NoDefaultVpcRp in 5678/af-south-1 updated successful. (5678 = Account2)
INFO: Workload NoDefaultVpcRp in 1337/af-south-1 updated successful. (1337 = Account3)
...
INFO: Workload NoDefaultVpcRp in 0000/af-south-1 updated successful. (0000 = AccountN)

So I guess this NoDefaultVpcRp provider is relying on one of the resources described in the DenyUnsupportedRegions SCP?

sshvetsov commented 1 year ago

I've also encountered the error when trying to register the Community::Organizations::NoDefaultVPC resource provider in the non-default ap-southeast-3 region with the register-type task:

ERROR: Workload NoDefaultVpcRpInOptedInRegions in 1234/ap-southeast-3 updated failed. reason: Account seems stuck initializing. (1234 = Account1)

As part of my testing, I've managed to install the resource provider using the AWS CLI, so the problem appears to be in OFN.

If I understand the cause correctly, it's because OFN is trying to use the global STS endpoint (sts.amazonaws.com) when assuming a role in non-default regions instead of the regional one (sts.ap-southeast-3.amazonaws.com). CMIIW.

Is there a plan to make OFN use regional STS endpoints or should we rely on the workaround of manually setting the version of the global endpoint token to v2Token?
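For illustration, the difference between the two endpoint forms mentioned above comes down to the host name. The sketch below picks a regional host for opt-in regions and the global host otherwise. The helper names and the short opt-in list are hypothetical, it assumes the standard `aws` partition, and it is neither OFN's behaviour nor an AWS SDK API.

```typescript
// Hypothetical helpers, not part of org-formation or the AWS SDK.
// Assumes the standard "aws" partition (amazonaws.com host names).
const GLOBAL_STS_HOST = "sts.amazonaws.com";

// Opt-in regions named in this thread; a real list would be longer.
const OPT_IN_REGIONS: string[] = ["af-south-1", "ap-southeast-3"];

// Builds the regional host form, e.g. sts.ap-southeast-3.amazonaws.com.
function regionalStsHost(region: string): string {
  return "sts." + region + ".amazonaws.com";
}

// Uses the regional host for opt-in regions and the global host otherwise.
function stsHostFor(region: string): string {
  const isOptIn = OPT_IN_REGIONS.indexOf(region) !== -1;
  return isOptIn ? regionalStsHost(region) : GLOBAL_STS_HOST;
}

console.log(stsHostFor("ap-southeast-3")); // sts.ap-southeast-3.amazonaws.com
console.log(stsHostFor("us-east-1"));      // sts.amazonaws.com
```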