org-formation / aws-resource-providers

A community-driven repository where you can find AWS Resource Type Providers for different purposes (including org-formation ones).
MIT License
85 stars, 21 forks

bug(Community::Organizations::NoDefaultVPC): Resource NoDefaultVpc failed because Internal Failure #102

Closed (dannysteenman closed this issue 2 years ago)

dannysteenman commented 2 years ago

We're using the community package Community::Organizations::NoDefaultVPC, installed through org-formation as described in these instructions: https://github.com/org-formation/aws-resource-providers/blob/master/ec2/no-default-vpc/installation.md#installation-using-org-formation-task

In addition, we've added a task that deploys the stack (stack name: compliance-config), based on this example: https://github.com/org-formation/aws-resource-providers/blob/master/ec2/no-default-vpc/example.yml

So our tasks look like this:

Parameters:
  <<: !Include "../organization-parameters.yml"

#NoDefaultVPC
CommunityEc2NoDefaultVpcsRP:
  Type: register-type
  SchemaHandlerPackage: s3://community-resource-provider-catalog/community-organizations-nodefaultvpc-0.1.0.zip
  ResourceType: 'Community::Organizations::NoDefaultVPC'
  MaxConcurrentTasks: 10
  OrganizationBinding:
    IncludeMasterAccount: true
    Account: '*'
    Region: !Ref primaryRegion

ComplianceTemplate:
  Type: update-stacks
  Template: ./compliance-template.yml
  StackName: compliance-config
  StackDescription: Remediations for AWS Foundational Security Best Practices
  MaxConcurrentStacks: 10
  FailedStackTolerance: 10
  DefaultOrganizationBindingRegion: !Ref primaryRegion
  OrganizationBinding:
    IncludeMasterAccount: true
    Account: '*'
    Region: !Ref primaryRegion

The default VPCs are removed on the first run. However, running the pipeline again with this task enabled causes the following errors:

INFO: Executing: register-type CommunityEc2NoDefaultVpcsRP.
622 | DEBG: Setting build action on register-type / CommunityEc2NoDefaultVpcsRP for 012345678910/eu-west-1 to None - hash matches stored target. (012345678910 = Account1)
DEBG: Stack compliance-config in account 012345678910 (eu-west-1) update starting... (012345678910 = Account1)
645 | ERROR: error updating CloudFormation stack compliance-config in account 012345678910 (eu-west-1).
646 | Resource is not in the state stackCreateComplete (012345678910 = Account1)
647 | ERROR: Resource NoDefaultVpc failed because Internal Failure.
648 | ERROR: Stack compliance-config in account 012345678910 (eu-west-1) update failed. reason: Resource is not in the state stackCreateComplete (012345678910 = Account1)
649 | Resource is not in the state stackCreateComplete (use option --print-stack to print stack)

I would expect org-formation to skip making changes once it detects that no default VPCs remain. Instead, the failure slows down the pipeline, because the tasks are retried before org-formation gives up.

As a workaround, I've disabled the task.

OlafConijn commented 2 years ago

Hi @dannysteenman, I would expect the same!

A bit more context on how this happens:

If the task did not change, you get the message "Setting build action ... None - hash matches stored target". This happens correctly on line 622 of your logs for the CommunityEc2NoDefaultVpcsRP task.

This doesn't seem to happen for ComplianceTemplate (as it clearly executes and fails).

  1. This seems to indicate that something in your task changed (could be whitespace in the description or the template).
  2. If something insignificant changed and CloudFormation, after parsing the template, doesn't see a meaningful change, CloudFormation will perform a no-op.
  3. If the CloudFormation template changed but this specific resource (the NoDefaultVPC resource) did not, I would not expect the NoDefaultVPC resource handler to be invoked. CloudFormation would just skip it.

In any case, I wouldn't expect the NoDefaultVPC resource to fail. Maybe some of the context above helps diagnose the issue; otherwise, could you share the template? I believe your colleague Yannick is on the org-formation Slack, so feel free to reach out there if needed. Good luck!
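The two levels of skipping described in the comment above (org-formation's task hash, then CloudFormation's per-resource diff) can be illustrated with a minimal sketch. This is not org-formation's or CloudFormation's actual code: `task_hash`, `build_action` and `changed_resources` are hypothetical names, hashing a canonical JSON dump with SHA-256 is an assumption standing in for whatever org-formation really stores, and the resource diff is a deliberately simplified model of a CloudFormation update.

```python
import hashlib
import json
from typing import Optional


def task_hash(task: dict) -> str:
    """Hash a task definition deterministically (hypothetical scheme)."""
    # sort_keys makes the hash independent of key order; any change to a
    # value, even trailing whitespace in StackDescription, changes the hash.
    canonical = json.dumps(task, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_action(task: dict, stored_hash: Optional[str]) -> str:
    """Task-level check: 'None' when the hash matches the stored target."""
    if stored_hash is not None and task_hash(task) == stored_hash:
        return "None"
    return "Update"


def changed_resources(old_resources: dict, new_resources: dict) -> list:
    """Resource-level check: logical IDs that were added or changed.

    Simplified model of point 3 (deletions are ignored here): resources
    missing from this list would not have their handlers invoked.
    """
    return sorted(
        logical_id
        for logical_id, definition in new_resources.items()
        if old_resources.get(logical_id) != definition
    )


# Usage: a whitespace-only edit to the description changes the task hash.
task = {
    "Type": "update-stacks",
    "StackName": "compliance-config",
    "StackDescription": "Remediations for AWS Foundational Security Best Practices",
}
stored = task_hash(task)
edited = dict(task, StackDescription=task["StackDescription"] + " ")
print(build_action(task, stored), build_action(edited, stored))

# Adding an unrelated resource leaves the NoDefaultVpc resource untouched.
old = {"NoDefaultVpc": {"Type": "Community::Organizations::NoDefaultVPC"}}
new = {**old, "Bucket": {"Type": "AWS::S3::Bucket"}}
print(changed_resources(old, new))
```

Under this model, a whitespace-only edit is enough to trigger the stack update (point 1), while the unchanged NoDefaultVpc resource does not appear in the resource-level diff (point 3).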

dalenewman commented 2 years ago

All my attempts to register the NoDefaultVpc type are failing with the same error.

dannysteenman commented 2 years ago

Thanks for the reply, @OlafConijn; I totally forgot to respond to this GitHub issue. The good news is that I found the culprit :) What caused the issue in our account was that the custom resource that provides the NoDefaultVPC functionality had been deleted by our janitor bot (a bot in our AWS account that deletes untagged resources), so every CloudFormation stack update produced the error I shared earlier. So this was a bug on our side 😅. Keep up the great work!

dalenewman commented 2 years ago

Update: This has cleared up for me on its own.