ministryofjustice / modernisation-platform

A place for the core work of the Modernisation Platform • This repository is defined and managed in Terraform
https://user-guide.modernisation-platform.service.justice.gov.uk
MIT License
677 stars 291 forks source link

Secure Baseline issue #7406

Open markgov opened 3 days ago

markgov commented 3 days ago

User Story

As a Modernisation platform engineer I expect the secure baseline not to create alerts in the low priority channel So that we can have a clean channel

Value / Purpose

After looking at the alerts in the low priority alarms channel https://moj.enterprise.slack.com/archives/C02PFCG8M1R after looking into the alarms we have found that the secure baseline is generating 193 alerts in cloud trail with an error message of

User: arn:aws:sts::xxxxxxxxxxxx:assumed-role/github-actions/githubactionsrolesession is not authorized to perform: organizations:ListAccounts on resource: * because no resource-based policy allows the organizations:ListAccounts action

This needs to be looked into and corrected

Useful Contacts

David Elliot, Richard Green, Mark Roberts

Additional Information

No response

Definition of Done

markgov commented 3 days ago

Rich Green I shouldn't have but , I've gone further down the rabbit hole on the API error we were looking at earlier. I found that when I ran the secure-baselines code locally I was able to trigger the same alert in Cloudwatch that we've been seeing. I looked at debug logs in terraform and got the following: 2024-07-02T12:26:49.627+0100 [DEBUG] provider.terraform-provider-aws_v5.45.0_x5: HTTP Response Received: aws.region=eu-west-2 http.duration=201 http.response.body= | {"__type":"AccessDeniedException","Message":"You don't have permissions to access this resource."} rpc.system=aws-api tf_aws.sdk=aws-sdk-go tf_provider_addr=registry.terraform.io/hashicorp/aws @caller=github.com/hashicorp/aws-sdk-go-base/v2/awsv1shim/v2@v2.0.0-beta.53/logger.go:157 http.response.header.x_amzn_requestid=37314104-42fb-4607-ada2-7f8fd4c4d9e2 @module=aws http.response.header.content_type=application/x-amz-json-1.1 http.response.header.date="Tue, 02 Jul 2024 11:26:49 GMT" http.status_code=400 tf_data_source_type=aws_organizations_organization tf_rpc=ReadDataSource http.response_content_length=98 rpc.method=ListAccounts rpc.service=Organizations tf_aws.signing_region=us-east-1 tf_mux_provider="*schema.GRPCProviderServer" tf_req_id=9c6f6f4a-ca2e-63b8-05a1-e3904f10a9bd timestamp="2024-07-02T12:26:49.627+0100" 12:53 So I think this is the troublemaker... https://github.com/ministryofjustice/modernisation-platform/blob/main/terraform/environments/bootstrap/secure-baselines/locals.tf#L3 I'm trying to work out where it's getting used I think in the config module in the baselines module

dms1981 commented 2 days ago

This looks like a pretty simple fix? One option would be to reference a separate provider such as the sso-readonly role and pass it in to the module?

provider "aws" {
  region = "eu-west-2"
  alias  = "sso-readonly"
  assume_role {
    role_arn = "arn:aws:iam::${local.environment_management.aws_organizations_root_account_id}:role/ModernisationPlatformSSOReadOnly"
  }
}

Alternatively it might be possible to investigate the role/policy being used here and add the missing IAM permission: organizations:ListAccounts

richgreen-moj commented 2 days ago

The strange thing about this one is it looks like a lot of our MP code uses the aws_organizations_organization datasource as it's a useful way to query details about the organisation from a member account.

The github-actions role that is running the baselines already has "Administrator" privileges so adding organizations:ListAccounts doesn't seem necessary.

Weirdly if I run some of the other directories that use this datasource e.g. terraform/environments/bootstrap/member-bootstrap then there are no API errors in cloudtrail 🤔 (but the verbose logs I get locally still show an error like I posted above)

If I run it from terraform/environments/bootstrap/secure-baselines I do get the error, but if i replace that data call with feeding in the root account ID explicitly then it doesn't cause the error.

So.. we could change the way we get that data, like you have suggested @dms1981 - but would it be weird to do it just in one place where it's failing? We use it a lot... I still can't get my head around exactly why it's just causing those API failures when secure-baselines runs 🤔