hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: CloudWatch Container Insights is enabled even when set to `disabled` in ECS Cluster #36680

Open garysassano opened 8 months ago

garysassano commented 8 months ago

Terraform Core Version

1.7.5

AWS Provider Version

5.43.0

Affected Resource(s)

Expected Behavior

I expected that setting containerInsights to disabled on the ECS Cluster would always disable Container Insights.

Actual Behavior

I've discovered that you can bypass the setting: Container Insights keeps working regardless of the containerInsights value on your ECS Cluster.

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

Essentially, there are two ways you can enable CloudWatch Container Insights:

  1. Set containerInsights to enabled in your aws_ecs_cluster resource.
  2. Use a hack that bypasses the aws_ecs_cluster setting by manually creating the Container Insights resources, so it keeps working even when the setting is explicitly disabled.
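
The only difference in the cluster definition between the two approaches is the value of one setting. A minimal sketch of that toggle, assuming a hypothetical helper `containerInsightsSetting` (it is not part of the AWS provider or CDKTF) that builds the `setting` array passed to `EcsCluster`:

```typescript
// Hypothetical helper; the name and shape are illustrative only.
type ClusterSetting = { name: string; value: string };

// Builds the `setting` array used in the EcsCluster configuration.
function containerInsightsSetting(enabled: boolean): ClusterSetting[] {
  return [
    {
      name: "containerInsights",
      value: enabled ? "enabled" : "disabled",
    },
  ];
}

// Approach 1 uses the "enabled" form; Approach 2 uses "disabled".
const approach1Setting = containerInsightsSetting(true);
const approach2Setting = containerInsightsSetting(false);
```

In Approach 2 the extra log group, resource policy, event rule, and event target are created by hand on top of the "disabled" form.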

Approach 1

import { TerraformStack } from "cdktf";
import { Construct } from "constructs";
import { CloudwatchEventRule } from "../../.gen/providers/aws/cloudwatch-event-rule";
import { CloudwatchEventTarget } from "../../.gen/providers/aws/cloudwatch-event-target";
import { CloudwatchLogGroup } from "../../.gen/providers/aws/cloudwatch-log-group";
import { CloudwatchLogResourcePolicy } from "../../.gen/providers/aws/cloudwatch-log-resource-policy";
import { DataAwsSubnets } from "../../.gen/providers/aws/data-aws-subnets";
import { DataAwsVpc } from "../../.gen/providers/aws/data-aws-vpc";
import { EcsCluster } from "../../.gen/providers/aws/ecs-cluster";
import { EcsService } from "../../.gen/providers/aws/ecs-service";
import { EcsTaskDefinition } from "../../.gen/providers/aws/ecs-task-definition";
import { IamRole } from "../../.gen/providers/aws/iam-role";
import { AwsProvider } from "../../.gen/providers/aws/provider";

export class MyStack extends TerraformStack {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    // Configure AWS provider
    new AwsProvider(this, "AwsProvider");

    /*********************************
     *** EVENTBRIDGE TO CLOUDWATCH ***
     *********************************/

    const ecsErroredTasksEvent = new CloudwatchEventRule(
      this,
      "ECSErroredTasksEvent",
      {
        name: "ecs-errored-tasks-event",
        description: "Triggered when an ECS Task stops because of an error",
        eventPattern: JSON.stringify({
          source: ["aws.ecs"],
          "detail-type": ["ECS Task State Change"],
          detail: {
            desiredStatus: ["STOPPED"],
            lastStatus: ["STOPPED"],
            stoppedReason: [
              {
                wildcard: "*Error:*",
              },
            ],
          },
        }),
      },
    );

    const ecsErroredTasksLog = new CloudwatchLogGroup(
      this,
      "ECSErroredTasksLog",
      {
        name: `/aws/events/ecs-errored-tasks-log`,
        retentionInDays: 7,
      },
    );

    new CloudwatchLogResourcePolicy(this, "ECSErroredTasksLogPolicy", {
      policyName: "ecs-errored-tasks-log-policy",
      policyDocument: JSON.stringify({
        Version: "2012-10-17",
        Statement: [
          {
            Effect: "Allow",
            Principal: {
              Service: ["delivery.logs.amazonaws.com", "events.amazonaws.com"],
            },
            Action: ["logs:CreateLogStream", "logs:PutLogEvents"],
            Resource: `${ecsErroredTasksLog.arn}:*`,
          },
        ],
      }),
    });

    new CloudwatchEventTarget(this, "ECSErroredTasksEventTarget", {
      rule: ecsErroredTasksEvent.name,
      arn: ecsErroredTasksLog.arn,
    });

    /*********************************
     ***** DEFAULT VPC & SUBNETS *****
     *********************************/

    // Fetch region's default VPC
    const defaultVpc = new DataAwsVpc(this, "defaultVpc", {
      default: true,
    });

    // Fetch subnets from region's default VPC
    const defaultVpcSubnets = new DataAwsSubnets(this, "defaultVpcSubnets", {
      filter: [
        {
          name: "vpc-id",
          values: [defaultVpc.id],
        },
      ],
    });

    /**********************************
     ********* AMAZON ECS IAM *********
     **********************************/

    const ecsTaskExecutionRole = new IamRole(this, "ECSTaskExecutionRole", {
      name: "ecs-task-execution-role",
      assumeRolePolicy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [
          {
            Effect: "Allow",
            Principal: { Service: "ecs-tasks.amazonaws.com" },
            Action: "sts:AssumeRole",
          },
        ],
      }),
      inlinePolicy: [
        {
          name: "AmazonECSTaskExecutionRolePolicy",
          policy: JSON.stringify({
            Version: "2012-10-17",
            Statement: [
              {
                Effect: "Allow",
                Action: [
                  "ecr:GetAuthorizationToken",
                  "ecr:BatchCheckLayerAvailability",
                  "ecr:GetDownloadUrlForLayer",
                  "ecr:BatchGetImage",
                  "logs:CreateLogStream",
                  "logs:PutLogEvents",
                  "logs:CreateLogGroup",
                ],
                Resource: "*",
              },
            ],
          }),
        },
      ],
    });

    /**********************************
     *********** AMAZON ECS ***********
     **********************************/

    const ecsCluster = new EcsCluster(this, "EcsCluster", {
      name: "ecs-cluster",
      setting: [
        {
          name: "containerInsights",
          value: "enabled",
        },
      ],
    });

    const ecsTask = new EcsTaskDefinition(this, "ECSTask", {
      family: "ecs-task",
      requiresCompatibilities: ["FARGATE"],
      networkMode: "awsvpc",
      cpu: "256",
      memory: "512",
      runtimePlatform: {
        operatingSystemFamily: "LINUX",
        cpuArchitecture: "X86_64",
      },
      executionRoleArn: ecsTaskExecutionRole.arn,
      containerDefinitions: JSON.stringify([
        {
          name: "unexisting-image",
          image: "unexisting-image",
        },
      ]),
    });

    new EcsService(this, "EcsService", {
      name: "ecs-service",
      cluster: ecsCluster.id,
      taskDefinition: ecsTask.arn,
      launchType: "FARGATE",
      networkConfiguration: {
        subnets: defaultVpcSubnets.ids,
        assignPublicIp: true,
      },
      desiredCount: 1,
    });
  }
}

Approach 2

import { TerraformStack } from "cdktf";
import { Construct } from "constructs";
import { CloudwatchEventRule } from "../../.gen/providers/aws/cloudwatch-event-rule";
import { CloudwatchEventTarget } from "../../.gen/providers/aws/cloudwatch-event-target";
import { CloudwatchLogGroup } from "../../.gen/providers/aws/cloudwatch-log-group";
import { CloudwatchLogResourcePolicy } from "../../.gen/providers/aws/cloudwatch-log-resource-policy";
import { DataAwsSubnets } from "../../.gen/providers/aws/data-aws-subnets";
import { DataAwsVpc } from "../../.gen/providers/aws/data-aws-vpc";
import { EcsCluster } from "../../.gen/providers/aws/ecs-cluster";
import { EcsService } from "../../.gen/providers/aws/ecs-service";
import { EcsTaskDefinition } from "../../.gen/providers/aws/ecs-task-definition";
import { IamRole } from "../../.gen/providers/aws/iam-role";
import { AwsProvider } from "../../.gen/providers/aws/provider";

export class MyStack extends TerraformStack {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    // Configure AWS provider
    new AwsProvider(this, "AwsProvider");

    /*********************************
     *** EVENTBRIDGE TO CLOUDWATCH ***
     *********************************/

    const ecsErroredTasksEvent = new CloudwatchEventRule(
      this,
      "ECSErroredTasksEvent",
      {
        name: "ecs-errored-tasks-event",
        description: "Triggered when an ECS Task stops because of an error",
        eventPattern: JSON.stringify({
          source: ["aws.ecs"],
          "detail-type": ["ECS Task State Change"],
          detail: {
            desiredStatus: ["STOPPED"],
            lastStatus: ["STOPPED"],
            stoppedReason: [
              {
                wildcard: "*Error:*",
              },
            ],
          },
        }),
      },
    );

    const ecsErroredTasksLog = new CloudwatchLogGroup(
      this,
      "ECSErroredTasksLog",
      {
        name: `/aws/events/ecs-errored-tasks-log`,
        retentionInDays: 7,
      },
    );

    new CloudwatchLogResourcePolicy(this, "ECSErroredTasksLogPolicy", {
      policyName: "ecs-errored-tasks-log-policy",
      policyDocument: JSON.stringify({
        Version: "2012-10-17",
        Statement: [
          {
            Effect: "Allow",
            Principal: {
              Service: ["delivery.logs.amazonaws.com", "events.amazonaws.com"],
            },
            Action: ["logs:CreateLogStream", "logs:PutLogEvents"],
            Resource: `${ecsErroredTasksLog.arn}:*`,
          },
        ],
      }),
    });

    new CloudwatchEventTarget(this, "ECSErroredTasksEventTarget", {
      rule: ecsErroredTasksEvent.name,
      arn: ecsErroredTasksLog.arn,
    });

    /*********************************
     ***** DEFAULT VPC & SUBNETS *****
     *********************************/

    // Fetch region's default VPC
    const defaultVpc = new DataAwsVpc(this, "defaultVpc", {
      default: true,
    });

    // Fetch subnets from region's default VPC
    const defaultVpcSubnets = new DataAwsSubnets(this, "defaultVpcSubnets", {
      filter: [
        {
          name: "vpc-id",
          values: [defaultVpc.id],
        },
      ],
    });

    /**********************************
     ********* AMAZON ECS IAM *********
     **********************************/

    const ecsTaskExecutionRole = new IamRole(this, "ECSTaskExecutionRole", {
      name: "ecs-task-execution-role",
      assumeRolePolicy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [
          {
            Effect: "Allow",
            Principal: { Service: "ecs-tasks.amazonaws.com" },
            Action: "sts:AssumeRole",
          },
        ],
      }),
      inlinePolicy: [
        {
          name: "AmazonECSTaskExecutionRolePolicy",
          policy: JSON.stringify({
            Version: "2012-10-17",
            Statement: [
              {
                Effect: "Allow",
                Action: [
                  "ecr:GetAuthorizationToken",
                  "ecr:BatchCheckLayerAvailability",
                  "ecr:GetDownloadUrlForLayer",
                  "ecr:BatchGetImage",
                  "logs:CreateLogStream",
                  "logs:PutLogEvents",
                  "logs:CreateLogGroup",
                ],
                Resource: "*",
              },
            ],
          }),
        },
      ],
    });

    /**********************************
     *********** AMAZON ECS ***********
     **********************************/

    const ecsCluster = new EcsCluster(this, "EcsCluster", {
      name: "ecs-cluster",
      setting: [
        {
          name: "containerInsights",
          value: "disabled",
        },
      ],
    });

    const ecsTask = new EcsTaskDefinition(this, "ECSTask", {
      family: "ecs-task",
      requiresCompatibilities: ["FARGATE"],
      networkMode: "awsvpc",
      cpu: "256",
      memory: "512",
      runtimePlatform: {
        operatingSystemFamily: "LINUX",
        cpuArchitecture: "X86_64",
      },
      executionRoleArn: ecsTaskExecutionRole.arn,
      containerDefinitions: JSON.stringify([
        {
          name: "unexisting-image",
          image: "unexisting-image",
        },
      ]),
    });

    new EcsService(this, "EcsService", {
      name: "ecs-service",
      cluster: ecsCluster.id,
      taskDefinition: ecsTask.arn,
      launchType: "FARGATE",
      networkConfiguration: {
        subnets: defaultVpcSubnets.ids,
        assignPublicIp: true,
      },
      desiredCount: 1,
    });

    /**********************************
     ***** ECS CONTAINER INSIGHTS *****
     **********************************/

    const ecsContainerInsightsLogGroup = new CloudwatchLogGroup(
      this,
      "ECSContainerInsightsLogGroup",
      {
        name: `/aws/events/ecs/containerinsights/${ecsCluster.name}/performance`,
        retentionInDays: 7,
      },
    );

    new CloudwatchLogResourcePolicy(
      this,
      "ECSContainerInsightsLogGroupPolicy",
      {
        policyName: "EventBridgeCloudWatchLogs",
        policyDocument: JSON.stringify({
          Version: "2012-10-17",
          Statement: [
            {
              Sid: "TrustEventBridgeToStoreECSLifecycleLogEvents",
              Effect: "Allow",
              Principal: {
                Service: [
                  "delivery.logs.amazonaws.com",
                  "events.amazonaws.com",
                ],
              },
              Action: ["logs:CreateLogStream", "logs:PutLogEvents"],
              Resource: `${ecsContainerInsightsLogGroup.arn}:*`,
            },
          ],
        }),
      },
    );

    const ecsContainerInsightsEventRule = new CloudwatchEventRule(
      this,
      "ECSContainerInsightsEventRule",
      {
        name: "ecs-container-insights",
        description: `This rule is used to export to CloudWatch Logs the lifecycle events of the ECS Cluster ${ecsCluster.name}.`,
        eventPattern: JSON.stringify({
          source: ["aws.ecs"],
          detail: {
            clusterArn: [ecsCluster.arn],
          },
        }),
      },
    );

    new CloudwatchEventTarget(this, "ECSContainerInsightsEventTarget", {
      rule: ecsContainerInsightsEventRule.name,
      arn: ecsContainerInsightsLogGroup.arn,
    });
  }
}

Steps to Reproduce

Deploy the CDKTF stacks.

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

None

garysassano commented 8 months ago

Despite being displayed in the CloudWatch Container Insights dashboard for user convenience, the ECS lifecycle events actually stand as an independent feature, not directly part of Container Insights.

What adds to the confusion is that upon activating Container Insights for an ECS cluster, AWS automatically sets up an EventBridge Rule named EventsToLogs-ecs-cl-{randomId}, directing EventBridge events to a CloudWatch log group at /aws/events/ecs/containerinsights/{clusterName}/performance.
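
To make the naming concrete, here is a minimal sketch of the log group path and an event pattern for a single cluster. The helper names `lifecycleLogGroupName` and `lifecycleEventPattern` are hypothetical, the ARN is a placeholder, and the pattern mirrors the one used in Approach 2; the exact pattern that AWS generates for its own rule is not shown in this issue.

```typescript
// Hypothetical helpers; names are illustrative, not an AWS or CDKTF API.

// Log group path that receives the events, as described above.
function lifecycleLogGroupName(clusterName: string): string {
  return `/aws/events/ecs/containerinsights/${clusterName}/performance`;
}

// Event pattern matching ECS events for one cluster (same shape as Approach 2).
function lifecycleEventPattern(clusterArn: string): string {
  return JSON.stringify({
    source: ["aws.ecs"],
    detail: { clusterArn: [clusterArn] },
  });
}

// Placeholder cluster name and ARN, for illustration only.
const logGroupName = lifecycleLogGroupName("ecs-cluster");
const pattern = lifecycleEventPattern(
  "arn:aws:ecs:us-east-1:123456789012:cluster/ecs-cluster",
);
```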

The official documentation is misleading, since it labels these events as "Container Insights performance log events," which just isn't true. In reality, these are the same ECS lifecycle events that can be enabled independently, as detailed here.

I believe ECS lifecycle events should not be automatically turned on along with Container Insights. Instead, they should have their own toggle and a better name for the CloudWatch log group.

Currently, enabling Container Insights for an ECS Cluster leads to the creation of two distinct CloudWatch log groups:

Deciphering this setup was far from straightforward or intuitive.

StephenDryden commented 3 months ago

I've just hit this exact same confusion today. I'm trying to enable this programmatically but struggling, as it appears lifecycle events are picked up from the /aws/events log group, which can only be created by manually clicking "configure lifecycle events" within the Container Insights panel.

That log group cannot be created programmatically, as /aws is reserved.