pulumi / pulumi-aws-native

AWS Native Provider for Pulumi
Apache License 2.0

Aliased step function version gets into delete loophole on definition change #1135

Open Jimmy89 opened 11 months ago

Jimmy89 commented 11 months ago

What happened?

I created a step function with a version and an alias. When I change the step function definition, a new version is published and the version resource triggers a replacement. However, the step function alias still refers to the 'old' version, so AWS forbids deleting the old version resource and returns an error.

The proper solution would be to somehow detect that an alias is attached to the version and to trigger the replacement only after the alias has been updated.

Also, when using a workaround (retainOnDelete on the version), the alias cannot be updated (see the second error in the example). This looks like an upstream error.
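
The ordering problem described above can be illustrated with a small in-memory model. This is only a sketch: ToyStateMachine, simulateDefinitionChange and the Order type are hypothetical names, and nothing here calls AWS or Pulumi. It only shows why deleting the old version before repointing the alias fails, while the reverse order succeeds.

```typescript
// Toy in-memory model of the constraint described above.
// NOT the AWS SDK or the Pulumi API: all names here are hypothetical.
class ToyStateMachine {
  private versions: number[] = [];
  private aliases: { [name: string]: number } = {};
  private nextVersion = 1;

  // Publish a new version and return its number.
  publishVersion(): number {
    const v = this.nextVersion++;
    this.versions.push(v);
    return v;
  }

  // Point an alias at an existing version.
  pointAlias(name: string, version: number): void {
    if (this.versions.indexOf(version) === -1) {
      throw new Error("version " + version + " does not exist");
    }
    this.aliases[name] = version;
  }

  // Deleting a version is refused while any alias still references it.
  deleteVersion(version: number): void {
    const referencing = Object.keys(this.aliases).filter(
      (name) => this.aliases[name] === version
    );
    if (referencing.length > 0) {
      throw new Error(
        "Version to be deleted must not be referenced by an alias: [" +
          referencing.join(", ") + "]"
      );
    }
    this.versions = this.versions.filter((v) => v !== version);
  }
}

type Order = "delete-old-first" | "update-alias-first";

// Simulate a definition change under the two possible operation orders.
function simulateDefinitionChange(order: Order): string {
  const sm = new ToyStateMachine();
  const v1 = sm.publishVersion();
  sm.pointAlias("stable", v1);
  const v2 = sm.publishVersion(); // the definition changed
  try {
    if (order === "update-alias-first") {
      sm.pointAlias("stable", v2);
      sm.deleteVersion(v1);
    } else {
      sm.deleteVersion(v1); // the alias still points at v1 here
      sm.pointAlias("stable", v2);
    }
    return "ok";
  } catch (e) {
    return (e as Error).message;
  }
}

console.log(simulateDefinitionChange("delete-old-first"));
console.log(simulateDefinitionChange("update-alias-first"));
```

In this model the "delete-old-first" order is rejected and "update-alias-first" goes through, which is the ordering the suggested fix would need to produce.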

Example

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws-native";
import * as awsClassic from "@pulumi/aws";

  const stepFunctionLogGroup = new aws.logs.LogGroup(`step-function-log`, {
    logGroupName: pulumi.concat("/aws/vendedlogs/states/",  "testing"),
    retentionInDays: 1,
  }, { replaceOnChanges: ["logGroupName"], deleteBeforeReplace: true });
  const regionName = "eu-central-1"; // Change to your region.

  const allowWritingStepFunctionsLogsPolicy = new awsClassic.iam.Policy(`allow-write-logs`, {
    description: `Allow writing logs for step functions`,
    policy: awsClassic.iam.getPolicyDocumentOutput({
      statements: [
        {
          effect: "Allow",
          actions: [
            "logs:CreateLogDelivery",
            "logs:GetLogDelivery",
            "logs:UpdateLogDelivery",
            "logs:DeleteLogDelivery",
            "logs:ListLogDeliveries",
            "logs:PutResourcePolicy",
            "logs:DescribeResourcePolicies",
            "logs:DescribeLogGroups"
          ],
          resources: ["*"]
        }]
    }).json,
  });
  const sfnRole = new awsClassic.iam.Role(`step-function`, {
    assumeRolePolicy: awsClassic.iam.assumeRolePolicyForPrincipal({ Service: `states.${regionName}.amazonaws.com` }),
    managedPolicyArns: [
      awsClassic.iam.ManagedPolicy.AWSXRayDaemonWriteAccess,
      allowWritingStepFunctionsLogsPolicy.arn
    ]
  });

  const mailStateMachine = new aws.stepfunctions.StateMachine(`step-function`, {
    roleArn: sfnRole.arn,
    stateMachineName: "testing",
    definitionString: JSON.stringify({
  "Comment": "A Hello World example demonstrating various state types of the Amazon States Language. It is composed of flow control states only, so it does not need resources to run.",
  "StartAt": "Pass",
  "States": {
    "Pass": {
      "Comment": "A Pass state passes its input to its output, without performing work. They can also generate static JSON output, or transform JSON input using filters and pass the transformed data to the next state. Pass states are useful when constructing and debugging state machines.",
      "Type": "Pass",
      "Result": {
        "IsHelloWorldExample": true
      },
      "Next": "Hello World example?"
    },
    "Hello World example?": {
      "Comment": "A Choice state adds branching logic to a state machine. Choice rules can implement many different comparison operators, and rules can be combined using And, Or, and Not",
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.IsHelloWorldExample",
          "BooleanEquals": true,
          "Next": "Yes"
        },
        {
          "Variable": "$.IsHelloWorldExample",
          "BooleanEquals": false,
          "Next": "No"
        }
      ],
      "Default": "Yes"
    },
    "Yes": {
      "Type": "Pass",
      "Next": "Wait 3 sec"
    },
    "No": {
      "Type": "Fail",
      "Cause": "Not Hello World"
    },
    "Wait 3 sec": {
      "Comment": "A Wait state delays the state machine from continuing for a specified time.",
      "Type": "Wait",
      "Seconds": 3,
      "Next": "Parallel State"
    },
    "Parallel State": {
      "Comment": "A Parallel state can be used to create parallel branches of execution in your state machine.",
      "Type": "Parallel",
      "Next": "Hello World",
      "Branches": [
        {
          "StartAt": "Hello",
          "States": {
            "Hello": {
              "Type": "Pass",
              "End": true
            }
          }
        },
        {
          "StartAt": "World",
          "States": {
            "World": {
              "Type": "Pass",
              "End": true
            }
          }
        }
      ]
    },
    "Hello World": {
      "Type": "Pass",
      "End": true
    }
  }
}),
    loggingConfiguration: {
      destinations: [{
        cloudWatchLogsLogGroup: {
          logGroupArn: stepFunctionLogGroup.arn
        }
      }],
      level: "ERROR",
      includeExecutionData: false,
    },
    tracingConfiguration: {
      enabled: false,
    },
  });

  const stepFunctionVersion = new aws.stepfunctions.StateMachineVersion("step-function-stable-version", {
    stateMachineArn: mailStateMachine.arn,
    description: "Latest stable version deployed through Pulumi",
    stateMachineRevisionId: mailStateMachine.stateMachineRevisionId
  }, { parent: mailStateMachine, deletedWith: mailStateMachine, dependsOn: [mailStateMachine] });

  new aws.stepfunctions.StateMachineAlias("step-function-stable-alias", {
    description: "Latest stable version deployed through Pulumi",
    name: "stable",
    deploymentPreference: {
      stateMachineVersionArn: stepFunctionVersion.arn,
      type: "ALL_AT_ONCE"
    }
  }, { parent: mailStateMachine, deletedWith: mailStateMachine, dependsOn: [stepFunctionVersion] });

  1. Run pulumi up with the code above.
  2. Change the definitionString attribute of the step function, for example by editing the comment.
  3. Observe the error:

   ~   └─ aws-native:stepfunctions:StateMachine            step-function                                                       updated (6s)             [diff: ~definitionString]
 +-     └─ aws-native:stepfunctions:StateMachineVersion  step-function-stable-version                                        **replacing failed**     1 error

Diagnostics:
  pulumi:pulumi:Stack (NAME):
    error: update failed

  aws-native:stepfunctions:StateMachineVersion (step-function-stable-version):
    error: operation DELETE failed with "GeneralServiceException": Version to be deleted must not be referenced by an alias. Current list of aliases referencing this version: [stable] (Service: AWSStepFunctions; Status Code: 400; Error Code: ConflictException; Request ID: XXXXX; Proxy: null)
  4. As a workaround, I added "retainOnDelete: true" to "step-function-stable-version". However, I now get this error:
  aws-native:stepfunctions:StateMachineAlias (step-function-stable-alias):
    error: operation error CloudControl: UpdateResource, https response error StatusCode: 400, RequestID: XXXXX, api error ValidationException: Model validation failed (#: #: only 1 subschema matches out of 2
    #: #: 2 subschemas matched instead of one)
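
The wording "2 subschemas matched instead of one" resembles what JSON Schema validators report when a oneOf constraint is violated, that is, when a payload satisfies more than one alternative at once. Whether that is what happens here is not established in this issue, and the alias schema is not shown. The sketch below only illustrates the oneOf rule itself. The two subschemas and all names are hypothetical.

```typescript
// Toy "oneOf" check: exactly one subschema must match.
// The subschemas below are hypothetical stand-ins; the real schema of the
// alias resource is not shown in this issue.
type Payload = { [key: string]: unknown };
type Subschema = (p: Payload) => boolean;

// Returns null when exactly one subschema matches, otherwise a message.
function oneOf(subschemas: Subschema[], payload: Payload): string | null {
  const matched = subschemas.filter((s) => s(payload)).length;
  return matched === 1 ? null : matched + " subschemas matched instead of one";
}

// Assumption for illustration: each alternative requires one property.
const hasRouting: Subschema = (p) => "routingConfiguration" in p;
const hasDeploymentPreference: Subschema = (p) => "deploymentPreference" in p;
const alternatives = [hasRouting, hasDeploymentPreference];

// Only one alternative present: passes.
console.log(oneOf(alternatives, { deploymentPreference: {} }));

// Both alternatives present in the same payload: rejected.
console.log(oneOf(alternatives, { deploymentPreference: {}, routingConfiguration: [] }));
```

If the update request somehow carried both alternatives, a validator of this kind would reject it. That is one possible reading of the message, not a confirmed diagnosis.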

Output of pulumi about

NAME                VERSION
@pulumi/aws         6.5.0
@pulumi/aws-native  0.80.0
@pulumi/pulumi      3.88.1
@pulumi/random      4.14.0

Additional context

If, after step 5, I replace the step function alias with the one from the aws-classic package, updating the alias no longer gives an error (the version is still a problem):

  new awsClassic.sfn.Alias("step-function-stable-alias", {
    description: "Latest stable version deployed through Pulumi",
    name: "stable",
    routingConfigurations: [{
      stateMachineVersionArn: stepFunctionVersion.arn,
      weight: 100,
    }]
  }, { parent: mailStateMachine, deletedWith: mailStateMachine, dependsOn: [stepFunctionVersion] });

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

mikhailshilkov commented 11 months ago

Hi @Jimmy89, thank you for filing this issue, and apologies for the slow response.

I'm trying to reproduce your issue but I get this error while running your program:

CREATE failed with "InvalidRequest": Invalid State Machine Definition: 'SCHEMA_VALIDATION_FAILED: Value cannot be an empty string at /StartAt, SCHEMA_VALIDATION_FAILED: Value cannot be empty at /States, MISSING_TRANSITION_TARGET: Missing 'Next' target: at /StartAt, MISSING_END_STATE: Workflow has no terminal state at null' (Service: AWSStepFunctions; Status Code: 400; Error Code: InvalidDefinition

I'm not super familiar with step functions. Do you know how I should fix it?

Jimmy89 commented 10 months ago

@mikhailshilkov Sorry, I didn't see your comment. I will come back this week with an updated example.

Jimmy89 commented 10 months ago

@mikhailshilkov I updated the original post with a hello world example from AWS itself.
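
For anyone hitting the InvalidDefinition error quoted earlier in the thread: its messages point at an empty StartAt and an empty States object. The sketch below is a rough approximation of those four checks, not the AWS validator. checkDefinition and the Toy* types are hypothetical names used only for illustration.

```typescript
// Rough approximation of the checks named in the InvalidDefinition error.
// NOT the AWS validator; the types and function are hypothetical.
interface ToyState {
  Type: string;
  Next?: string;
  End?: boolean;
}

interface ToyDefinition {
  StartAt: string;
  States: { [name: string]: ToyState };
}

function checkDefinition(def: ToyDefinition): string[] {
  const problems: string[] = [];
  const names = Object.keys(def.States);
  if (def.StartAt === "") {
    problems.push("StartAt is empty");
  }
  if (names.length === 0) {
    problems.push("States is empty");
  }
  if (names.indexOf(def.StartAt) === -1) {
    problems.push("StartAt does not name an existing state");
  }
  // A state is terminal if it has End: true or is a Succeed/Fail state.
  const hasTerminal = names.some((n) => {
    const s = def.States[n];
    return s.End === true || s.Type === "Succeed" || s.Type === "Fail";
  });
  if (!hasTerminal) {
    problems.push("no terminal state");
  }
  return problems;
}

// An empty definition trips all four checks.
const empty: ToyDefinition = { StartAt: "", States: {} };
console.log(checkDefinition(empty));

// A minimal definition with a single terminal Pass state trips none.
const minimal: ToyDefinition = {
  StartAt: "Pass",
  States: { Pass: { Type: "Pass", End: true } },
};
console.log(checkDefinition(minimal));
console.log(JSON.stringify(minimal)); // usable as a definitionString value
```

The minimal definition here is much smaller than the hello world example in the original post, which can make it easier to isolate the version and alias behaviour.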