pulumi / pulumi-eks

A Pulumi component for easily creating and managing an Amazon EKS Cluster
https://www.pulumi.com/registry/packages/eks/
Apache License 2.0

eksClusterIngressRule stuck in delete-recreate cycle #1031

Closed: jtmarmon closed this issue 7 months ago

jtmarmon commented 9 months ago

What happened?

We pass a custom clusterSecurityGroup to our cluster.

const clusterSecurityGroup = new aws.ec2.SecurityGroup("cluster-sg", {
  vpcId: vpc.vpcId,
  tags: {
    Name: "cluster-sg",
  },
  ingress: [ ... ],
  egress: [ ... ],
});

export const cluster = new eks.Cluster("eks-cluster", {
  ...
  clusterSecurityGroup: clusterSecurityGroup,
  ...
});

The following cycle is happening on refresh-update:

aws:ec2:SecurityGroup    cluster-sg          update     [diff: ~ingress]
...
      ~ ingress: [
          - [2]: {
                  - cidrBlocks    : []
                  - description   : "Allow pods to communicate with the cluster API Server"
                  - fromPort      : 443
                  - ipv6CidrBlocks: []
                  - prefixListIds : []
                  - protocol      : "tcp"
                  - securityGroups: [
                  -     [0]: <OUR SG ID>
                    ]
                  - self          : false
                  - toPort        : 443
                }
        ]

And so on: basically, the ingress rules of our cluster security group are fighting with the additional ingress rules added by Pulumi. Is there a solution to this?

Example

see above

Output of pulumi about

Version      3.101.1
Go Version   go1.21.6
Go Compiler  gc

Plugins
NAME          VERSION
aws           6.13.2
awsx          2.3.0
docker        4.5.0
docker        3.6.1
eks           2.0.0
kubernetes    4.5.5
mongodbatlas  3.11.2
nodejs        unknown
tls           5.0.0
twingate      0.0.48

Host
OS       darwin
Version  14.1
Arch     arm64

Dependencies:
NAME                            VERSION
@pulumi/awsx                    2.3.0
@pulumi/kubernetes              4.5.5
@pulumi/mongodbatlas            3.11.2
@twingate-labs/pulumi-twingate  0.0.48
ip                              1.1.8
typescript                      4.9.5
zod                             3.22.4
@types/node                     16.18.61
@pulumi/aws                     6.13.2
@pulumi/docker                  4.5.0
@pulumi/eks                     2.0.0
@pulumi/pulumi                  3.92.0
@pulumi/tls                     5.0.0

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

mjeffryes commented 9 months ago

Hi @jtmarmon, have you tried adding an ignoreChanges option on the cluster to exclude the rules added by the security group?

jtmarmon commented 9 months ago

@mjeffryes I think the way to do this would be to add ignoreChanges to the ingress input on the SecurityGroup, but that would mean future changes to the ingress rules would be ignored. Is there another way to do it?

mjeffryes commented 8 months ago

@jtmarmon you should be able to ignore just the ingress rule that's added by the refresh by specifying its index, e.g. in your example above the rule was at index 2, so you'd add ignoreChanges: ["ingress[2]"]. (See https://www.pulumi.com/docs/concepts/options/ignorechanges/ for the full syntax for ignoring sub-fields, list elements, etc.)
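
A minimal sketch of that workaround (names and the vpcId value are placeholders; the index in ignoreChanges must match wherever the provider-added rule lands in your inline list):

import * as aws from "@pulumi/aws";

// Keep your real ingress/egress rules here; ignoreChanges tells Pulumi
// not to diff the inline list element at index 2 (the provider-added rule).
const clusterSecurityGroup = new aws.ec2.SecurityGroup("cluster-sg", {
    vpcId: "vpc-12345678", // placeholder
    ingress: [/* your ingress rules */],
    egress: [/* your egress rules */],
}, { ignoreChanges: ["ingress[2]"] });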

mjeffryes commented 8 months ago

Realized I should elaborate here and say that clearly this is something that should just work and we'd like to fix it in the provider. Just trying to see if we can get you a passable workaround until then!

EronWright commented 7 months ago

The issue here is that a SecurityGroup with inline rules is mutually exclusive with the SecurityGroupRule resource. See the Terraform documentation, which warns against using inline rules in conjunction with rule resources.

Please update your program accordingly, and sorry for the confusion. Specifically, don't use inline rules in the security group that you pass to eks.Cluster. Meanwhile, we'll add a warning for this situation.

Repro:

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Create a new security group for allowing SSH access
const clusterSecurityGroup = new aws.ec2.SecurityGroup("clusterSecurityGroup", {
    description: "Allow SSH inbound traffic",
    ingress: [
        {
            protocol: "tcp",
            fromPort: 22,
            toPort: 22,
            cidrBlocks: ["0.0.0.0/0"],
        },
    ],
});

// Create a standalone ingress rule for the cluster API server; mixing this
// with the inline ingress rules above reproduces the conflict
const allTrafficSecurityGroupRule = new aws.ec2.SecurityGroupRule("eksClusterIngressRule", {
    description: "Allow pods to communicate with the cluster API Server",
    type: "ingress",
    fromPort: 443,
    toPort: 443,
    protocol: "tcp",
    cidrBlocks: ["0.0.0.0/0"],
    securityGroupId: clusterSecurityGroup.id,
});

// Export the Security Group ID
export const securityGroupId = clusterSecurityGroup.id;
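
For contrast, here is a sketch of what the corrected wiring might look like (illustrative names; the eks.Cluster arguments are trimmed to the one relevant here):

import * as aws from "@pulumi/aws";
import * as eks from "@pulumi/eks";

// No inline ingress/egress on the group: every rule, including the one
// eks.Cluster creates, lives in a standalone SecurityGroupRule resource.
const clusterSecurityGroup = new aws.ec2.SecurityGroup("clusterSecurityGroup", {
    description: "Cluster security group; rules managed as standalone resources",
});

export const cluster = new eks.Cluster("eks-cluster", {
    clusterSecurityGroup: clusterSecurityGroup,
});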

EronWright commented 7 months ago

The next version of the pulumi-aws provider will emit a warning when the program uses inline rules. The recommendation is to migrate the inline rules to aws.ec2.SecurityGroupRule resource(s).

Here's a step-by-step guide to migrate the ingress rule(s). Please follow a similar procedure for the egress rule(s).

  1. Remove the ingress block entirely from the aws.ec2.SecurityGroup. Note that removing the block orphans the existing rules; it does not delete them.
  2. For each inline ingress rule, define an equivalent aws.ec2.SecurityGroupRule resource.
  3. Import each rule by setting the import option, using the naming pattern <security-group-id>_<type>_<protocol>_<from-port>_<to-port>_<source> described in the provider's import documentation (see the import string in the example below).
  4. Run pulumi up.
  5. Remove the import option.

For example, here's the repro program after migration:

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Create the security group without inline rules; the rules now live in
// standalone resources below
const clusterSecurityGroup = new aws.ec2.SecurityGroup("clusterSecurityGroup", {
    description: "Allow SSH inbound traffic",
});

// Recreate the former inline SSH ingress rule as a standalone resource and import it
const sshSecurityGroupRule = new aws.ec2.SecurityGroupRule("sshIngressRule", {
    description: "",
    type: "ingress",
    fromPort: 22,
    toPort: 22,
    protocol: "tcp",
    cidrBlocks: ["0.0.0.0/0"],
    securityGroupId: clusterSecurityGroup.id,
}, {import: "sg-0bc2d20bdd94bcc13_ingress_tcp_22_22_0.0.0.0/0"});

const allTrafficSecurityGroupRule = new aws.ec2.SecurityGroupRule("eksClusterIngressRule", {
    description: "Allow pods to communicate with the cluster API Server",
    type: "ingress",
    fromPort: 443,
    toPort: 443,
    protocol: "tcp",
    cidrBlocks: ["0.0.0.0/0"],
    securityGroupId: clusterSecurityGroup.id,
});

// Export the Security Group ID
export const securityGroupId = clusterSecurityGroup.id;

EronWright commented 7 months ago

Reopening the issue because the first solution - emit a warning - had some pushback in field testing. For example, one of the Pulumi templates triggers the warning: https://github.com/pulumi/templates/blob/98c8a657e2b292343f6dadbbc9fefdc1646fad5d/vm-aws-typescript/index.ts#L57-L74

EronWright commented 7 months ago

Related: https://github.com/pulumi/pulumi-aws/issues/3788

EronWright commented 7 months ago

The SecurityGroup resource with inline rules is mutually exclusive with the SecurityGroupRule resource. Since Cluster uses SecurityGroupRule, the fix for this issue is to not use inline rules for cluster-sg.

I suggest you adjust your program to use non-inline rules: extract each of the inline rules from cluster-sg into a standalone rule resource, and use the import option to transfer ownership without disruption, as shown above. Note the important detail that the SecurityGroup no longer has an ingress property set.

I should mention that inline egress rules should be refactored in the same way.
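
For instance, a hypothetical inline HTTPS egress rule would migrate like this (the import string is illustrative and follows the same naming pattern as the ingress example above):

// Hypothetical: an inline egress rule extracted into a standalone resource
// and imported so the migration causes no disruption.
const httpsEgressRule = new aws.ec2.SecurityGroupRule("httpsEgressRule", {
    type: "egress",
    fromPort: 443,
    toPort: 443,
    protocol: "tcp",
    cidrBlocks: ["0.0.0.0/0"],
    securityGroupId: clusterSecurityGroup.id,
}, { import: "sg-0bc2d20bdd94bcc13_egress_tcp_443_443_0.0.0.0/0" });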

We've opened a separate ticket for the suggested improvement that Pulumi emit a warning: https://github.com/pulumi/pulumi-aws/issues/3788