Warn on incompatible aws_security_group use

t0yv0 commented 3 months ago

Hello!

Vote on this issue by adding a 👍 reaction
If you want to implement this feature, comment to let us know (we'll work with you on design, scheduling, etc.)

Issue details

Consider emitting a warning to help users spot and correct incompatible resource pairings such as using a vpc_security_group_egress_rule with an aws_security_group with in-line rules.

https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/security_group

You should not use the aws_vpc_security_group_egress_rule and aws_vpc_security_group_ingress_rule resources in conjunction with an aws_security_group resource with in-line rules or with aws_security_group_rule resources defined for the same Security Group, as rule conflicts may occur and rules will be overwritten.

Affected area/feature

t0yv0 commented 3 months ago

Per @EronWright doing a stateful observation of a series of Check calls can help the provider detect this pattern and emit a warning so that the warning is more obvious than just a note in documentation somewhere. This should work out.

EronWright commented 3 months ago

Here's that "stateful observation" conversation, for reference.

corymhall commented 3 months ago

After looking into this a little bit more, it doesn't look like we have a good way to solve this. I originally thought we may be able to solve it by using the PreCheckCallback along with some state, but ran into some issues. Given a simple program like this:

name: sg_inline_rule_warning
runtime: yaml
resources:
  test_sg:
    type: aws:ec2:SecurityGroup
    properties:
      egress:
        - fromPort: 0
          toPort: 0
          protocol: '-1'
          cidrBlocks:
            - 0.0.0.0/0
  test_egress_rule:
    type: aws:vpc:SecurityGroupEgressRule
    properties:
      securityGroupId: ${test_sg.id}
      ipProtocol: '-1'
      cidrIpv4: 0.0.0.0/0

In order to warn the user we would need to be able to resolve the ${test_sg.id} value, which we will not know until an up has been run and the security group id is known. And even after the security group has been created and we know the id, we have no way of mapping that back to the original security group resource since we only have access to the input properties during PreCheckCallback. We would be able to generally know that we have both types of resources, but would not be able to say that a particular security group uses both inline rules and separate rule resources.

Another option with PreCheckCallback would be if the PropertyValues would contain the references to the referenced resources. Currently the value is just an empty computed value, but if it was instead an Output and had the list of Dependencies, then we would probably be able to figure out at preview time that the securityGroupId property has a Dependency on a particular SecurityGroup resource.

An alternative approach is to use policy packs. This would allow us to know about the relationships between resources, but we would have the same limitation of not being able to apply the rule until an up is run. Here is an example policy that we could use:

new PolicyPack("aws-typescript", {
  policies: [
    {
      name: "validate-sg-rules",
      description:
        "Security groups should use either inline rules or separate rules, but not both",
      enforcementLevel: "advisory",
      validateStack: (args, reportViolation) => {
        const securityGroupsWithEgress = args.resources.reduce(
          (prev, curr) => {
            const sg = curr.asType(aws.ec2.SecurityGroup);
            if (sg?.egress) {
              prev[sg.id] = true;
            }
            return prev;
          },
          {} as { [id: string]: boolean },
        );
        const egressRules = args.resources.flatMap((r) => {
          const resource = r.asType(aws.vpc.SecurityGroupEgressRule);
          return resource ?? [];
        });

        for (const rule of egressRules) {
          if (rule.securityGroupId in securityGroupsWithEgress) {
            reportViolation(
              `SecurityGruop ${rule.securityGroupId} is using both the 'egress' property and the 'SecurityGroupEgressRule' resource`,
            );
          }
        }
      },
    },
  ],
});

Even though we won't show the warning until after the security group is created, I still think the warning would be worth it. Ideally we could find a way to embed policies into the provider as a way of emitting these warnings, but the only way I could find to apply these policies are via the pulumi cli. This means that users would need to install a separate library in order to get these warnings.

t0yv0 commented 3 months ago

Thank you for this valuable investigation @corymhall !

I am going to correlate a few comments from internal discussions here as well.

Luke pointed out to https://github.com/pulumi/pulumi-policy-aws aka AWSGuard which currently exists as a "best-practices" checker for AWS and could potentially host these checks, though as you point out it implies downloading an additional dependency and limits the reach somewhat.

Joe offered a historical note that the design of policy packs like the above one also suffers from the loss of information due to unknown values and generally aborts processing when it cannot extract the required information. As a likely consequence of this, implementing this check in AWSGuard would only emit the warning at pulumi up time, not pulumi preview time. While unfortunate this late warning still may have utility to the users.

t0yv0 commented 3 months ago

And even after the security group has been created and we know the id, we have no way of mapping that back to the original security group resource since we only have access to the input properties during PreCheckCallback.

This is unfortunate and indicates extra work needs to happen but is not necessarily impossible. The provider protocol allocates ID in response to Create calls, and the provider could be made to spy on its own Create calls and recall the ID. Additionally ID is passed to DiffRequest. There are however additional limitations of trying to use the provider protocol to infer warnings that make me lean toward your recommendation of using policy packs instead, for instance multiple provider processes may be involved in the same stack and this style of checking cannot span across resources allocated in multiple providers.

pulumi / pulumi-aws