hashicorp / terraform-aws-consul

A Terraform Module for how to run Consul on AWS using Terraform and Packer
Apache License 2.0
401 stars 487 forks source link

Should security group rules implicitly allow inbound from self? #46

Open jgiles opened 6 years ago

jgiles commented 6 years ago

This follows up on #13.

The pattern from the Vault cluster module which that PR emulates adds an implicit "inbound from self" block, which allows for Vault request forwarding within the Vault security group:

https://github.com/hashicorp/terraform-aws-vault/blob/8d093d05c0a0d6c8861eea79c054cb7274628dc6/modules/vault-security-group-rules/main.tf#L27

I believe something similar is necessary for the security group rules in this module, in order to make sure the Consul instances can talk to each other.

If I try to allow inbound traffic just from our intended clients (a Vault cluster), the Consul cluster never selects a leader (probably because the instances cannot talk to each other).

A workaround fix is to add another instance of the security rules module explicitly:

module "consul_cluster" {
  source = "github.com/hashicorp/terraform-aws-consul//modules/consul-cluster?ref=v0.1.0"
  allowed_inbound_security_group_ids = ["${module.vault_cluster.security_group_id}"]
  allowed_inbound_cidr_blocks = []
  <snip>
}

module "consul_self_access_sg_rules" {
  source = "github.com/hashicorp/terraform-aws-consul//modules/consul-security-group-rules?ref=v0.1.0"
  security_group_id = "${module.consul_cluster.security_group_id}"
  allowed_inbound_cidr_blocks = []
  allowed_inbound_security_group_ids = ["${module.consul_cluster.security_group_id}"]
}

It's likely that not all the different inbound security rules need to be applied for intra-cluster communication - just the protocols actually used for chatter between instances.

brikis98 commented 6 years ago

Ah, that's a good point. Our cluster works without this because we typically configure allowed_inbound_cidr_blocks to allow inbound connections from all IPs within the same set of subnets, but that won't work for everyone.

Probably the best solution is to add another aws_security_group_rule to the consul-security-group-rules module that sets the self param to true for the gossip port. A PR for that would be very welcome :)

jgiles commented 6 years ago

I'm happy to assemble a PR :-)

Can you advise on which ports + protocols are used for inter-server gossip, which are used by client agents/other clients, and which are used for both? I'm not super familiar with Consul internals, and there are a lot of different ports/protocols configured.

As an aside, I'm having difficulty using the security groups feature in practice because of a "value of 'count' cannot be computed" error on the security group ID list (I think https://github.com/hashicorp/terraform/issues/10857 / https://github.com/hashicorp/terraform/issues/12570 is to blame).

Since security group IDs seem very likely to be resolved from another resource, having them passed as a list and subject to the count issue is unfortunate.

Would it make sense to expose a dedicated module for adding a single security group as an allowed client? That would avoid the count problem and allow adding a dynamic security group.

brikis98 commented 6 years ago

I'm happy to assemble a PR :-)

Wonderful, thanks!

Can you advise on which ports + protocols are used for inter-server gossip, which are used by client agents/other clients, and which are used for both? I'm not super familiar with Consul internals, and there are a lot of different ports/protocols configured.

I'm not 100% sure, but based on the architecture diagram, almost all the ports may be used, except perhaps the HTTP and DNS ones, depending on the use case. That said, it seems reasonably safe to allow the cluster to talk to itself across all ports, so adding a self = true security_group_rule for each port should be fine.

As an aside, I'm having difficulty using the security groups feature in practice because of a "value of 'count' cannot be computed" error on the security group ID list (I think hashicorp/terraform#10857 / hashicorp/terraform#12570 is to blame).

Yup, this is a known and unfortunate limitation of Terraform. count can be computed from static values, variables, and data sources, but not the outputs of other resources.

Until it's fixed, the most common scenario is to pass in a num_xxx param (e.g., num_security_group_ids) and use that in the count parameter. This means the code isn't DRY and is quite error prone, but there aren't too many better options right now.

I'm not sure this is a concern for this particular bug though?

Would it make sense to expose a dedicated module for adding a single security group as an allowed client? That would avoid the count problem and allow adding a dynamic security group.

Not sure I follow. Could you give an example?