hashicorp / terraform-aws-consul

A Terraform Module for how to run Consul on AWS using Terraform and Packer
Apache License 2.0

IAM permissions are incorrect #88

Closed: jarrettj closed this issue 5 years ago

jarrettj commented 5 years ago

Hi,

Good day.

I had trouble getting a Consul/Nomad cluster running. I checked the logs and the cluster could not elect a leader. It took me a while to figure out that the IAM role permissions might be incorrect.

I have been following the guide here: https://github.com/hashicorp/terraform-aws-nomad/tree/master/modules/nomad-cluster

I'm not sure what the minimal change is, so I granted the IAM role all EC2 describe permissions:

ec2:Describe*

My consul_cluster Terraform:

module "consul_cluster" {
  source = "github.com/hashicorp/terraform-aws-consul//modules/consul-cluster?ref=v0.4.0"
...
  user_data = <<-EOF
              #!/bin/bash
              /opt/consul/bin/run-consul --server --cluster-tag-key consul --cluster-tag-value consul-cluster-server
              EOF
}

This appears to happen with the nomad_cluster module as well.

My nomad_cluster Terraform:

module "nomad_cluster" {
  source = "github.com/hashicorp/terraform-aws-nomad//modules/nomad-cluster?ref=v0.4.5"

  user_data = <<-EOF
              #!/bin/bash
              /opt/consul/bin/run-consul --client --cluster-tag-key consul --cluster-tag-value consul-cluster-server
              /opt/nomad/bin/run-nomad --client
              EOF
}

After manually updating the IAM role policy, the cluster is up and running.

Regards. JJ

brikis98 commented 5 years ago

> Checked the logs and it could not set a leader.

Could you post the specific error? What version of Consul? Here are the permissions we already provide: https://github.com/hashicorp/terraform-aws-consul/blob/master/modules/consul-iam-policies/main.tf#L16-L18

jarrettj commented 5 years ago

Hi,

Isn't the version 0.4.0, given that I use source = "github.com/hashicorp/terraform-aws-consul//modules/consul-cluster?ref=v0.4.0"?

I know what the permissions are; I found the IAM role and updated it manually. The problem is that the change does not persist: every terraform apply reverts it.
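One way to keep extra permissions from being reverted on every terraform apply is to declare them in the same Terraform configuration instead of editing the role by hand. The following is a minimal sketch, not the module's own approach. It assumes the consul-cluster module exposes an iam_role_id output at the pinned version (worth checking in the module's outputs.tf), and the names extra_ec2_describe and extra-ec2-describe are hypothetical. The interpolation syntax matches the Terraform version in use when this thread was written; newer versions accept the bare expressions.

```hcl
# Sketch only: resource and policy names here are hypothetical.
# Assumption: module.consul_cluster exposes an iam_role_id output.

# Policy document granting the broad permission mentioned in this thread.
data "aws_iam_policy_document" "extra_ec2_describe" {
  statement {
    effect    = "Allow"
    actions   = ["ec2:Describe*"]
    resources = ["*"]
  }
}

# Attach the document as an additional inline policy on the module's role.
# Because Terraform manages this policy, apply keeps it in place rather
# than removing it as it does with manual edits.
resource "aws_iam_role_policy" "extra_ec2_describe" {
  name   = "extra-ec2-describe"
  role   = "${module.consul_cluster.iam_role_id}"
  policy = "${data.aws_iam_policy_document.extra_ec2_describe.json}"
}
```

ec2:Describe* is deliberately broad here because the minimal set of actions was not established in this thread. The same pattern would apply to the nomad_cluster module if it exposes an equivalent role output.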

/opt/consul/log/consul-stdout.log:

==> Log data will now stream in as it occurs:

    2018/10/23 19:11:00 [INFO] raft: Initial configuration (index=0): []
    2018/10/23 19:11:00 [INFO] raft: Node at 10.0.103.185:8300 [Follower] entering Follower state (Leader: "")
    2018/10/23 19:11:00 [INFO] serf: EventMemberJoin: i-07c469f4bedf42279.eu-west-2 10.0.103.185
    2018/10/23 19:11:00 [INFO] serf: EventMemberJoin: i-07c469f4bedf42279 10.0.103.185
    2018/10/23 19:11:00 [INFO] consul: Handled member-join event for server "i-07c469f4bedf42279.eu-west-2" in area "wan"
    2018/10/23 19:11:00 [INFO] consul: Adding LAN server i-07c469f4bedf42279 (Addr: tcp/10.0.103.185:8300) (DC: eu-west-2)
    2018/10/23 19:11:00 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
    2018/10/23 19:11:00 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
    2018/10/23 19:11:00 [INFO] agent: Started HTTP server on [::]:8500 (tcp)
    2018/10/23 19:11:00 [INFO] agent: started state syncer
    2018/10/23 19:11:00 [INFO] agent: Retry join LAN is supported for: aliyun aws azure digitalocean gce k8s os packet scaleway softlayer triton vsphere
    2018/10/23 19:11:00 [INFO] agent: Joining LAN cluster...
    2018/10/23 19:11:00 [INFO] discover-aws: Address type  is not supported. Valid values are {private_v4,public_v4,public_v6}. Falling back to 'private_v4'
    2018/10/23 19:11:00 [INFO] discover-aws: Region is eu-west-2
    2018/10/23 19:11:00 [INFO] discover-aws: Filter instances with gasp-consul=consul-cluster-server
    2018/10/23 19:11:00 [INFO] discover-aws: Instance i-07c469f4bedf42279 has private ip 10.0.103.185
    2018/10/23 19:11:00 [INFO] discover-aws: Instance i-03c8af0134d849827 has private ip 10.0.101.85
    2018/10/23 19:11:00 [INFO] discover-aws: Instance i-0f2d9e692e76a41c0 has private ip 10.0.102.210
    2018/10/23 19:11:00 [INFO] agent: Discovered LAN servers: 10.0.103.185 10.0.101.85 10.0.102.210
    2018/10/23 19:11:00 [INFO] agent: (LAN) joining: [10.0.103.185 10.0.101.85 10.0.102.210]
    2018/10/23 19:11:00 [INFO] serf: EventMemberJoin: i-03c8af0134d849827 10.0.101.85
    2018/10/23 19:11:00 [INFO] serf: EventMemberJoin: i-0f2d9e692e76a41c0 10.0.102.210
    2018/10/23 19:11:00 [INFO] consul: Adding LAN server i-03c8af0134d849827 (Addr: tcp/10.0.101.85:8300) (DC: eu-west-2)
    2018/10/23 19:11:00 [INFO] agent: (LAN) joined: 3 Err: <nil>
    2018/10/23 19:11:00 [INFO] agent: Join LAN completed. Synced with 3 initial agents
    2018/10/23 19:11:00 [INFO] consul: Found expected number of peers, attempting bootstrap: 10.0.103.185:8300,10.0.101.85:8300,10.0.102.210:8300
    2018/10/23 19:11:00 [INFO] consul: Adding LAN server i-0f2d9e692e76a41c0 (Addr: tcp/10.0.102.210:8300) (DC: eu-west-2)
    2018/10/23 19:11:00 [INFO] serf: EventMemberJoin: i-03c8af0134d849827.eu-west-2 10.0.101.85
    2018/10/23 19:11:00 [INFO] serf: EventMemberJoin: i-0f2d9e692e76a41c0.eu-west-2 10.0.102.210
    2018/10/23 19:11:00 [INFO] consul: Handled member-join event for server "i-03c8af0134d849827.eu-west-2" in area "wan"
    2018/10/23 19:11:00 [INFO] consul: Handled member-join event for server "i-0f2d9e692e76a41c0.eu-west-2" in area "wan"
    2018/10/23 19:11:07 [ERR] agent: failed to sync remote state: No cluster leader
    2018/10/23 19:11:08 [WARN] raft: Heartbeat timeout from "" reached, starting election
    2018/10/23 19:11:08 [INFO] raft: Node at 10.0.103.185:8300 [Candidate] entering Candidate state in term 2
    2018/10/23 19:11:08 [INFO] raft: Election won. Tally: 2
    2018/10/23 19:11:08 [INFO] raft: Node at 10.0.103.185:8300 [Leader] entering Leader state
    2018/10/23 19:11:08 [INFO] raft: Added peer 9c64f4d7-1834-cd99-bbee-969282e0fbc0, starting replication
    2018/10/23 19:11:08 [INFO] consul: cluster leadership acquired
    2018/10/23 19:11:08 [INFO] raft: Added peer 4492961e-3f15-2176-4554-b0531c52f279, starting replication
    2018/10/23 19:11:08 [INFO] consul: New leader elected: i-07c469f4bedf42279
    2018/10/23 19:11:08 [WARN] raft: AppendEntries to {Voter 9c64f4d7-1834-cd99-bbee-969282e0fbc0 10.0.101.85:8300} rejected, sending older logs (next: 1)
    2018/10/23 19:11:08 [INFO] raft: pipelining replication to peer {Voter 4492961e-3f15-2176-4554-b0531c52f279 10.0.102.210:8300}
    2018/10/23 19:11:08 [INFO] raft: pipelining replication to peer {Voter 9c64f4d7-1834-cd99-bbee-969282e0fbc0 10.0.101.85:8300}
    2018/10/23 19:11:08 [INFO] consul: member 'i-0f2d9e692e76a41c0' joined, marking health alive
    2018/10/23 19:11:08 [INFO] consul: member 'i-07c469f4bedf42279' joined, marking health alive
    2018/10/23 19:11:08 [INFO] consul: member 'i-03c8af0134d849827' joined, marking health alive
    2018/10/23 19:11:08 [INFO] agent: Synced service "_nomad-server-r3nlch3bmntvgj5i43znrzk6icqhsrfg"
    2018/10/23 19:11:08 [INFO] agent: Synced service "_nomad-server-xgfdfg3gj3ma5wcwf5f2cv3vybhxa3gy"
    2018/10/23 19:11:08 [INFO] agent: Synced service "_nomad-server-wkeg3snlf5ditkptobqflr5g6gfwn5jy"
    2018/10/23 19:11:08 [INFO] agent: Synced check "105d010569b225c9f116927a108e4a09c09a2c78"
    2018/10/23 19:11:08 [INFO] agent: Synced check "9e817332b095621953615942adf57a6367a49b07"
    2018/10/23 19:11:08 [INFO] agent: Synced check "711a51e9873e402c77ace492c2ecd79e8918bf62"
    2018/10/23 19:11:09 [INFO] agent: Synced service "_nomad-server-r3nlch3bmntvgj5i43znrzk6icqhsrfg"
    2018/10/23 19:11:09 [INFO] agent: Synced service "_nomad-server-xgfdfg3gj3ma5wcwf5f2cv3vybhxa3gy"
    2018/10/23 19:11:09 [INFO] agent: Synced service "_nomad-server-wkeg3snlf5ditkptobqflr5g6gfwn5jy"
    2018/10/23 19:11:14 [INFO] agent: Synced check "105d010569b225c9f116927a108e4a09c09a2c78"
    2018/10/23 19:11:20 [WARN] agent: Check "711a51e9873e402c77ace492c2ecd79e8918bf62" HTTP request failed: Get http://0.0.0.0:4646/v1/agent/health?type=server: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
    2018/10/23 19:11:35 [WARN] agent: Check "711a51e9873e402c77ace492c2ecd79e8918bf62" HTTP request failed: Get http://0.0.0.0:4646/v1/agent/health?type=server: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
    2018/10/23 19:11:45 [INFO] agent: Synced check "711a51e9873e402c77ace492c2ecd79e8918bf62"
    2018/10/23 19:13:29 [INFO] agent: Synced service "_nomad-server-r3nlch3bmntvgj5i43znrzk6icqhsrfg"
    2018/10/23 19:13:29 [INFO] agent: Synced service "_nomad-server-xgfdfg3gj3ma5wcwf5f2cv3vybhxa3gy"
    2018/10/23 19:13:29 [INFO] agent: Synced service "_nomad-server-wkeg3snlf5ditkptobqflr5g6gfwn5jy"

As soon as I update the IAM role to allow ec2:Describe* everything starts working.

Regards. JJ

jarrettj commented 5 years ago

Hi,

Good day.

I deleted the entire deployment and recreated it. It seems to find the leader now. Thanks.

Regards. JJ