Open nahuaque opened 6 years ago
That sounds like a reasonable assumption. Perhaps if we get in #54 we can pair an SDK update with it.
I know you guys are busy and all, but AFAICT #54 is still stalled, and I think an SDK update is enough to get this working.
@pearkes, is there a contact we could ping to get this revisited? It's been a couple years since the initial report and I know we are interested in seeing the library have better native support for ECS.
Thank you
For instance, I was recreating our retry_join logic for Consul today and ran into an issue where the region must be specified or the ECS-based tasks wouldn't discover the region correctly. I'm guessing this has something to do with the outdated SDK, or simply insufficient testing in ECS itself.
2020-11-09T21:12:19.098Z [ERROR] agent: Cannot discover address: cluster=LAN address="provider=aws tag_key=consul-servers tag_value=amazing-courser" error="discover-aws: GetInstanceIdentityDocument failed: EC2MetadataRequestError: failed to get EC2 instance identity document
2020-11-09 15:12:19 caused by: RequestError: send request failed
2020-11-09 15:12:19 caused by: Get "http://169.254.169.254/latest/dynamic/instance-identity/document": dial tcp 169.254.169.254:80: connect: invalid argument"
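For anyone hitting the same error, here is a rough sketch of the workaround described above: pass the region explicitly in the retry_join string so the region does not have to be looked up from the EC2 instance metadata endpoint. It assumes the task ARN is read from the `TaskARN` field of `${ECS_CONTAINER_METADATA_URI_V4}/task`; the helper names and the sample ARN are hypothetical and are not part of go-discover.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// regionFromTaskARN extracts the region from an ECS task ARN such as
// arn:aws:ecs:us-west-2:123456789012:task/my-cluster/0123abcd.
// Hypothetical helper, not part of go-discover.
func regionFromTaskARN(arn string) (string, error) {
	parts := strings.SplitN(arn, ":", 6)
	if len(parts) < 6 || parts[0] != "arn" || parts[3] == "" {
		return "", errors.New("not a valid task ARN: " + arn)
	}
	return parts[3], nil
}

// retryJoinArg builds a cloud auto-join string with an explicit region.
func retryJoinArg(region, tagKey, tagValue string) string {
	return fmt.Sprintf("provider=aws region=%s tag_key=%s tag_value=%s", region, tagKey, tagValue)
}

func main() {
	// Made-up ARN for illustration; inside a task it would come from the
	// task metadata document (assumption, see the note above).
	region, err := regionFromTaskARN("arn:aws:ecs:us-west-2:123456789012:task/demo/0123abcd")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(retryJoinArg(region, "consul-servers", "amazing-courser"))
}
```

The resulting string can be passed to `-retry-join`. This only sidesteps the region lookup; the agent still needs working credentials to query EC2 for tagged instances.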
Those following this issue may want to read about the recently announced Consul Service Mesh for Amazon ECS. Interestingly, the discover-servers component of this new architecture does not use go-discover. In my opinion, it may be related to Extensible discovery for Cloud Auto-Join.
Anyway, those looking to run Consul (perhaps other products later) in ECS have hope. Looks like it's coming in one form or another!
@dekimsey thanks for that link. I'm crossing my fingers too: a client I work for asked me just today to try moving a 3-node server cluster to Fargate, and I didn't believe it could be done yet. I wonder whether the article's mention of deploying "a production-ready Consul server" means HashiCorp intends to support multi-node server setups in this scenario. Would HashiCorp steer this toward the recommended 3- or 5-node Consul server setups in ECS, or would it be limited to a single Consul server node when running on ECS or Fargate?
I'm seeing this too on the 1.10.2 container image on Fargate. Our consul servers are on EC2 and discovery works great for clients on EC2 using -retry-join "provider=aws tag_key=role tag_value=consul-server"
However, if we try to deploy a client sidecar on Fargate, it does not work. Here's a snippet of the task definition:
"containerDefinitions": [
{
"name": "consul",
"image": "public.ecr.aws/hashicorp/consul:1.10.2",
"essential": true,
"entryPoint": ["/bin/sh", "-ec"],
"command": [
"ECS_IPV4=$(curl -s $ECS_CONTAINER_METADATA_URI_V4 | jq -r '.Networks[0].IPv4Addresses[0]')\n exec consul agent -advertise \"$ECS_IPV4\" -datacenter development -retry-join \"provider=aws tag_key=role tag_value=consul-server\" -data-dir /consul/data"
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-create-group": "true",
"awslogs-group": "/ecs/consultest",
"awslogs-region": "us-west-2",
"awslogs-stream-prefix": "consul"
}
},
"portMappings": [
{
"containerPort": 8300,
"hostPort": 8300,
"protocol": "tcp"
},
{
"containerPort": 8300,
"hostPort": 8300,
"protocol": "udp"
}
]
},
"placementConstraints": [],
"requiresCompatibilities": [
"FARGATE"
],
In the logs, we see:
[ERROR] agent: Cannot discover address: cluster=LAN address="provider=aws tag_key=role tag_value=development" error="discover-aws: GetInstanceIdentityDocument failed: EC2MetadataRequestError: failed to get EC2 instance identity document
caused by: RequestError: send request failed
caused by: Get "http://169.254.169.254/latest/dynamic/instance-identity/document": dial tcp 169.254.169.254:80: connect: invalid argument"
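As a side note on the task definition above: the `jq` expression in the command picks the first IPv4 address out of the container metadata document. A minimal Go sketch of the same lookup is below, in case someone wants to do this without `curl` and `jq` in the image. The sample JSON is made up for illustration and only includes the fields the lookup needs; the function name is hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// containerMetadata models only the part of the metadata document that the
// jq expression '.Networks[0].IPv4Addresses[0]' touches.
type containerMetadata struct {
	Networks []struct {
		IPv4Addresses []string `json:"IPv4Addresses"`
	} `json:"Networks"`
}

// firstIPv4 returns the first IPv4 address of the first network entry.
func firstIPv4(doc []byte) (string, error) {
	var md containerMetadata
	if err := json.Unmarshal(doc, &md); err != nil {
		return "", err
	}
	if len(md.Networks) == 0 || len(md.Networks[0].IPv4Addresses) == 0 {
		return "", errors.New("no IPv4 address in container metadata")
	}
	return md.Networks[0].IPv4Addresses[0], nil
}

func main() {
	// Illustrative document; a real one would be fetched from the URL in
	// ECS_CONTAINER_METADATA_URI_V4.
	sample := []byte(`{"Networks":[{"NetworkMode":"awsvpc","IPv4Addresses":["10.0.2.106"]}]}`)
	ip, err := firstIPv4(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("advertise address:", ip)
}
```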
According to the ECS task IAM role docs, inside ECS the container IAM role should be fetched from http://169.254.170.2$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
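To make the difference concrete, here is a minimal sketch of fetching task role credentials from that endpoint by hand. It assumes the response is a JSON document with `AccessKeyId`, `SecretAccessKey`, `Token`, and `Expiration` fields; the function and type names are hypothetical. Recent AWS SDK versions do this automatically as part of their default credential chain, so this is only meant to show what the lookup involves, not how go-discover does or should implement it.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"os"
	"time"
)

// credentialsHost is the link-local address quoted from the ECS docs above.
const credentialsHost = "http://169.254.170.2"

// taskCredentials holds the fields assumed to be in the endpoint's response.
type taskCredentials struct {
	AccessKeyID     string `json:"AccessKeyId"`
	SecretAccessKey string `json:"SecretAccessKey"`
	Token           string `json:"Token"`
	Expiration      string `json:"Expiration"`
}

// credentialsURL joins the host with the value of
// AWS_CONTAINER_CREDENTIALS_RELATIVE_URI.
func credentialsURL(relativeURI string) (string, error) {
	if relativeURI == "" {
		return "", errors.New("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is not set")
	}
	return credentialsHost + relativeURI, nil
}

// fetchTaskCredentials requests and decodes the credentials document.
func fetchTaskCredentials(client *http.Client, url string) (*taskCredentials, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	var creds taskCredentials
	if err := json.NewDecoder(resp.Body).Decode(&creds); err != nil {
		return nil, err
	}
	return &creds, nil
}

func main() {
	url, err := credentialsURL(os.Getenv("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"))
	if err != nil {
		fmt.Println("probably not running in an ECS task:", err)
		return
	}
	client := &http.Client{Timeout: 2 * time.Second}
	creds, err := fetchTaskCredentials(client, url)
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println("got credentials expiring at", creds.Expiration)
}
```

Outside an ECS task the environment variable is unset, so the program exits early instead of trying the link-local address.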
Consul agent AWS cloud autojoin is working fine on the ECS container instance, but doesn't work when I start the agent in a task with task networking. Presumably this is because the vendored version of aws-sdk-go isn't recent enough to support obtaining credentials via Task Metadata Endpoint version 3, which was only introduced about 14 days ago.