tidwall / tile38

Real-time Geospatial and Geofencing
https://tile38.com
MIT License

Enable ChainProvider for SQS to be able to use IAM instance profiles #426

Closed: tobilg closed this issue 5 years ago

tobilg commented 5 years ago

It would make sense to incorporate a ChainProvider for the SQS endpoint, to enable the use of IAM instance profiles via EC2RoleProvider.

That'd mean that https://github.com/tidwall/tile38/blob/master/internal/endpoint/sqs.go#L71-L88 would have to be changed.

See the discussion in Slack.

tidwall commented 5 years ago

PR #430 includes this update. When the machine already has the SQS credentials in the environment, the shared credentials file, or an EC2 role, you will no longer need to provide credentials in the URL.

tobilg commented 5 years ago

Wow, that was fast! Thank you so much! Will test it tomorrow and provide feedback ASAP!

tobilg commented 5 years ago

I just tested the latest edge Docker image. I get a NoCredentialProviders: no valid providers in chain. Deprecated. error when triggering a Geofence event.

I think this is because you're possibly not setting the provider chain in https://github.com/tidwall/tile38/blob/5335aec94254e5b7cdb4e3106c3afb223940e9d7/internal/endpoint/sqs.go#L73-L93.

The AWS SDK docs have an example like

creds := credentials.NewChainCredentials(
    []credentials.Provider{
        &credentials.EnvProvider{},
        &ec2rolecreds.EC2RoleProvider{
            Client: ec2metadata.New(sess),
        },
    })

which to me (as a non-Go programmer) seems to be the way to define a provider chain.

tidwall commented 5 years ago

When Credentials are nil (not set) then the session uses the default chain, which includes: Environment, ~/.aws/credentials, and EC2 role.

My tests worked when I tried both env and local file credentials.
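
To make the fallback order concrete, here is a minimal, stdlib-only sketch of how a provider chain resolves credentials. It is not the AWS SDK's implementation: `Creds`, `Provider`, `resolve`, `staticProvider`, and `failingProvider` are hypothetical names used only for illustration, and only the environment provider is modelled.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// Creds is a hypothetical stand-in for a resolved credential pair.
type Creds struct {
	AccessKeyID     string
	SecretAccessKey string
	Source          string // which provider supplied the values
}

// Provider is a hypothetical function type: it returns credentials or an error.
type Provider func() (Creds, error)

// envProvider reads the standard AWS environment variables.
func envProvider() (Creds, error) {
	id := os.Getenv("AWS_ACCESS_KEY_ID")
	secret := os.Getenv("AWS_SECRET_ACCESS_KEY")
	if id == "" || secret == "" {
		return Creds{}, errors.New("env: access key not found")
	}
	return Creds{AccessKeyID: id, SecretAccessKey: secret, Source: "env"}, nil
}

// staticProvider always succeeds with the given credentials.
func staticProvider(c Creds) Provider {
	return func() (Creds, error) { return c, nil }
}

// failingProvider always fails with msg.
func failingProvider(msg string) Provider {
	return func() (Creds, error) { return Creds{}, errors.New(msg) }
}

// resolve tries each provider in order and returns the first success.
// If every provider fails, the error lists each individual failure.
func resolve(chain ...Provider) (Creds, error) {
	var failures []string
	for _, p := range chain {
		c, err := p()
		if err == nil {
			return c, nil
		}
		failures = append(failures, err.Error())
	}
	return Creds{}, fmt.Errorf("no valid providers in chain: %s", strings.Join(failures, "; "))
}

func main() {
	c, err := resolve(envProvider, failingProvider("shared file: profile not found"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("credentials from:", c.Source)
}
```

The point of the sketch is that a "no valid providers" error only appears after every provider in the chain has been tried and has failed, so each individual failure is worth reading.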

tobilg commented 5 years ago

Hm, then I'll need to do a deeper debugging session to find out why this currently doesn't work. Sorry for the confusion... We're running the Docker image in ECS, and the service definitely has the IAM policy to access the SQS queue. I'll give you an update once I have more info.

Thanks again for your support!

tidwall commented 5 years ago

You're welcome and I hope it's something simple.

Also here's the comment from the AWS source regarding the Credentials defaults.

// The credentials object to use when signing requests. Defaults to a
// chain of credential providers to search for credentials in environment
// variables, shared credential file, and EC2 Instance Roles.
Credentials *credentials.Credentials

tobilg commented 5 years ago

Is there an easy way to activate extended logging? I'm already using the -vv flag.

tidwall commented 5 years ago

The -vv flag is the most extended logging that Tile38 can provide.

tobilg commented 5 years ago

I attached to the ECS Task where the Tile38 Docker container is running, installed the AWS CLI and was able to run

# aws sqs send-message --queue-url https://sqs.eu-central-1.amazonaws.com/123456789/my-queue --message-body "test" --region eu-central-1
{
    "MD5OfMessageBody": "098f6bcd4621d373cade4e832627b4f6",
    "MessageId": "548fbc6c-efa8-4dc4-9ffa-c6502fd0f457"
}

So the AWS CLI can get the credentials from the metadata URLs as described in https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html

Unfortunately, it's still not working when trying to use SQS as a webhook endpoint in Tile38. To be honest, I'm a bit puzzled about what the reason could be.

In the AWS Go SDK there's internal/shareddefaults/ecs_container.go, which seems to be used for getting the credentials for the ECS Task Role via the metadata URL. To me, it looks like aws/credentials/endpointcreds/provider.go should be used to get the credentials from the metadata endpoint.

When using curl 169.254.170.2$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI I can get valid credentials in the Tile38 container.
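
For comparison, the same check can be sketched in Go with only the standard library. This is a diagnostic sketch, not what Tile38 or the AWS SDK does internally: `credsURL` and `fetchCreds` are hypothetical helper names, and the client deliberately ignores any proxy settings so that the request goes straight to the link-local address.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"os"
	"strings"
	"time"
)

// ecsCredsHost is the link-local address used in the curl command above.
const ecsCredsHost = "http://169.254.170.2"

// containerCreds holds only the response fields this check prints.
// The secret key and session token are intentionally not decoded.
type containerCreds struct {
	AccessKeyID string `json:"AccessKeyId"`
	Expiration  string `json:"Expiration"`
}

// credsURL joins the host with the relative URI taken from
// AWS_CONTAINER_CREDENTIALS_RELATIVE_URI.
func credsURL(host, relativeURI string) (string, error) {
	if !strings.HasPrefix(relativeURI, "/") {
		return "", errors.New("relative URI must start with '/'")
	}
	return strings.TrimRight(host, "/") + relativeURI, nil
}

// fetchCreds requests the URL directly, bypassing any proxy settings.
func fetchCreds(url string) (containerCreds, error) {
	client := &http.Client{
		Timeout:   3 * time.Second,
		Transport: &http.Transport{Proxy: nil}, // a nil Proxy func means no proxy is used
	}
	resp, err := client.Get(url)
	if err != nil {
		return containerCreds{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return containerCreds{}, fmt.Errorf("unexpected status %s", resp.Status)
	}
	var c containerCreds
	err = json.NewDecoder(resp.Body).Decode(&c)
	return c, err
}

func main() {
	url, err := credsURL(ecsCredsHost, os.Getenv("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"))
	if err != nil {
		fmt.Println("not running with an ECS task role?", err)
		return
	}
	c, err := fetchCreds(url)
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println("access key:", c.AccessKeyID, "expires:", c.Expiration)
}
```

If this direct request succeeds while a proxy-aware client fails, the proxy configuration of the failing process is a likely suspect.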

Could it be possible that the CredentialsEndpointProvider needs to be set explicitly during credentials.NewChainCredentials()?

tidwall commented 5 years ago

Ok. I found a way to add additional logging for AWS. Try the latest edge release and include the -vv flag. Maybe this will provide some insight.

tobilg commented 5 years ago

Thanks a lot for your help! I used the new edge image and I'm seeing this in the logs:

NoCredentialProviders: no valid providers in chain
caused by: EnvAccessKeyNotFound: failed to find credentials in the environment
SharedCredsLoad: failed to load profile
caused by: EnvAccessKeyNotFound: failed to find credentials in the environment
CredentialsEndpointError: failed to load credentials
caused by: SharedCredsLoad: failed to load profile
CredentialsEndpointError: failed to load credentials

Sending a message via the AWS CLI in the Tile38 container still works. I was suspecting a missing entry in the no_proxy env var for the required 169.254.169.254,169.254.170.2 addresses (we're using a web proxy), but I added that and it still doesn't work...

tobilg commented 5 years ago

Turns out it was a missing entry in the no_proxy settings. I was fooled by just looking at the container env, while Tile38 was spawned within a custom entrypoint, which had its own no_proxy settings, where 169.254.170.2 was missing... What a PITA!
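
A small sketch of how this could be checked against the environment the process actually runs with, rather than the container's. It assumes Linux, where /proc/<pid>/environ holds a process's environment as NUL-separated KEY=VALUE pairs captured at exec time. `parseEnviron` and `missingNoProxy` are hypothetical helpers, and the matching is exact-entry only, which is simpler than what real no_proxy handling supports.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseEnviron parses the NUL-separated KEY=VALUE format of /proc/<pid>/environ.
func parseEnviron(data []byte) map[string]string {
	env := make(map[string]string)
	for _, kv := range strings.Split(string(data), "\x00") {
		if k, v, ok := strings.Cut(kv, "="); ok {
			env[k] = v
		}
	}
	return env
}

// missingNoProxy returns the required hosts that have no exact match
// in the comma-separated noProxy list.
func missingNoProxy(noProxy string, required []string) []string {
	present := make(map[string]bool)
	for _, entry := range strings.Split(noProxy, ",") {
		present[strings.TrimSpace(entry)] = true
	}
	var missing []string
	for _, host := range required {
		if !present[host] {
			missing = append(missing, host)
		}
	}
	return missing
}

func main() {
	// Pass the PID of the process to inspect; defaults to this process.
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	data, err := os.ReadFile("/proc/" + pid + "/environ")
	if err != nil {
		fmt.Println("cannot read environment:", err)
		return
	}
	env := parseEnviron(data)
	noProxy := env["no_proxy"]
	if noProxy == "" {
		noProxy = env["NO_PROXY"]
	}
	required := []string{"169.254.169.254", "169.254.170.2"}
	fmt.Println("missing from no_proxy:", missingNoProxy(noProxy, required))
}
```

Running it with the PID of the Tile38 process, instead of relying on the shell's environment, would surface a difference introduced by a custom entrypoint.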

Now it works as desired. Thanks for your support and patience!

tidwall commented 5 years ago

Oh good! I was starting to get worried :)