rwynn / monstache

a go daemon that syncs MongoDB to Elasticsearch in realtime. you know, for search.
https://rwynn.github.io/monstache-site/
MIT License

Support for AWS Dynamic Credentials #333

Closed kush-patel-hs closed 4 years ago

kush-patel-hs commented 4 years ago

We're currently doing a spike and some research into potentially using Elasticsearch with Monstache. The tool is working great so far locally, and we're trying to figure out how to deploy it. We have self-hosted Mongo with basic auth, but we use an AWS IAM role for Elasticsearch. IAM role credentials last for 1 hour and then refresh, so we can't just launch monstache and let it run; at the moment we would have to somehow restart monstache every hour.

aws-sdk-go has support for reading from the ~/.aws/credentials file (https://github.com/aws/aws-sdk-go/blob/4f042170d30a74a7b1333268a83154c32347f990/aws/credentials/shared_credentials_provider.go#L29), and also for expiring those credentials and reading the file again when they expire.

It would be great to have similar support in monstache: both reading from ~/.aws/credentials and dynamically expiring and re-reading it. We could potentially update the aws-connect configuration to fall back on ~/.aws/credentials when access-key/secret are missing, or add a filename and profile option.
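
For illustration, this is roughly how that shared credentials provider gets used with aws-sdk-go (just a sketch, not monstache code; the profile name is a placeholder):

package main

import (
    "fmt"
    "log"

    "github.com/aws/aws-sdk-go/aws/credentials"
)

func main() {
    // An empty filename means the default location, honoring
    // AWS_SHARED_CREDENTIALS_FILE and falling back to ~/.aws/credentials.
    creds := credentials.NewSharedCredentials("", "dev-profile")

    val, err := creds.Get()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("loaded access key:", val.AccessKeyID)
}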

rwynn commented 4 years ago

Do you have time to submit a PR to monstache for this? I noticed that the shared credentials provider you linked does not actually refresh after reading the credentials file, though: IsExpired returns false forever once the file has been read. Is that how you read that code? It looks like the endpoints provider is more along the lines of something monstache could use in this case; specifically, this one seems to be able to use the metadata service to refresh, if I'm not mistaken.

kush-patel-hs commented 4 years ago

I might have time to work on this if we come up with a plan of attack. You're right, I might have read that provider wrong. Hmm, I'm not sure that matches our use case though. What you're suggesting would use an AWS service to refresh the token, right?

In our case, our service on Kubernetes has a sidecar for Vault. Vault assumes an AWS IAM role (which gives creds that last 1 hour), and those creds are written to the ~/.aws/credentials file. Then in an hour Vault refreshes them and writes the file again. So the actual refreshing of the credentials is already handled; we just need monstache to keep using the latest values (i.e. if that file changes, use the new creds), not to refresh the creds itself. If that makes sense.

rwynn commented 4 years ago

@kush-patel-hs can you please try the issue-333 branch and let me know if that helps? Switch to that branch and go install the binary. You can configure as follows...

[aws-connect]
# choose a strategy that looks in environment vars and then ~/.aws/credentials.
strategy = 1
# set the profile to use
profile = "dev-profile" 
# force expire the credentials every 5 minutes forcing a re-read
force-expire = "5m"
# add AWS region 
region = "us-east-1"

Strategy 0 is backwards compatible and uses the previous static credentials. Strategy 1, shown above, reads the standard AWS environment variables and, if they are empty, falls back to the credentials file. Strategy 2 looks up the credentials via the AWS metadata services.
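
Roughly speaking, the strategies map onto aws-sdk-go credential providers along these lines (a simplified sketch, not the exact code in the branch):

package main

import (
    "fmt"

    "github.com/aws/aws-sdk-go/aws/credentials"
    "github.com/aws/aws-sdk-go/aws/credentials/ec2rolecreds"
    "github.com/aws/aws-sdk-go/aws/session"
)

// newCredentials sketches how the strategy setting could select an
// aws-sdk-go credentials provider; the real monstache wiring may differ.
func newCredentials(strategy int, accessKey, secretKey, profile string) *credentials.Credentials {
    switch strategy {
    case 1:
        // Environment variables first, then the shared credentials file.
        return credentials.NewChainCredentials([]credentials.Provider{
            &credentials.EnvProvider{},
            &credentials.SharedCredentialsProvider{Profile: profile},
        })
    case 2:
        // Credentials from the EC2 metadata service (attached IAM role).
        sess := session.Must(session.NewSession())
        return ec2rolecreds.NewCredentials(sess)
    default:
        // Strategy 0: static keys taken from the TOML config.
        return credentials.NewStaticCredentials(accessKey, secretKey, "")
    }
}

func main() {
    creds := newCredentials(1, "", "", "dev-profile")
    _, err := creds.Get()
    fmt.Println("credentials resolved:", err == nil)
}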

kush-patel-hs commented 4 years ago

Hello @rwynn, sorry for the late reply, I'm on vacation!

Your branch is headed in the right direction!

A few things to iron out.

The duration-based force expiry has a gap: say I set it to 5 minutes and our system updates the creds at the 6-minute mark, then our creds will be wrong for 4 minutes. I could set it to something very low like 15s, and I think that would work well for reading env vars, but it might be too I/O-intensive for reading the file. We could look into using something from https://github.com/fsnotify to watch the file for changes on top of the force expire. We could also split the env var reading and file reading into two strategies, so that with env vars we refresh very frequently (10-15s) and with file reading we watch for changes.
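
For the fsnotify idea, something along these lines is what I have in mind (just a sketch; the path and the expiry hook are placeholders):

package main

import (
    "log"

    "github.com/fsnotify/fsnotify"
)

func main() {
    watcher, err := fsnotify.NewWatcher()
    if err != nil {
        log.Fatal(err)
    }
    defer watcher.Close()

    // Watch the credentials file itself and force-expire the cached
    // credentials whenever it changes.
    if err := watcher.Add("/home/app/.aws/credentials"); err != nil {
        log.Fatal(err)
    }

    for {
        select {
        case event := <-watcher.Events:
            if event.Op&(fsnotify.Write|fsnotify.Create) != 0 {
                log.Println("credentials file changed, forcing re-read:", event.Name)
                // e.g. call creds.Expire() so the next request reloads them
            }
        case err := <-watcher.Errors:
            log.Println("watch error:", err)
        }
    }
}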

Thanks for the fast turn around! My teammate is continuing our tech evaluation while I'm on vacation.

rwynn commented 4 years ago

@kush-patel-hs updated the branch based on your feedback. Thanks.

rwynn commented 4 years ago

@kush-patel-hs I've merged this code into the rel6 and rel5 branches for evaluation now. You can use a config like this.

[aws-connect]
# choose a strategy that reads from ~/.aws/credentials.
strategy = 1
# set the AWS credential profile to use if not `default`
profile = "dev-profile" 
# set AWS region 
region = "us-east-1"
# enable file watcher
watch-credentials = true

kush-patel-hs commented 4 years ago

Small update: Working for next 4 days then offline for 7 days.

Thanks for adding the file watcher @rwynn! We can switch to using rel6 or rel5 instead of 4.19.3. This should be exactly what we need. Thanks again for working on this!

If you open a PR to merge to master I can review it for you.

kush-patel-hs commented 4 years ago

Quick question: is there a need for CredentialsWatchDir?

I think we can have just CredentialsFile and watch CredentialsFile. Then for the default, instead of checking the USERPROFILE and HOME env variables and appending .aws, we can fall back to reading the file from AWS_SHARED_CREDENTIALS_FILE; if that isn't set, default to ~/.aws/credentials (the file, not the dir).
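
i.e. resolve the path to watch roughly like this (sketch only; the function name is just for illustration):

package main

import (
    "fmt"
    "os"
    "path/filepath"
)

// credentialsFilePath follows the standard AWS convention:
// AWS_SHARED_CREDENTIALS_FILE if set, otherwise ~/.aws/credentials.
func credentialsFilePath() (string, error) {
    if p := os.Getenv("AWS_SHARED_CREDENTIALS_FILE"); p != "" {
        return p, nil
    }
    home, err := os.UserHomeDir() // covers both HOME and USERPROFILE
    if err != nil {
        return "", err
    }
    return filepath.Join(home, ".aws", "credentials"), nil
}

func main() {
    path, err := credentialsFilePath()
    fmt.Println(path, err)
}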

Also: We're pulling the rel6 docker image and it's giving us

ERROR 2020/01/20 22:19:05 Config file contains undecoded keys: ["aws-connect.strategy" "aws-connect.profile" "aws-connect.watch-credentials" "aws-connect.credentials-file" "aws-connect.credentials-watch-dir"]

Has it not been updated?

rwynn commented 4 years ago

I initially tried simply putting the watch on the credentials file. But this seemed to have the following problems: the file may not exist yet when monstache starts, so the watch cannot be added, and if the file is later deleted or replaced via a rename (which is how most tools rewrite it), the watch on the original file is lost and no further events arrive.

Watching the parent directory (e.g. ~/.aws) did not have these problems. I assume that those using this feature would expect it to work in a wide variety of situations. The only requirement is having ~/.aws present at monstache startup.
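
i.e. watch the directory and filter events for the credentials file, roughly like this (a simplified sketch of the idea, not the exact code):

package main

import (
    "log"
    "path/filepath"

    "github.com/fsnotify/fsnotify"
)

func main() {
    watcher, err := fsnotify.NewWatcher()
    if err != nil {
        log.Fatal(err)
    }
    defer watcher.Close()

    // Watch the directory, not the file, so the watch survives the file
    // being deleted/renamed and rewritten by tools like Vault.
    dir := "/home/app/.aws" // placeholder path
    if err := watcher.Add(dir); err != nil {
        log.Fatal(err)
    }

    for {
        select {
        case event := <-watcher.Events:
            if filepath.Base(event.Name) == "credentials" {
                log.Println("credentials changed, invalidating cached creds:", event.Op)
            }
        case err := <-watcher.Errors:
            log.Println("watch error:", err)
        }
    }
}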

The Docker image has not been updated because this has not been released yet.

kush-patel-hs commented 4 years ago

A few things:

Furthermore, if we start this feature with just the one variable, CredentialsFile, it works the way people expect per the AWS convention (AWS_SHARED_CREDENTIALS_FILE with a fallback to ~/.aws/credentials). If someone later posts an issue saying their credentials file gets deleted for whatever reason, adding CredentialsWatchDir can be a non-breaking change (use it if it is present in the config and watch the directory, otherwise just watch the file).

We built your branch! We think the credentials are being fed correctly, but we're having different unknown problems trying to communicate with ES from our staging k8s. We're going to try to figure that out today.

kush-patel-hs commented 4 years ago

Disregard my last comment. After consulting with someone who knows more about our Vault/k8s setup, I have learned that we do something like mv /tmp/newcreds ~/.aws/credentials, as most tools do, which counts as replacing the old file (so a delete). So it turns out the WatchDirectory was a good call :)

kush-patel-hs commented 4 years ago

Small update: We've got it running on our staging Kubernetes!

Will report back after an hour to let you know if the credentials refreshed fine.

kush-patel-hs commented 4 years ago

It appears to be working well and has already gone through several refreshes. Our credentials actually get updated every 20 minutes rather than every hour, and the logs show the points where monstache refreshed them. It has continued running fine and has not errored when talking to ES, and the pods haven't restarted. All signs are good.

The one thing I noticed is that for our 2 pods we have 6 credential-refresh log lines, which means each credential refresh triggers the watch 3 times for some reason. I guess overwriting a file produces 3 events while watching the directory (delete old file, new file, ?). I don't think this is a big concern though.

I would say this is ready for next steps! Great work again!

rwynn commented 4 years ago

@kush-patel-hs thanks for the feedback! I agree it shouldn't be a major issue if the credentials are invalidated 3 times, because the act of invalidation just sets a flag; the actual reloading (in this case, re-reading the file) happens before the next request is made.
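
A rough sketch of that pattern using the aws-sdk-go credentials API (not the literal monstache code; the profile name is a placeholder):

package main

import (
    "fmt"
    "log"

    "github.com/aws/aws-sdk-go/aws/credentials"
)

func main() {
    creds := credentials.NewSharedCredentials("", "dev-profile")

    // Watcher side: each file event just flips the "expired" flag.
    // Calling Expire() several times in a row is harmless.
    creds.Expire()
    creds.Expire()
    creds.Expire()

    // Request side: the next Get() sees the flag and re-reads the
    // credentials file once before the request is signed.
    val, err := creds.Get()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("reloaded access key:", val.AccessKeyID)
}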

I've pushed out a new release with this feature included. Thanks for taking the time to report and test it out.

kush-patel-hs commented 4 years ago

Thanks for your work on this, @rwynn! We'll switch over to using the released image instead of the one we built. We can probably close this issue now!