lyft / metadataproxy

A proxy for AWS's metadata service that gives out scoped IAM credentials from STS

RFC: Using metadataproxy for non-container workloads #55

Closed · amitsaha closed this issue 6 years ago

amitsaha commented 7 years ago

I am trying to use metadataproxy in a setup where I have multiple services with different IAM policy requirements running on a single box. These services are running on the host and not in containers, which is where my "problem" lies.

So, currently, the client's requesting IP is used to look up the container it is associated with and then the environment of the container is examined for the IAM role. I am trying to think of a way to do the same, but for the requesting process. Since each of my processes will have the same IP (127.0.0.1), I cannot use that to uniquely identify the requesting process. I can think of one very crude way to do this. Roughly, this translates to:

from flask import Flask
from subprocess import Popen, PIPE

app = Flask(__name__)

LISTENING_PORT = '5000'

@app.route('/')
def index():
    # Equivalent to: netstat -anp | grep ESTABLISHED | grep -w 5000 | awk '{print $7}'
    p1 = Popen(['netstat', '-anp'], stdout=PIPE)
    p2 = Popen(['grep', 'ESTABLISHED'], stdin=p1.stdout, stdout=PIPE)
    p3 = Popen(['grep', '-w', LISTENING_PORT], stdin=p2.stdout, stdout=PIPE)
    p4 = Popen(['awk', '{print $7}'], stdin=p3.stdout, stdout=PIPE)
    p3.stdout.close()
    output = p4.communicate()[0]
    for line in output.split('\n'):
        if not line:
            continue
        # The last netstat column is <pid>/<process name>, e.g. 1111/python
        pid_name = line.split('/')
        if len(pid_name) == 2 and pid_name[1] == 'python':
            # We now have the pid of the process requesting credentials,
            # so examine its environment for the role it wants.
            with open('/proc/%s/environ' % pid_name[0]) as f:
                print f.read().replace('\0', '\n')
    return '', 200

app.run(debug=True)

Upon a request, I use netstat to grep all ESTABLISHED connections on the port the metadataproxy service listens on, then use that to get the PID and examine its environment to find the role it is requesting credentials for. This assumes that only one process at a time is requesting credentials for a given IAM role from metadataproxy, but I can run just one worker given the controlled nature of my setup.

Does that make any sense? Is there another approach that may be better?

ryan-lane commented 7 years ago

This is a really interesting idea. What's your use-case? From a security perspective this allows any process to request any role it wants, but I could see something like this being useful for requesting arbitrary roles for processes while having the proxy auto-renew them. That would be useful if you wanted to start the proxy with long-lasting session credentials, for roles that are MFA'd.

From the code perspective, it may actually be more efficient to just look through /proc/ to find the socket associated with the request and it wouldn't require shelling out as much.
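
As a rough illustration of the /proc route (an assumption about what that would look like, not code from this thread), one could parse /proc/net/tcp for the ESTABLISHED connection whose remote port is the proxy's, then scan /proc/<pid>/fd for the matching socket inode; this is essentially what psutil does internally:

import glob
import os

def inode_for_remote_port(port):
    # Sketch only. /proc/net/tcp columns are: sl, local_address, rem_address,
    # st, ..., inode. Addresses are hex ip:port pairs; state 01 is ESTABLISHED.
    with open('/proc/net/tcp') as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            rem_port = int(fields[2].split(':')[1], 16)
            if rem_port == port and fields[3] == '01':
                return fields[9]
    return None

def pid_for_inode(inode):
    # Each process's socket fds are symlinks like "socket:[<inode>]".
    for fd in glob.glob('/proc/[0-9]*/fd/*'):
        try:
            if os.readlink(fd) == 'socket:[%s]' % inode:
                return fd.split('/')[2]  # the <pid> path component
        except OSError:
            continue
    return None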

amitsaha commented 7 years ago

My use-case is similar to what was discussed in the metadataproxy announcement blog post: for development, we run multiple services on a single host, so my idea is that this EC2 instance will have, say, Role1, which will be allowed to assume ServiceRole1, ServiceRole2, etc. (each of which will have a trust policy allowing Role1 to assume it).

Then, each process will have in its environment the role it should be assuming, like ServiceRole1, which is where metadataproxy comes in.
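
For concreteness, the assume-role flow the proxy would perform on a process's behalf amounts to roughly the sketch below (the account ID is a placeholder; Role1 and ServiceRole1 are the roles described above, and ServiceRole1's trust policy must allow Role1 to call sts:AssumeRole):

import boto3

# Runs with the instance profile's Role1 credentials.
sts = boto3.client('sts')
response = sts.assume_role(
    RoleArn='arn:aws:iam::123456789012:role/ServiceRole1',  # placeholder account ID
    RoleSessionName='service1',
)
credentials = response['Credentials']  # AccessKeyId, SecretAccessKey, SessionToken, Expiration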

I will take another look at /proc to see if I can avoid shelling out so many times, which I don't like either. But if you recall what that approach would look like, please let me know.

amitsaha commented 7 years ago

OK, I think this is slightly better, reading /proc/net/tcp via psutil:

from flask import Flask
import psutil

app = Flask(__name__)

LISTENING_PORT = 5000

@app.route('/')
def index():
    for connection in psutil.net_connections():
        # We have two connection entries for each ESTABLISHED connection
        # (since client and server are both on localhost). The one we are
        # interested in is the client's socket, whose remote address is
        # 127.0.0.1:5000.
        if (connection.raddr and connection.raddr[1] == LISTENING_PORT
                and connection.status == 'ESTABLISHED'):
            role = psutil.Process(connection.pid).environ().get('IAM_ROLE')
            print 'Requesting credentials for %s' % role
    return '', 200

app.run(debug=True)

Example client request:

$ IAM_ROLE=MYIAMROLE curl localhost:5000

This will print "Requesting credentials for MYIAMROLE" on the server side.

amitsaha commented 7 years ago

@ryan-lane would you be keen to look at a PR for metadataproxy?

ryan-lane commented 7 years ago

I think it would be pretty difficult for folks to set this up, so it would require a lot of docs, and it would add a mode that would be difficult to properly test and maintain. So I think we'd be hesitant to add this in.

amitsaha commented 6 years ago

We ended up doing this for my use case instead of trying to hack our way around metadataproxy:

As far as our AWS IAM configuration goes, we established the policy and trust relationship as per the docs.

Do you see any disadvantages to this approach (even if I were running the services in separate containers)?

ryan-lane commented 6 years ago

Yeah, this is a pretty standard way of approaching this. The downside of this approach is that roles can't auto-renew, depending on what you're doing, but this is the normal method of handling this.

ryan-lane commented 6 years ago

Going to close this issue out :)

amitsaha commented 6 years ago

The downside of this approach is that roles can't auto-renew, depending on what you're doing, but this is the normal method of handling this.

Yeah, I had to build the retry logic into the application itself, which is not ideal.
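
As an illustration of what that renewal logic can look like in the application (a sketch under stated assumptions, not the actual code; the role ARN, session name, and refresh margin are placeholders): cache the assumed-role credentials and re-assume the role shortly before they expire.

from datetime import datetime, timedelta

import boto3
from dateutil.tz import tzutc

ROLE_ARN = 'arn:aws:iam::123456789012:role/ServiceRole1'  # placeholder
REFRESH_MARGIN = timedelta(minutes=5)                     # placeholder

_credentials = None

def get_credentials():
    # Re-assume the role whenever the cached credentials are missing or
    # within REFRESH_MARGIN of their expiry time.
    global _credentials
    now = datetime.now(tzutc())
    if _credentials is None or _credentials['Expiration'] - now < REFRESH_MARGIN:
        _credentials = boto3.client('sts').assume_role(
            RoleArn=ROLE_ARN,
            RoleSessionName='service1',
        )['Credentials']
    return _credentials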