nginxinc / nginx-s3-gateway

NGINX S3 Caching Gateway
Apache License 2.0
499 stars 126 forks source link

Use Instance profile credential #8

Closed sumitag closed 2 years ago

sumitag commented 3 years ago

Is it possible to use temporary credential retrieved from instance profile/role?

dekobon commented 3 years ago

Would this be the aws_session_token token that is often used with SSO on AWS accounts? Can you provide some more details about what type of temporary credential you are thinking of?

vli425 commented 3 years ago

A very important enhancement request actually. It is considered to be very insecure to save a permanent AWS_ACCESS_KEY and AWS_SECRET_KEY somewhere. It's even forbidden in a lot of enterprise environments. Instead roles should be used to allow AWS components to access services they need.

So let's assume this nginx-s3-gateway would run on a AWS EC2 instance, with the right role applied to the instances it would be allowed to access the S3 bucket with temporary keys which can be retrieved from the EC2 metadata, as described here: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html

That would be a huge security improvement for the nginx-s3-gateway, so I clearly vote up for this, new feature request.

rinrinne commented 3 years ago

"Retrieve security credentials from EC2 instance metadata or ECS task metadata endpoint" ?

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-metadata-endpoint.html

sumitag commented 2 years ago

"Retrieve security credentials from EC2 instance metadata or ECS task metadata endpoint" ?

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-metadata-endpoint.html

Depending on how you are running it, if nginx is running on ec2 then instance if ecs then task.

ksandrmatveyev commented 2 years ago

Hi, @dekobon , are there any news or plans about that?

ajschmidt8 commented 2 years ago

This feature would also be hugely beneficial to my team.

dekobon commented 2 years ago

I would like some help from the community in designing this feature. I've got the following questions:

Do the security credentials need to be refreshed at a periodic interval? If so, it will be difficult to implement the feature purely within nginx.

I can imagine a few implementations:

Does anyone have other ideas about how to implement this?

ajschmidt8 commented 2 years ago
  • Maybe, this is a big maybe, a subrequest could be done when a njs variable containing a timestamp is exceeded which then resets to security credentials. This would result in increased latency for the request that would trigger the credentials update.

Appreciate your response! I had looked into implementing the IAM role feature, and this is the path that I had initially considered. Though the other approaches seem interesting as well.

I'll highlight some of the information that I've found in my research so far.

Hopefully this is helpful. I'll continue to explore this on my own as time permits and I'm happy to help do any research that might get this feature implemented quickly!

ajschmidt8 commented 2 years ago

EC2 requires two HTTP requests to retrieve credentials: TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600") && curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/iam/security-credentials/s3access (link)

I don't know if this is accurate. If you click on the IMDSv1 tab from that page, it shows a single HTTP request that looks like it can be used to get the temporary credentials for EC2.

FlorianSW commented 2 years ago

I'm a complete newbie to how this project works :) However, I like it 👍 I just wanted to answer the same information as @ajschmidt8 posted already, so I will just extend on that :P

The expiration of credentials for a role of an ECS task running on the fargate compute type is, by default and currently, 6 hours, iirc. That means, we could hold a copy of these credentials for quite a long time actually. The response from the endpoints mentioned by @ajschmidt8 are also containing the date and time when these credentials will expire. At least for EC2 (I assume it's the same for ECS), credentials are automatically renewed by the endpoints at least 5 minutes before the current ones expire.

While looking at the Compatibility of the njs module, I came to a quick idea of how this could be implemented:

By doing so, nginx fetches the credentials asynchronously to actual requests (except for the first one, but even that could be worked around by adding a refreshCredentials or so method which is called when the server starts?). Preventing any overhead to the webrequests by the credential flow.

What do you think? Would that be possible, or did I miss an important details? :)

FlorianSW commented 2 years ago

EC2 requires two HTTP requests to retrieve credentials: TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600") && curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/iam/security-credentials/s3access (link)

I don't know if this is accurate. If you click on the IMDSv1 tab from that page, it shows a single HTTP request that looks like it can be used to get the temporary credentials for EC2.

While IMDSv1 is still possible to use, one can disable v1 on a per-instance configuration. Also, v2 protects against some possible attacks which were possible with v1 (https://aws.amazon.com/blogs/security/defense-in-depth-open-firewalls-reverse-proxies-ssrf-vulnerabilities-ec2-instance-metadata-service/). Because of that, implementing a new feature, which requires v1, might be difficult to justify. There might even be environments, where the v1 must be disabled and not be used for security or compliance reasons.

I also think that, with an async approach, using v1 doesn't really give a big benefit anymore (compared to on a per-request basis).

FlorianSW commented 2 years ago

I spiked something together :D It's not nearly in a state to be in a PR but maybe it is already enough to get a discussion started about my proposed solution: https://github.com/nginxinc/nginx-s3-gateway/compare/master...FlorianSW:issue/8?expand=1 (I hope you can see this diff)

I'll explore a bit more in the next days or weeks whenever I find a bit of time. However, any feedback is welcome :) (especially int he njs area, as this is completely new to me).

FlorianSW commented 2 years ago

i used my changes I mentioned yesterday and deployed them to ECS. A bit as expected it did not work. After trying to figure out what is going on, I think I now face two fundamental problems with njs (which are by design as far as I know), which, until now, I did not find any solution for (nor a good one):

ajschmidt8 commented 2 years ago

@FlorianSW, just wanted to say that I appreciate the effort here! Hopefully you and @dekobon can come up with something that works.

xeioex commented 2 years ago

Hi @FlorianSW,

njs starts a new VM for the scripting for each request (? at least that's how I understood it so far)

Yes, you are right. Currently, all requests are totally independent by design. The only way to communicate is only through keyval or any other (potential) shared memory module.

the credential fetch process is asynchronous, however, adding the signature to the Authorization header (via js_set) is a synchronous action.

Take a look at the following example.

maybe I just did not found another solution. It does not really need to be a solution to "sync" these values between workers, either. Only between requests would be totally ok.

The quick, but not so elegant solution would be implementing setenv for setting environment variables per process (See process.env ).

last but not least: I didn't find a way with njs to execute code when the config is loaded (something like "on startup of nginx") to initially fetch the credentials

not implemented yet.

Still didn't look into refreshing the credentials (the issue might also be: setTimeout would not have a reference to a request, hence no way to update the credentials variable, even if it would be a keyval (in order to resolve the first issue)). Also: no idea as of now.

One way to do this is to refresh credentials once they are detected to be stale.

FlorianSW commented 2 years ago

njs starts a new VM for the scripting for each request (? at least that's how I understood it so far)

Yes, you are right. Currently, all requests are totally independent by design. The only way to communicate is only through keyval or any other (potential) shared memory module.

Yeah, I kind of expected such an answer :/

Take a look at the following example.

I actually do the example of "generating a JWT". Means:

Based on that it seems I'm doing something wrong 😅

The quick, but not so elegant solution would be implementing setenv for setting environment variables per process (See process.env ).

I thought about that already as well. Basically, we "just" need to save three values (they even do not need to be persistent) somewhere. The environment was something I was thinking about as well, as the values would reside there anyway, if one does not use the instance credentials. If I see it correctly (and reading that into your "would be implementing setenv"), such a functionality does not yet exists, am I right? What would be your thoughts about such an implementation? I mean, one could then change the environment of the nginx process (or at least the worker process) itself, wouldn't it? Any other implementation wouldn't make much sense, and, at least for this use case, wouldn't work either.

Giving this a bit more thought, I wouldn't call it "not so elegant" anymore to be honest. Like I said, our requirements for persistency and even synchronicity between worker processes are really low (basically: no need for any of that), as these credentials can simply be retrieved again when they are not present (anymore).

not implemented yet.

Is there any plan on having something like that? I mean, right now, it would not be a big issue, if that is not present as a feature (given we can work around the other things, the first request might get impacted in performance, however, that should be ok every 6 hours or so).

One way to do this is to refresh credentials ones they are detected to be stale.

But that would be in the scope of an actual request, right? Meaning, if a request is served and the credentials are found to be expired, they would be refreshed before the request can be served, right?

xeioex commented 2 years ago

@FlorianSW

However, the s3auth code (accessing the r.variables.credentials variable) does not hold on that access of the variable and is further processed while the async handler is still fetching the credentials.

Unfortunately, js_set does not support truly async methods (like, setTimeout, ngx.fetch(), r.subrequest()) due to nginx variable interface limitations.

The limitation is that nginx is not able to wait for a variable evaluation asynchronously. (WebCrypto async methods are working right now, because they can be executed synchronously in a JS call.)

But, as I mentioned above, the js_set limitation can be circumvented using auth_request.

Writing the credentials to a temp file (meh... and... meh;

BTW, this is not as bad as it may sound. nginx is reading from FS all the time. In most cases, frequently accessed files are always in a file memory cache, so reading such a file is usually fast. (The FS solution is definitely suboptimal when the file content is changed often and not in atomic fashion, but the credential example is different).

If I see it correctly (and reading that into your "would be implementing setenv"), such a functionality does not yet exists, am I right?

yes, setenv is not implemented.

What would be your thoughts about such an implementation? I mean, one could then change the environment of the nginx process (or at least the worker process) itself, wouldn't it?

Yes, this is will be global (process level) change, visible to other requests. I think this setenv() makes sense for use cases similar to your. I think we may add these feature for the next release.

But that would be in the scope of an actual request, right? Meaning, if a request is served and the credentials are found to be expired, they would be refreshed before the request can be served, right?

I remembered the good (nginx) way to handle expired data (like credentials). Use r.subrequest() or auth_request to a proxy_pass location with enabled cache. When cache is set up correctly, most of the time the data will be served from the cache and as soon as the data expires it will be refetched by proxy_pass automatically.

xeioex commented 2 years ago

@FlorianSW

Unfortunately, js_set does not support truly async methods (like, setTimeout, ngx.fetch(), r.subrequest()) due to nginx variable interface limitations.

just to clarify. As of now, unlike js_set, the truly asynchronous context is js_content.

FlorianSW commented 2 years ago

Alright, I played around a bit in the last days with what inputs @xeioex gave in the recent comments, and it seems I have a working proof of concept, at least :) For the setup:

The current flow looks like this:

  1. Request arrives at the "catch-all" location
  2. nginx fetches up-to-tate credentials
    • auth_request is used to call an internal location
    • the internal location uses js_content to trigger an async njs function that basically fetches credentials by one of these methods (based on env variables)
      • return static credentials (in case of static credentials in the env)
      • call the ECS container relative credentials URL and use the response
      • call the EC2 IMDSv2 to fetch credentials
    • The credentials are not returned in the body, nor in a header, but instead saved in a file in the systems tmp directory
  3. Given that request succeeds the usual "redirect to S3" logic is triggered (like it was so far)
  4. When the request is proxied to S3, the Authorization header is added with the already known s3auth function
    • it reads the credentials file synchronously
    • It uses the values from the credentials file to calculate the signature (which now varies a bit based on if a session token is used or not)
  5. A new function checks if a session token needs to be added to the request
    • again reads the credentials from file
    • returns the value of the session token or an empty string (which should result in the header to be omitted by proxy_pass)
  6. The request is proxied and the response is returned

Caveats:

I'll update the code in my branch in the evening, if I do not forget it 🙈

FlorianSW commented 2 years ago

I just made the last changes I wanted to make, which now also includes refreshing credentials, when the cached ones reach their expiration time. This should make sure, that always fresh credentials are used to access the respective S3 bucket. As the default expiration in ECS is 6 hours, I wasn't able to test out the refresh directly, however, it's the same logic as if no credentials are there. Assuming that the Date object works in njs, and that AWS returns the true expiration (which they do :P), I don't see any reason why this should not work.

As I'm more or less happy with my code, I opened a PR (#18) for all of you to review :)

There is also the open points that needs testing:

After all: Every review, suggestion and feedback is welcome :)

ajschmidt8 commented 2 years ago

This is huge! Thank you for all of your hard work on this @FlorianSW. I will try to find some time this weekend to test this out.

dekobon commented 2 years ago

I'd like to move discussion regarding instance profile support to github issues rather than the PR because there is more to discuss than what can be contained in a single PR.

First off, I've made a number of changes in the branch version.

Next, I see that we still need to do the following before integrating:

dekobon commented 2 years ago

Fixed in 0df77d631ea6873468d9c862187d7d524db0b7b4