nginxinc / nginx-openid-connect

Reference implementation of OpenID Connect integration for NGINX Plus
https://www.nginx.com/products/nginx/

Block the sensitive session and nonce cookies from passing through to the upstream. #91

Closed ag-TJNII closed 5 months ago

ag-TJNII commented 6 months ago

Err on the side of being secure: block the auth_ cookies from reaching the upstream, as it is more likely that they will not be needed than that they will be. This prevents accidental session exposure through misconfigured upstreams.
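(The diff itself isn't quoted in this thread; a minimal sketch of the idea might look like the following. The upstream name is hypothetical, and the auth_token/auth_nonce cookie names are assumed from the reference configuration's auth_ prefix.)

```nginx
# Sketch only, not the actual PR diff. Cookie names (auth_token, auth_nonce)
# are assumptions based on the reference config; my_backend is hypothetical.
server {
    # Start from the client's Cookie header and strip the auth_ cookies.
    set $upstream_cookie $http_cookie;
    if ($upstream_cookie ~ "(.*)\bauth_token=[^;]*;?\s*(.*)") {
        set $upstream_cookie "$1$2";
    }
    if ($upstream_cookie ~ "(.*)\bauth_nonce=[^;]*;?\s*(.*)") {
        set $upstream_cookie "$1$2";
    }

    location / {
        # proxy_set_header directives are only inherited from the server
        # level when the location defines none of its own, so the sanitized
        # header is set here explicitly.
        proxy_set_header Cookie $upstream_cookie;
        proxy_pass http://my_backend;
    }
}
```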

route443 commented 6 months ago

Hi @ag-TJNII, Thank you for your contribution to the nginx-openid-connect project. I appreciate the effort you've put into enhancing the security of the solution. However, I'm unable to accept this change for several reasons:

  1. The approach you've proposed does not work in the default configuration (as far as I can tell) and may confuse inexperienced users as to why this is the case. Simply put, this method does not address the stated objective, despite your declaration.

  2. "If is evil", and we generally advise against using `if` in production environments due to its unpredictable behavior in nginx's request processing. While it is acceptable for a PoC, you are positioning it as a solution for all use cases.

  3. The essence of this implementation leans more towards using NJS, where the conf files reflect the minimal approach necessary to make it operational in most scenarios and to demonstrate one of the possible methods. Each time you implement and use an njs-based OIDC solution, it's up to you to decide exactly what needs to be done in terms of configuration to meet your security requirements.

  4. The concept of removing auth_ cookies by default from requests to upstream is not common practice (AFAIK) and usually depends on specific security requirements of the deployment. It's a decision that must be tailored to each particular use case. I think it would be more appropriate to demonstrate this approach in a dedicated security hardening blog post rather than include it directly in the codebase.

Thanks!

ag-TJNII commented 5 months ago

The approach you've proposed does not work in the default configuration (as far as I can tell) and may confuse inexperienced users as to why this is the case. Simply put, this method does not address the stated objective, despite your declaration.

How so? It prevents the session cookie from being sent to the upstream server. How does this not meet the stated objective?

"If is evil"

It's "If is evil, when used in a location context" isn't it? This is a server context, but more importantly I don't think there is any other way to do this. I tried to find the "If is evil" documentation but it looks like https://docs.nginx.com/ was redone and all the links are now dead.

The essence of this implementation leans more towards using NJS

But njs will be rejected for performance reasons, so that's not relevant here.

The concept of removing auth_ cookies by default from requests to upstream is not common practice (AFAIK) and usually depends on specific security requirements of the deployment. It's a decision that must be tailored to each particular use case.

In this case the auth_ cookie is sensitive, as if compromised it opens up session hijacking attacks. The current implementation is insecure by default, a sensitive token is passed to an app that's expecting to offload auth onto this Nginx solution. As the upstream app is not handling auth the upstream developers may do things like log all the headers or take other insecure actions, because their app is not supposed to be handling the security sensitive parts of the flow. I argue the upstream app needing the sensitive session cookie is the less likely case, as the upstream can always add their own cookies as needed that are not sensitive.

We should be secure by default and make it easy to do things right, not default to insecure and expect admins to figure out the unfinished parts.

route443 commented 5 months ago

How so? It prevents the session cookie from being sent to the upstream server. How does this not meet the stated objective?

As stated in the proxy_set_header documentation: "These directives are inherited from the previous configuration level if and only if there are no proxy_set_header directives defined on the current level." Please conduct a test with the root location in the default configuration and check if the cookies are stripped before being sent to the upstream.
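To make the inheritance rule concrete, here is a minimal (hypothetical) illustration; the listen ports and upstream address are placeholders:

```nginx
server {
    listen 8010;

    # Server-level attempt to hide the Cookie header from the upstream.
    proxy_set_header Cookie "";

    location /inherited {
        # No proxy_set_header here, so the server-level directive is
        # inherited and the upstream sees no Cookie header.
        proxy_pass http://127.0.0.1:8011;
    }

    location /cancelled {
        # Defining ANY proxy_set_header at this level stops inheritance of
        # ALL server-level proxy_set_header directives, so the client's
        # original Cookie header passes through unchanged.
        proxy_set_header X-Example "1";
        proxy_pass http://127.0.0.1:8011;
    }
}
```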

It's "If is evil, when used in a location context" isn't it?

While it’s true that the primary concern is with if in a location context, consider whether keeping it in the server block will always be a necessary and sufficient condition for reliable operation. There might be scenarios where this assumption does not hold.
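(For completeness: the stripping can also be expressed without `if` at all, using an http-level `map`. A rough sketch, with the auth_token cookie name assumed from the reference configuration and a hypothetical upstream:)

```nginx
# http context. Produces a copy of the Cookie header value with auth_token
# removed; the regex is approximate and may leave a stray separator.
map $http_cookie $cookie_sans_auth {
    default $http_cookie;
    "~(?<before>.*)\bauth_token=[^;]*;?\s*(?<after>.*)" "$before$after";
}

server {
    location / {
        # Forward the sanitized value instead of the client's original header.
        proxy_set_header Cookie $cookie_sans_auth;
        proxy_pass http://my_backend;
    }
}
```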

The essence of this implementation leans more towards using NJS

This comment was not a call to action to move your approach to NJS. It was meant to convey that the configuration in the files serves as an example. In each specific case, you should apply the configuration that meets your requirements.

The current implementation is insecure by default

Your example assumes that developers might not understand how this configuration works, potentially leading to misconfiguration of the upstream application, or that the upstream is not secure. I operate under the assumption that developers know what they are doing and that the upstream is secure. Developers always have the necessary tools to change this behavior if needed.

We should be secure by default and make it easy to do things right, not default to insecure and expect admins to figure out the unfinished parts.

Can I ask how you use this solution in your environment? Do you test it before deploying it in a production environment? Do you copy the configuration file structure "as is", or do you use the files as an example of a certain approach?

Once again, I want to emphasize that this configuration does not aim to follow a "one-size-fits-all approach". NGINX is a complex tool with its own idiosyncrasies, and I'm not sure if it's possible to create a configuration that satisfies all best security practices without any drawbacks...

ag-TJNII commented 5 months ago

Can I ask how you use this solution in your environment?

So I think the best answer to the spirit of this question is to discuss the solution I inherited, not the solution I'm migrating to. This solution was built rapidly by an SME, and of course went into a semi-production state almost immediately. It stored all the Nginx logs in an open and centrally accessible log store (motivation for #89), had no tests, and used the "as-is" configuration file structure. The upstream app logged all the headers into the same central log store, because the developers of that app didn't realize they were receiving a sensitive cookie and wanted to see the full request for debugging (motivation for this PR).

In my professional experience the kind of deployment I inherited is very common. Someone sets it up, going only deep enough to get it working, and then they get pulled off to the next priority. It gets pushed out the door too soon. The people who have the technical wherewithal to do a deep dive are too busy, so everyone assumes it's fine and nobody looks behind the curtains. If you look behind the curtain and find something, then you need to clean it up, so don't go looking, as you don't have time. It becomes a black box nobody fully understands.

I inherited this solution expecting that the OIDC implementation would be secure with minimal updates. The only reason I've found the issues I've been reporting is that I wanted to build a secure, maintainable, and internally reusable solution, so I took the time to write a test suite. Writing a test suite on top of Nginx is HARD. It can't be unit tested like regular language tools: I had to create lots of fixtures, build fixture servers, create a pattern to start/stop the test Nginx instance, and walk through the OIDC flow step by step to understand what to test. It's only through this deep dive that I started finding the dust behind the curtains.

I personally am an outlier; if I didn't feel that understanding the OIDC flow was a useful personal development item, I would have stopped before finding these issues. I'd bet folding money that the majority of users of this implementation don't go anywhere near this deep, don't understand the implications of the session token, and don't know the nuances of the OIDC flow. Why would they? This implementation is supposed to have already solved these problems, which is the selling point. I originally didn't want to customize and tune this to the level I have; I wanted to deploy it with as few changes as possible, to allow future updates from this repo with minimal effort. I only went this deep when I realized I had to in order to build a long-term supportable solution.

So that's where the crux of my position comes from. I'm assuming the majority of users won't do a deep dive, because they won't expect to need to, don't have the domain knowledge to do so, or are simply too busy. I assume they don't want to make changes to what's published, as that makes it hard to maintain. I do assume users and developers will make mistakes, because mistakes are common for a variety of reasons, which is why we have so many software CVEs. I think a published implementation should close known security concerns by default, as I think that's what the majority of users will expect.

For this specific case I was surprised that we were passing the cookie on to the upstream app, as was the upstream application developer. I've set up SSO proxies like this in the past, this isn't my first implementation. In all cases the upstream app didn't need the session cookie, as it was offloading auth to the proxy. User tracking was done via some other means, usually passing the user's email in as a header. If the app needed a session token it managed that itself, independent of the auth proxy solution. The two parts are independent by design in all the implementations I've built and maintained.

My thinking here is that the cookie should be hidden unless it's known that it's needed. In that case it's trivial for the deploying engineer to cut that block out. Require the user to consciously turn the security stance down, not require them to know they need to turn it up.

But, you and I clearly have different viewpoints on this. Nginx also doesn't give a better way to remove this cookie that I can find. This is the neatest solution I know how to build, and it does have pitfalls. Given that, what if we add a "Known potential security concerns" (or similar) section to the README that documents these concerns and why the reference implementation doesn't resolve them? That way deploying engineers know they have to accept or address these issues on their own. That removes my core concern: that issues like this won't be discovered without a deep dive into the code.

route443 commented 5 months ago

Hi @ag-TJNII ,

Oh, thank you for such a detailed response. I truly appreciate the level of empathy with which you approached writing it. Although I no longer deploy solutions into production, I have had that experience in the past and fully understand the logic of "short-term gain, long-term pain." This can indeed be an issue even in larger organizations.

Regarding this specific case: no, no one is advocating for passing the njs OIDC session cookies to the upstream as part of SSO. These cookies are definitely not needed by the app and can be considered unnecessary overhead. However, by default, we don't treat this as a problem. Typically, the stripping is not done for the following reasons:

  1. KISS principle (keep it simple and straightforward). Don't overcomplicate things by default, maintain behavior that is simple and predictable.
  2. Performance. These types of manipulations are not free. If passing these cookies is not an issue for you, why should you incur the cost?

I can say that this is not just our approach. For instance, F5 APM does not strip MRHSession cookies before passing them to the pool, nor does NetScaler Gateway strip NSC_* cookies. It is common practice, but it's understood that all of these solutions offer ways to modify this behavior if it becomes a security concern in your deployment. So, I wouldn't consider this behavior unexpected.

I agree that we should add a "Known potential security concerns" section to the README, where potential risks and mitigation strategies are described. Additionally, I think we can create a GitHub Wiki to host configuration examples and further/detailed documentation.