nginx / unit

NGINX Unit - universal web app server - a lightweight and versatile open source server that simplifies the application stack by natively executing application code across eight different programming language runtimes.
https://unit.nginx.org
Apache License 2.0

get user from upstream proxy feature request #348

Open 532910 opened 4 years ago

532910 commented 4 years ago

Please add support for setting the user (under which the application is run) dynamically, via a variable from the upstream proxy.

VBart commented 4 years ago

Could you elaborate more on your use case?

Note that changing the user of a process is quite an expensive operation, so setting it on each request would be a bad idea. Moreover, some application processes can handle multiple requests at the same time, and if each request carries its own user, those requests can no longer be handled in the same process. Overall, it complicates the implementation a lot and degrades performance.

532910 commented 4 years ago
  1. For apps without multi-user support.
  2. For better isolation (run instances for different real users under different Linux users).
  3. For easier app development (you may not need to care about authentication / authorisation at all).

Note that changing the user of a process is quite an expensive operation, so setting it on each request would be a bad idea

Not changing it, but running requests from different users in separate processes.

Moreover, some application processes can handle multiple requests at the same time, and if each request carries its own user, those requests can no longer be handled in the same process.

Yes, see 2.

Overall, it complicates the implementation a lot and degrades performance.

Unit can already run an app under a specific (static) user. Is it hard to make this decision dynamically?

VBart commented 4 years ago

@532910 You can do this already. You just need to set up a number of apps with different user settings and then use the routing configuration to route different requests to those apps. See the documentation for routing details: https://unit.nginx.org/configuration/#routes

532910 commented 4 years ago

Sure, but that is a static configuration; the question is how to make it dynamic.

VBart commented 4 years ago

Currently you can run as many processes as you want and dynamically route requests between them based on any request parameter (including a variable set by the upstream proxy). You can also dynamically add new processes and change the routing. Starting a new process for a new user on each request would be very expensive and inefficient.
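For illustration, a rough sketch of adding a per-user application and a matching route through the control API; the control socket path, the header name, and the application settings here are assumptions, not taken from this discussion:

# Add an application that runs as a new user:
curl -X PUT --data-binary \
    '{"type": "python", "path": "/srv/app", "module": "wsgi", "user": "alice", "group": "alice"}' \
    --unix-socket /var/run/control.unit.sock \
    http://localhost/config/applications/alice

# Append a matching route for that user (POST appends to a configuration array):
curl -X POST --data-binary \
    '{"match": {"headers": {"X-Remote-User": "alice"}}, "action": {"pass": "applications/alice"}}' \
    --unix-socket /var/run/control.unit.sock \
    http://localhost/config/routes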

532910 commented 4 years ago

Could you point me to the documentation on how to route requests based on a variable from the upstream proxy?

VBart commented 4 years ago

@532910 https://unit.nginx.org/configuration/#condition-matching Matching rules can include arbitrary header fields or URI arguments. For example, you can set an X-Route-To: somevalue header on the upstream proxy and then create a rule that checks this header value.
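Roughly, such a rule could look like the following sketch (the target application name is a placeholder):

{
    "routes": [
        {
            "match": {
                "headers": { "X-Route-To": "somevalue" }
            },

            "action": {
                "pass": "applications/someapp"
            }
        }
    ]
}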

VBart commented 4 years ago

@532910 Sorry, maybe I misunderstood your case... I thought you had an nginx setup in front of Unit with an upstream block, were using proxy_pass to pass requests to Unit, and wanted to set some variable that Unit would use for routing.

But if you want to route requests based on the response from a proxied server using the proxy action in Unit, then it's a completely different matter and raises other questions:

  1. What to do with the request body? In order to pass the request to some application after the proxy response, we have no choice but to buffer it completely. What if it is huge?
  2. What to do with the original proxy response and its body?

Are you actually asking about something like X-Accel-Redirect or the nginx auth_request module? Could you give examples of any existing implementations or similar functionality?

532910 commented 4 years ago

Sorry for the poor explanation. I mean the following setup:

  1. nginx authenticates the user (with auth_basic, for example) and passes $remote_user (as an X-Remote-User header) to Unit; see the sketch below.
  2. Unit runs the application under the user passed from nginx.
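For illustration, a minimal nginx-side sketch of step 1 (the htpasswd path and the Unit listener address are assumptions):

server {
    listen 80;

    location / {
        auth_basic           "Restricted";
        auth_basic_user_file /etc/nginx/.htpasswd;

        proxy_set_header X-Remote-User $remote_user;
        proxy_pass       http://127.0.0.1:8300;   # Unit listener address (assumption)
    }
}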

Is it clear?

i4ki commented 4 years ago

If by "runs the application under the user passed from nginx" you mean really setuid(2)-ing the Unit application process, then I think this raises important security issues (apart from the performance, as @VBart said). A safe implementation would require another option to enable this behavior; otherwise, attackers could use this header to escalate privileges (access static files in privileged directories or, given an RCE, execute commands as root). And even if we add an option, people commonly copy-paste configurations, and it's easy for such a harmful configuration to end up in production by mistake.

I understand some use cases for this; for example, if you have a web application for managing job scripts on a server, you can simplify things by using the same UNIX credentials and thereby ensure all jobs are run by the same user. But I'm concerned about the security implications.

You can emulate this behavior today, at your own risk, by running your application with "user": "root", then looking for the Remote-User header in the application handler and changing the application user with the setuid(2) syscall.
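A rough WSGI sketch of that emulation, with all the caveats above still applying (the header name and the lack of error handling are simplifications, not part of this proposal):

import os
import pwd

def application(environ, start_response):
    # Header set by the trusted front-end; arrives as HTTP_X_REMOTE_USER in WSGI.
    user = environ.get('HTTP_X_REMOTE_USER')
    if user and os.getuid() == 0:
        record = pwd.getpwnam(user)   # raises KeyError for unknown users
        os.setgid(record.pw_gid)      # drop the group first, then the user
        os.setuid(record.pw_uid)      # irreversible: this worker is now locked to this user,
                                      # so later requests from other users stay under it

    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [('running as uid %d\n' % os.getuid()).encode()]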

532910 commented 4 years ago

I see no security issue here. In my view, it should be implemented as the ability to use a variable for user and group:

"app": {
  "user": $XRemoteUser,
}

And even if we add an option, people commonly copy-paste configurations, and it's easy for such a harmful configuration to end up in production by mistake.

Sounds odd. You can't save people from themselves.

running your application with "user": "root"

And this is the security issue.

i4ki commented 4 years ago

Your configuration raises more risks: variable substitution from untrusted headers used for setting credentials is a perfect exploitation vector. There are plenty of attack vectors related to differences in header parsing between servers and client libraries. For example, HTTP request smuggling exists in the wild, and the only way to be safe is to carefully choose the servers (and clients) in the stack according to how they process headers. Take a look at this recent vulnerability: https://blog.paloaltonetworks.com/2019/10/cloud-kubernetes-vulnerabilities/ Some servers interpret Content-Length before Transfer-Encoding, which leads to request headers being passed unparsed to downstream servers: https://regilero.github.io/english/security/2019/10/17/security_apache_traffic_server_http_smuggling/

Your stack is probably just nginx and unit, but we cannot say the same for all users.

And this is the security issue.

There's little difference from what you proposed. The only difference is that in this case, Unit will load the language library (e.g. python.unit.so) as root and your app has to drop privileges itself. In the case of setting user: $XRemoteUser, the child is spawned as root as well, but Unit drops privileges sooner. As I said, at your own risk.

i4ki commented 4 years ago

My point is: this feature has a very small use case and opens a big exploitation vector. I agree that we cannot save people from themselves, but people trust NGINX, and if we have this option they will use it without knowing the risks.

532910 commented 4 years ago

It's quite a common setup: the front-end authenticates users and the back-end trusts the REMOTE_USER variable passed from the front-end. It doesn't matter whether the back-end runs these requests under one static user or under a user taken dynamically from the headers; the back-end trusts the front-end in both cases. There is no security issue here.

The only difference is that to be able to setuid you need a privileged master process, but Unit already has one!

i4ki commented 4 years ago

I didn't say there's a security issue, only that this could be used to escalate privileges in case of other flaws.

Applications trusting an X-Remote-User header are indeed a common setup, and I'm not arguing against that, only pointing out the implications.

In most cases, X-Remote-User is an application user and not an operating system user. Then, if someone exploits HTTP smuggling or another flaw (like the OpenShift-related flaw above), they can only spoof other users in the web application (much less harmful).

In your case, if someone exploits the upstream service or even the proxy, they can reach any downstream server and run applications as root. One flaw and a ton of machines are compromised. (I'm not judging your real architecture; other measures may be in place to mitigate issues, but as I said, it's complicated because it involves how different pieces of software implement the HTTP protocol.)

AFAIK, the Unit feature to set "user" and "group" exists to set just the credentials required for the application. This is a static configuration for security reasons: the system administrator wants to make sure the application runs with the bare minimum privileges, and no one other than the system administrator should be able to change this. But in your use case, the application runs as any user, like login/sudo/su(1) but passwordless, so in my opinion it is a privileged application even if you delegate the setuid task to Unit. Beyond that, it enables lateral movement if the headers can be forged in any way from the trusted machine.

Well, I don't have much more to add here, except that, to a security expert, this feature is just bells and whistles.

532910 commented 4 years ago

Dovecot, for example, has two modes: a high-performance mode with all processes run under the same user, and a high-security mode that uses separate Linux users for separate mail system users.

VBart commented 4 years ago

@532910 Is there any problem with the following setup:

{

    "routes": [

        {
            "match": {
                "headers": { "X-Remote-User": "user1" }
            },

            "action": {
                "pass": "applications/user1"
            }
        },

        {
            "match": {
                "headers": { "X-Remote-User": "user2" }
            },

            "action": {
                "pass": "applications/user2"
            }
        },

        ...

        {
            "match": {
                "headers": { "X-Remote-User": "userN" }
            },

            "action": {
                "pass": "applications/userN"
            }
        },
    ],

    "applications": {
        "user1": {
            "user": "user1",
            "group": "group1"
        },

        "user2": {
            "user": "user2",
            "group": "group2"
        },

        ...

        "userN": {
            "user": "userN",
            "group": "groupN"
        }
    }
}

?

What you're suggesting with using a variable ("user": $XRemoteUser) will be extremely slow. It turns any application into something similar to a CGI script, as it requires starting a new process for each request. Nowadays, CGI is considered an obsolete way of running web applications, mostly because of its very low performance.

Dovecot deals with long sessions, where a user is authenticated at the beginning of a session that may then last a few hours. Even in this case, the mode is contrasted with high performance. Unit usually has to deal with thousands of requests per second, which means each request is handled in a few microseconds. Microseconds in the case of Unit vs. hours in the case of Dovecot. It wouldn't just be a "not so high-performance" mode; in Unit's case, this mode would slow down request handling a lot.

532910 commented 4 years ago

Is there any problem with the following setup

It's static. You can't just add a new user in LDAP, for example; you also need to maintain it in the Unit configuration.

as it requires starting a new process for each request

That's not true, as all requests from the same user will be passed to the same process. The correct statement is "it requires starting a new process for each request from a different user".

VBart commented 4 years ago

@532910

That's not true, as all requests from the same user will be passed to the same process. The correct statement is "it requires starting a new process for each request from a different user".

How do we know that there will be more requests for the same user soon? It's quite a common situation that there are more requests than available processes. To avoid exhausting all resources with new processes and getting killed, almost all application servers maintain limits on the maximum number of application processes. Unit isn't an exception here, as there is almost no way to deal with high load without limiting processes. A typical situation: we have 100 processes busy with processing and 200 other requests waiting in the queue. If some process finishes a request for "alex", but there are no other requests from "alex" right now, what should we do? If we don't shut down this process and start a new one, it will just waste resources and occupy a slot within the limit. Overall, this approach raises many complex problems about how to distribute resources between different users, without even knowing the whole user list and its size.

It's static. You can't just add a new user in LDAP, for example; you also need to maintain it in the Unit configuration.

Unit has a dynamic API exactly to allow easy automation of such tasks. You can use a simple script that synchronises users between your LDAP configuration and Unit. As I said above, Unit needs to know the user list in order to distribute resources properly.
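For illustration, a rough sketch of such a synchronisation script; the user list is hard-coded here (in practice it would be fetched from LDAP), and the control socket path, header name, and per-application settings are all assumptions:

import http.client
import json
import socket

CONTROL_SOCKET = '/var/run/control.unit.sock'   # Unit control socket path (assumption)

class UnitControlConnection(http.client.HTTPConnection):
    # Plain HTTP over the Unit control UNIX socket.
    def __init__(self, path):
        super().__init__('localhost')
        self._socket_path = path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self._socket_path)

def put(uri, value):
    # One request per connection keeps the sketch simple.
    conn = UnitControlConnection(CONTROL_SOCKET)
    conn.request('PUT', uri, body=json.dumps(value))
    response = conn.getresponse()
    print(uri, response.status, response.read().decode().strip())
    conn.close()

users = ['user1', 'user2']   # in practice, fetched from LDAP

applications = {
    user: {
        'type': 'python',    # per-application settings are placeholders
        'path': '/srv/app',
        'module': 'wsgi',
        'user': user,
        'group': user,
    }
    for user in users
}

routes = [
    {
        'match': {'headers': {'X-Remote-User': user}},
        'action': {'pass': 'applications/' + user},
    }
    for user in users
]

# Applications first, so the routes' "pass" targets already exist.
put('/config/applications', applications)
put('/config/routes', routes)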