nginx / unit

NGINX Unit - universal web app server - a lightweight and versatile open source server that simplifies the application stack by natively executing application code across eight different programming language runtimes.
https://unit.nginx.org
Apache License 2.0
5.26k stars 322 forks source link

HTTP: Ensure REQUEST_URI immutability. #1162

Closed hongzhidao closed 1 month ago

hongzhidao commented 4 months ago
  1. Another implementation of proxy request creation.</de.
  2. Made the $request_uri constant.
  3. Passed constant REQUEST_URI to applications.

Need to separate the PR after tests pass.

Here's a manual test.

# conf.json

{
    "listeners": {
        "*:8080": {
            "pass": "routes/a"
        },
        "*:8081": {
            "pass": "routes/b"
        }
    },
    "routes": {
        "a": [
            {
                "action": {
                    "rewrite": "/foo.php",
                    "proxy": "http://127.0.0.1:8081"
                }
            }
        ],
        "b": [
            {
                "action": {
                    "rewrite": "/index.php",
                    "pass": "applications/app"
                }
            }
        ]
    },
    "applications": {
        "app": {
            "type": "php",
            "root": "/www",
            "index": "index.php"
        }
    }
}
# /www/index.php

<?php

phpinfo();
> curl http://127.1:8080
$_SERVER['REQUEST_URI'] => /foo.php
hongzhidao commented 4 months ago

Hi @andrey-zelenkov, Since this is a change, we need to add more tests to it.

  1. add rewrite on "proxy" which is similar to "php application rewrite". It doesn't depend on the PR.
  2. change the tests for $request_uri if possible.
  3. change the tests for "applications" if possible. Thanks.
ac000 commented 4 months ago

Hmm, I thought the idea was that REQUEST_URI doesn't get munged?

> curl http://127.1:8080
$_SERVER['REQUEST_URI'] => /foo.php

In which case I would have expected REQUEST_URI would be set to / in the above instance.

hongzhidao commented 4 months ago

Hmm, I thought the idea was that REQUEST_URI doesn't get munged?

> curl http://127.1:8080
$_SERVER['REQUEST_URI'] => /foo.php

In which case I would have expected REQUEST_URI would be set to / in the above instance.

Here are a few namings:

  1. REQEUST_URI is for applications, it must be unchanged.
  2. $request_uri corresponding internal r->target, both of them are unchanged.

So your understanding is correct. For the above configuration, there are two requests. The first one is the client request and is accepted by 8080.

"action": {
                    "rewrite": "/foo.php",
                    "proxy": "http://127.0.0.1:8081"
                }

The request_uri is unchanged. Then Units generates the second request with a rewritten uri to pass to 8081. Now the second request is /foo.php, and it will be accepted by 8081.

{
                "action": {
                    "rewrite": "/index.php",
                    "pass": "applications/app"
                }
            }
hongzhidao commented 2 months ago

Hi, I created a new PR https://github.com/nginx/unit/pull/1214 prepared for this patch, and it will simplify this PR a lot. I'll rework this PR after https://github.com/nginx/unit/pull/1214 and https://github.com/nginx/unit/pull/1211 are shipped.

javorszky commented 2 months ago

@andrey-zelenkov @hongzhidao: I want to double check the failing tests: https://github.com/nginx/unit/actions/runs/8718395388/job/23915553886?pr=1162#step:41:1060

I tested manually, and given the configuration in the test and the request, the log line ends up being

::1 /bar /blah%2fblah?a=b
192.168.65.1 /foo /blah%2fblah?a=b

This is correct given the request URI. I think those tests should be changed given the outcome of the code they're testing has been changed in this PR.

I'm working on a PR to this branch at https://github.com/hongzhidao/unit/pull/1, but wanted to run this by you as you're most familiar with the tests πŸ™‚

I'll keep working on that PR as I fix the tests.

andrey-zelenkov commented 2 months ago

Thank you for pointing this out! I discussed it with @hongzhidao and we agreed to include test fixes in his PR. This will demonstrate how these changes affect tests and facilitate GitHub Actions checks.

javorszky commented 2 months ago

I think it's because you brought in my change, the change is of a different sha as my signature is for, so it fails. The original commit is verified if that helps.

Specifically the differences between https://github.com/hongzhidao/unit/pull/1/commits/ae4081f32586e8d53ea71a5cd287040088007205 (Verified) and https://github.com/nginx/unit/pull/1162/commits/6d6c68cae8440b3d6fb6a2e250d4e5a9ab5a1e09 (Unverified) include:

Each of these on their own would change the commit's digest that the original signature is for πŸ™‚

hongzhidao commented 2 months ago

Are you sure this has no functional change?

Yes, I think so. Before we change the rule, I mean REQUEST_URI is changeable, the target and r->target have the same value.

target = path + args;
r->target = path + args;

When URL rewriting happens, both path and r->target are changed.

hongzhidao commented 2 months ago
From f8d106920ebda1671d56d6978cfc1cc9da4b5625 Mon Sep 17 00:00:00 2001
From: Zhidao HONG <z.hong@f5.com>
Date: Tue, 30 Apr 2024 14:45:18 +0800
Subject: [PATCH] http: Ensure REQUEST_URI immutability

Previously, the REQUEST_URI within Unit could be modified,
for example, during uri rewritting. We plan to make $request_uri
immutable and pass constant REQUEST_URI to applications.
Based on the new requirement, we remove `r->target` rewritting
in the rewrite module.

Closes: https://github.com/nginx/unit/issues/#916

Some small typos: s/rewritting/rewriting/g and s/#916/916/ in the closes tag...

We plan to make $request_uri immutable and pass constant REQUEST_URI to applications.

This commit does that right? as it makes it sounds as if there is further work to do...

Good point, it is a decision already made, not something that will happen in the future.

hongzhidao commented 2 months ago

I think it's because you brought in my change, the change is of a different sha as my signature is for, so it fails. The original commit is verified if that helps.

Specifically the differences between hongzhidao@ae4081f (Verified) and 6d6c68c (Unverified) include:

  • 6d6c has more code changes
  • 6d6c has a different commit message (title has test: prefix and empty commit message body)
  • 6d6c has two authors

Each of these on their own would change the commit's digest that the original signature is for πŸ™‚

I did it based on your patch and committed it with --author="...", is this way correct?

ac000 commented 2 months ago

Yes, I think so. Before we change the rule, I mean REQUEST_URI is changeable, the target and r->target
have the same value.

target = path + args; r->target = path + args;

When URL rewriting happens, both path and r->target are changed.

I was more thinking about

nxt_h1p_peer_header_send()
    nxt_h1p_peer_request_target()
        nxt_encode_complex_uri()

Could r->target & target be different after the call to nxt_encode_complex_uri()?

javorszky commented 2 months ago

I did it based on your patch and committed it with --author="...", is this way correct?

Yes, but the verification is expected to fail. Depends what's more important to the project. In all honesty I don't care to get my patch included with me as an author, as long as the tests are fixed, this PR merged, and a new release with the request uri changes published.

ac000 commented 2 months ago

We currently don't really worry about signing commits, so I'd say don't worry about it.

hongzhidao commented 2 months ago

Yes, I think so. Before we change the rule, I mean REQUEST_URI is changeable, the target and r->target have the same value.

target = path + args;                                                                                     
r->target = path + args;                                                                                  

When URL rewriting happens, both path and r->target are changed.

I was more thinking about

nxt_h1p_peer_header_send()
    nxt_h1p_peer_request_target()
        nxt_encode_complex_uri()

Could r->target & target be different after the call to nxt_encode_complex_uri()?

Hmm, they should be the same. Take /blah%2Fblah as an example. The r->target is /blah%2Fblah, and the r->quoted_target is set after the uri parsed. Then in the peer header send, we did something like this.

    if (r->quoted_target) {
        encode = nxt_encode_complex_uri(NULL, r->path->start,
                                        r->path->length);
    } else {
        encode = 0;
    }
ac000 commented 2 months ago

Just so I fully understand this.

static nxt_int_t
nxt_h1p_peer_request_target(nxt_http_request_t *r, nxt_str_t *target)
{
    u_char  *p;
    size_t  size, encode;

    if (!r->uri_changed) {
        *target = r->target;
        return NXT_OK;
    }

    if (!r->quoted_target && r->args->length == 0) {
        *target = *r->path;
        return NXT_OK;

So you're saying that at this point ->target and ->path are the same?

Carrying on

...

    target->start = nxt_mp_nget(r->mem_pool, size);
    if (target->start == NULL) {
        return NXT_ERROR;
    }

    if (r->quoted_target) {
        p = (u_char *) nxt_encode_complex_uri(target->start, r->path->start,
                                              r->path->length);

    } else {
        p = nxt_cpymem(target->start, r->path->start, r->path->length);
    }

    if (r->args->length > 0) {
        *p++ = '?';
        p = nxt_cpymem(p, r->args->start, r->args->length);
    }

    target->length = p - target->start;

...

After all that, target would be the same as r->target?

hongzhidao commented 2 months ago

So you're saying that at this point ->target and ->path are the same?

Yep, and it might be not changed, in this case, we don't need to allocate a new memory.

After all that, target would be the same as r->target?

Correct.

ac000 commented 2 months ago

After all that, target would be the same as r->target?

Correct.

That then begs the question, why even call nxt_h1p_peer_request_target() ? If it's going to essentially return r->target...

hongzhidao commented 2 months ago

I'm afraid of forgetting something here. Let me explain it again, and let's assume we are talking about the new change that makes REQUEST_URI constant.

After all that, target would be the same as r->target?

Without uri rewriting, they are the same. Before making REQUEST_URI constant, take the first commit as an example, target from the calculation in nxt_h1p_peer_request_target() has the same value with `r->target.

Then we decide to make REQUEST_URI constant, and internally, I keep r->target unchanged, since the variable $request_uri corresponds to r->target. But for the proxy module, it needs to pass changeable request uri (path + arguments) to the backend. Note that path is changeable with URI rewriting.

After all that, target would be the same as r->target?

If we are talking about the new change of REQUEST_URI rule, I'd say not.

ac000 commented 2 months ago

OK, I think we're getting there...

To clarify, I'm talking specifically about the first change

From: Zhidao HONG <z.hong@f5.com>
Date: Tue, 30 Apr 2024 14:30:24 +0800
Subject: [PATCH] http: Use consistent target in nxt_h1p_peer_header_send()

No functional changes.

So at the point of this change, I.e before the second patch, target & r->target would be the same.

After the second patch http: Ensure REQUEST_URI immutability they may be different.

Assuming I've got that right, I think I'd like a little more detail in the first commit message saying that this change is required for the second commit, after which target & r->target may be different.

hongzhidao commented 1 month ago

OK, I think we're getting there...

To clarify, I'm talking specifically about the first change

From: Zhidao HONG <z.hong@f5.com>
Date: Tue, 30 Apr 2024 14:30:24 +0800
Subject: [PATCH] http: Use consistent target in nxt_h1p_peer_header_send()

No functional changes.

So at the point of this change, I.e before the second patch, target & r->target would be the same.

After the second patch http: Ensure REQUEST_URI immutability they may be different.

Assuming I've got that right, I think I'd like a little more detail in the first commit message saying that this change is required for the second commit, after which target & r->target may be different.

Makes sense, thanks.

ac000 commented 1 month ago

Hi @hongzhidao

You can add my Reviewed-by: to the two http: patches (not the tests as I haven't really looked at them), you should also add your s-o-b, so if you can add

Reviewed-by: Andrew Clayton <a.clayton@nginx.com>
Signed-off-by: Zhidao HONG <z.hong@f5.com>

(for the second commit they should come immedaitely after the Closes: tag... you know how it goes...)

hongzhidao commented 1 month ago

Hi @ac000 @andrey-zelenkov, It only failed one test, but it didn't seem to have anything to do with the patches.

[test (ruby-3.2, ubuntu-latest)](https://github.com/nginx/unit/actions/runs/8996125783/job/24712141194#logs)
failed 11 minutes ago in 2m 55s
Search logs

        assert 'success' in client.conf(
            '"/ruby/status_int/config.ru"',
            'applications/status_int/script',
        )

        assert 'success' in client.conf(
            '"/ruby/status_int"',
            'applications/status_int/working_directory',
        )

>       assert client.get()['status'] == 200, 'status int'

test/test_ruby_isolation.py:43: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test/unit/http.py:165: in get
    return self.http('GET', **kwargs)
test/unit/http.py:103: in http
    resp = self.recvall(sock, **recvall_kwargs).decode(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <unit.applications.lang.ruby.ApplicationRuby object at 0x7f0200cac1c0>
sock = <socket.socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 57976), raddr=('127.0.0.1', 8080)>
kwargs = {}, timeout_default = 60, timeout = 60, buff_size = 4096, data = b''
rlist = []

    def recvall(self, sock, **kwargs):
        timeout_default = 60

        timeout = kwargs.get('read_timeout', timeout_default)
        buff_size = kwargs.get('buff_size', 4096)

        data = b''
        while True:
            rlist = select.select([sock], [], [], timeout)[0]
            if not rlist:
                # For all current cases if the "read_timeout" was changed
                # than test do not expect to get a response from server.
                if timeout == timeout_default:
>                   pytest.fail("Can't read response from server.")
E                   Failed: Can't read response from server.

test/unit/http.py:189: Failed
=========================== short test summary info ============================
FAILED test/test_ruby_isolation.py::test_ruby_isolation_rootfs - Failed: Can't read response from server.
ERROR test/test_ruby_isolation.py::test_ruby_isolation_rootfs - SystemExit: Could not unmount filesystems in tmpdir ({temporary_dir}).
========= 1 failed, 38 passed, 7 skipped, 1 error in 132.30s (0:02:12) =========
Error: Process completed with exit code 1.
ac000 commented 1 month ago

Yes, it's a known issue.

hongzhidao commented 1 month ago

Ok. Let me know if it's ready to ship.

ac000 commented 1 month ago

Yes, it looks OK to me...