Tests: chunked request body

andrey-zelenkov commented 2 weeks ago

Tests for the #1262.

andrey-zelenkov commented 1 week ago

Hi Zhidao!

I set the default value of the chunked_transform false.

Why do we need chunked_transform option at all?

ASGI spec explicitly states how to handle chunked requests https://asgi.readthedocs.io/en/latest/specs/www.html:

Servers are responsible for handling inbound and outbound chunked transfer encodings. A request with a chunked encoded body should be automatically de-chunked by the server and presented to the application as plain body bytes; a response that is given to the server with no Content-Length may be chunked as the server sees fit.

and

Note that if the request is being sent using Transfer-Encoding: chunked, the server is responsible for handling this encoding. The http.request messages should contain just the decoded contents of each chunk.

Same for WSGI https://peps.python.org/pep-3333/#other-http-features:

WSGI servers must handle any supported inbound “hop-by-hop” headers on their own, such as by decoding any inbound Transfer-Encoding, including chunked encoding if applicable.

ac000 commented 1 week ago

Hi Zhidao!

I set the default value of the chunked_transform false.

Why do we need chunked_transform option at all?

It's an experimental feature and we wanted to gate it.

ASGI spec explicitly states how to handle chunked requests https://asgi.readthedocs.io/en/latest/specs/www.html:

Servers are responsible for handling inbound and outbound chunked transfer encodings. A request with a chunked encoded body should be automatically de-chunked by the server and presented to the application as plain body bytes; a response that is given to the server with no Content-Length may be chunked as the server sees fit.

and

Note that if the request is being sent using Transfer-Encoding: chunked, the server is responsible for handling this encoding. The http.request messages should contain just the decoded contents of each chunk.

Same for WSGI https://peps.python.org/pep-3333/#other-http-features:

WSGI servers must handle any supported inbound “hop-by-hop” headers on their own, such as by decoding any inbound Transfer-Encoding, including chunked encoding if applicable.

As it stands language modules won't see chunked data, Unit will buffer them up and convert the request to Content-Length

andrey-zelenkov commented 1 week ago

As it stands language modules won't see chunked data, Unit will buffer them up and convert the request to Content-Length

Even when "chunked_transform": False?

ac000 commented 1 week ago

As it stands language modules won't see chunked data, Unit will buffer them up and convert the request to Content-Length

Even when "chunked_transform": False?

AIUI

If chunked_transform is false then Unit behaves as it does now and rejects chunked requests.

If it's true, Unit will buffer all the chunked data up and convert it to a Content-Length request.

In either case language modules will only see Content-Length requests as they do now.

andrey-zelenkov commented 1 week ago

If chunked_transform is false then Unit behaves as it does now and rejects chunked requests.

It doesn't work that way. In the current implementation of #1262, chunked requests are always accepted and served.

In either case language modules will only see Content-Length requests as they do now.

No, in case of "chunked_transform": False app will receive a request and will not see any Content-Length. At least in current version of patch.

After asking @hongzhidao about it, he explained that the main reason for having this option is the problem with the current implementation: the decision about de-chunking is made before routing and knowing where the request will be sent (proxy or application). Disabling de-chunking makes the proxy work correctly, while enabling de-chunking makes applications work correctly. So, by setting chunked_transform option, you are choosing what will be broken and what will work correctly. This explanation makes some sense to me now.

My current thought is to set chunked_transform to True by default. The only drawback I can see is de-chunking proxied requests and I would expect that most of the servers should be able to live with it (or even it will be possible to chunk it back later by implementing body modification using njs) and Unit does extra work which might be useless, whereas setting it to False leads to broken applications on the application server.

ac000 commented 1 week ago

If chunked_transform is false then Unit behaves as it does now and rejects chunked requests.

It doesn't work that way. In the current implementation of #1262, chunked requests are always accepted and served.

In either case language modules will only see Content-Length requests as they do now.

No, in case of "chunked_transform": False app will receive a request and will not see any Content-Length. At least in current version of patch.

@callahad @hongzhidao

Well I'm confused because we just had a meeting where it was agreed that language modules (for now anyway) will not see chunked requests.

Sending chunked requests to language modules will be completely pointless as they won't know what to do with it, for all I know they may even just crash...

ac000 commented 1 week ago

commit 93d8b3d1424387f2a5ca264b9b9653c9da21a9a7
Author: Zhidao HONG <z.hong@f5.com>
Date:   Tue Jun 11 17:59:00 2024 +0800

    http: Support chunked request bodies

    This is a temporary support for chunked request bodies by converting
    to Content-Length. This allows for processing of such requests until
    a more permanent solution is developed.

    A new configuration option "chunked_transform" has been added to enable
    this feature. The option can be set as follows:
      {
        "settings": {
          "chunked_transform": true
        }
      }
    By default, this option is set to false.

    Please note that this is an experimental implementation.

To me that means setting chunked_transform to true will allow chunked requests to be accepted, albeit by being converted to Content-Length requests.

The only sensible behaviour for defaulting it to false can be to retain the current behaviour, which I thought was whole point of having this setting...

hongzhidao commented 1 week ago

The only sensible behaviour for defaulting it to false can be to retain the current behaviour, which I thought was whole point of having this setting...

Hmm, the false setting is also not clear to me.

If chunked_transform is false then Unit behaves as it does now and rejects chunked requests. If it's true, Unit will buffer all the chunked data up and convert it to a Content-Length request. In either case language modules will only see Content-Length requests as they do now.]

It doesn't work that way. In the current implementation of https://github.com/nginx/unit/pull/1262, chunked requests are always accepted and served.

I changed the logic and will wait for a final decision from the team. If it's false, it will be rejected as before, which returns 411 (NXT_HTTP_LENGTH_REQUIRED = 411).

callahad commented 1 week ago

For now, let's merge to master with chunked_transform defaulting to false.

That keeps the new behavior opt-in until we work out all of the kinks with proxy vs application.

ac000 commented 1 week ago

@callahad @hongzhidao

OK, so I think things are as I expected. But just to be clear.

When chunked_transform = false, Unit will behave as it does now and will reject chunked requests with a 411 Length Required status code?

@hongzhidao

I think that could be more clearly stated in the commit message for ("http: Support chunked request bodies")

Perhaps by extending the second last line to something like

By default, this option is set to false, which retains the current
behaviour of rejecting chunked requests with a '411 Length Required'
status code.

andrey-zelenkov commented 1 week ago

Rebased, added tests for the chunked_transform option and last chunk.

% git range-diff e5dbfcf7...440837dc
 -:  -------- >  1:  e77a0c16 Tests: explicitly specify 'f' prefix to format string before printing
 -:  -------- >  2:  c9dced37 Tests: print unit.log on unsuccessful unmount
 -:  -------- >  3:  98983f3f Add a GitHub discussions badge to the README
 -:  -------- >  4:  a7e3686a Tools: improved error handling for unitc
 -:  -------- >  5:  d7ec30c4 ci: Limit when to run checks on pull-requests
 -:  -------- >  6:  04a24f61 http: fix use-of-uninitialized-value bug
 -:  -------- >  7:  965fc94e fuzzing: add fuzzing infrastructure in build system
 -:  -------- >  8:  a93d878e fuzzing: add fuzzing targets
 -:  -------- >  9:  665353dc fuzzing: add a fuzzing seed corpus and dictionary
 -:  -------- > 10:  5b65134c fuzzing: add a basic README
 -:  -------- > 11:  35a572c2 fuzzing: Add a .gitattributes file
 1:  e5dbfcf7 ! 12:  440837dc Tests: chunked request body
    @@ test/test_chunked.py (new)
     +client = ApplicationPython()
     +
     +
    -+def test_chunked():
    ++@pytest.fixture(autouse=True)
    ++def setup_method_fixture():
     +    client.load('mirror')
     +
    ++    assert 'success' in client.conf(
    ++        {"http": {"chunked_transform": True}}, 'settings'
    ++    )
    ++
    ++
    ++def test_chunked():
     +    def chunks(chunks=[]):
     +        body = ''
     +
    @@ test/test_chunked.py (new)
     +
     +
     +def test_chunked_pipeline():
    -+    client.load('mirror')
    -+
     +    sock = client.get(
     +        no_recv=True,
     +        headers={
    @@ test/test_chunked.py (new)
     +    assert len(re.findall('%', resp)) == 1
     +
     +
    -+@pytest.mark.skip('not yet')
     +def test_chunked_max_body_size():
    -+    client.load('mirror')
    -+
     +    assert 'success' in client.conf(
    -+        {'http': {'max_body_size': 1024}}, 'settings'
    ++        {'max_body_size': 1024, 'chunked_transform': True}, 'settings/http'
     +    )
     +
     +    body = f'{2048:x}\r\n{"x" * 2048}\r\n0\r\n\r\n'
    @@ test/test_chunked.py (new)
     +    )
     +
     +
    -+@pytest.mark.skip('not yet')
    -+def test_chunked_invalid():
    -+    client.load('mirror')
    ++def test_chunked_after_last():
    ++    resp = client.get(
    ++        headers={
    ++            'Host': 'localhost',
    ++            'Connection': 'close',
    ++            'Transfer-Encoding': 'chunked',
    ++        },
    ++        body='1\r\na\r\n0\r\n\r\n1\r\nb\r\n0\r\n\r\n',
    ++    )
     +
    ++    assert resp['status'] == 200
    ++    assert resp['headers']['Content-Length'] == '1'
    ++    assert resp['body'] == 'a'
    ++
    ++
    ++def test_chunked_transform():
    ++    assert 'success' in client.conf(
    ++        {"http": {"chunked_transform": False}}, 'settings'
    ++    )
    ++
    ++    assert (
    ++        client.get(
    ++            headers={
    ++                'Host': 'localhost',
    ++                'Connection': 'close',
    ++                'Transfer-Encoding': 'chunked',
    ++            },
    ++            body='0\r\n\r\n',
    ++        )['status']
    ++        == 411
    ++    )
    ++
    ++
    ++def test_chunked_invalid():
     +    # invalid chunkes
     +
     +    def check_body(body):

nginx / unit

Tests: chunked request body #1307