Support chunked request bodies

hongzhidao commented 1 month ago

A draft patch for the MVP of this feature.

Internally, I transferred the chunked request to content-length request since the application process can't handle chunked request now.

Test

conf.json

{
    "listeners": {
        "*:8080": {
            "pass": "routes"
        }
    },
    "routes": [
        {
            "action": {
                "pass": "applications/app"
            }
        }
    ],
    "applications": {
        "app": {
            "type": "python",
            "path": "/www",
            "module": "wsgi"
        }
    }
}

/www/wsgi.py

def application(environ, start_response):
    content_length = int(environ.get('CONTENT_LENGTH', 0))
    body = bytes(environ['wsgi.input'].read(content_length))
    start_response(
        '200', [],
    )
    return [body]

> curl -X POST -d 'hello' -H "Transfer-Encoding: chunked" http://127.0.0.1:8080 -v
Note: Unnecessary use of -X or --request, POST is already inferred.
* Rebuilt URL to: http://127.0.0.1:8080/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> POST / HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/7.61.1
> Accept: */*
> Transfer-Encoding: chunked
> Content-Type: application/x-www-form-urlencoded
>
> 5
* upload completely sent off: 12 out of 5 bytes
< HTTP/1.1 200 OK
< Server: Unit/1.33.0
< Date: Wed, 29 May 2024 09:04:27 GMT
< Transfer-Encoding: chunked
<
* Connection #0 to host 127.0.0.1 left intact
hello

avahahn commented 1 month ago

fixes: #445

avahahn commented 1 month ago

Here's a patch that fixes the formatting nits:

diff --git a/src/nxt_h1proto.c b/src/nxt_h1proto.c
index 6550efb9..16f6dcaa 100644
--- a/src/nxt_h1proto.c
+++ b/src/nxt_h1proto.c
@@ -893,10 +893,8 @@ nxt_h1p_request_body_read(nxt_task_t *task, nxt_http_request_t *r)

         nxt_buf_t *out = nxt_http_chunk_parse(task, &h1p->chunked_parse, in);

-        if (h1p->chunked_parse.chunk_error ||
- h1p->chunked_parse.error) {
-            status = NXT_HTTP_NOT_IMPLEMENTED
-;
+        if (h1p->chunked_parse.chunk_error || h1p->chunked_parse.error) {
+            status = NXT_HTTP_NOT_IMPLEMENTED;
             goto error;
         }

@@ -904,8 +902,7 @@ nxt_h1p_request_body_read(nxt_task_t *task, nxt_http_request_t *r)
             out->next = nxt_buf_sync_alloc(r->mem_pool, NXT_BUF_SYNC_LAST);
             out = nxt_h1p_chunk_create(task, r, out);
             r->body = out;
-            r->state->ready_handler(task, r,
-NULL);
+            r->state->ready_handler(task, r, NULL);
             return;
         }
     }

Can we get this PR converted to a non-draft PR? I think we should push this in as a solution for chunked transfer encoded requests.

hongzhidao commented 1 month ago

Can we get this PR converted to a non-draft PR?

Does non-draft PR mean it's ready to review? And it will run with GH actions for tests?

ac000 commented 1 month ago

Can we get this PR converted to a non-draft PR?

Does non-draft PR mean it's ready to review? And it will run with GH actions for tests?

I generally regard a non-draft PR as reviewable (whether I've been assigned to review or not), sometimes I'll do a rough review of a draft PR depending on circumstances...

Draft and non-draft PR's both get run through the GH actions, of which you can generally ignore the Ruby 3.2 failures....

hongzhidao commented 1 month ago

Good to know, thanks. Btw, the way to support the chunked body should be different from the patch. The final solution is not clear yet for me.

callahad commented 1 month ago

Btw, the way to support the chunked body should be different from the patch. The final solution is not clear yet for me.

Can you say more about how this approach doesn't feel quite right to you?

We need to land something ASAP, and should act as though the deadline to have a mergeable / testable branch is the end of this week.

Is the current approach good enough for that, or do you think this approach is so flawed that you would block merging?

ac000 commented 1 month ago

We need to land something ASAP, and should act as though the deadline

Actually, I don't think we do, not in the main Unit repository anyway...

to have a mergeable / testable branch is the end of this week.

There is nothing preventing you from having a branch in whoevers personal clone for testing purposes or even for actual use until a correct solution is merged upstream...

callahad commented 1 month ago

We have an internal opportunity with a constraint that requires landing functional (but not perfect) support for Transfer-Encoding: chunked in master this calendar month.

Rightly or wrongly, missing that will close the window on that particular opportunity.

hongzhidao commented 1 month ago

Can you say more about how this approach doesn't feel quite right to you?

Sure. Let me list some of the challenges.

About the pipeline process. Unit supports pipeline requests. For any request, it doesn't handle the request after it. Since for content-length body, we know how much to read based on the content-length size from the request header. But for chunked body, the size isn't known until reading the data in the request body and parsed successfully for each chunk. So to solve the problem, we need to check if we need to change the design of limiting not to read or process the next request for the current request. I'd prefer to keep the design, but it looks like chunked body might break it.

About the body for size that is larger than the size of the buffer. In Unit, if the request body is larger than the buffer which is 8k size, the body will be dumped into a temporary file. For content-length request, after reading request header, we've known if the body will be saved into a memory buffer or a file buffer. But for chunked body, we have to check it while reading and parsing the body. To write a patch to be ready to review, I feel we need to change the implementation of content-length body reading and make it general. NGINX does it this way.
Ideally, we should send the chunked body to application process for the chunked request. Unfortunately, the application process only process content-length body. It's not a small change to support it. The imperfect way is to transfer the chunked body to content-length body before sending to application process, and it means we need to receive the entire body in this way.
The buffer type for the request body. For the size that is less than buffer size, Unit uses a memory buffer to save it. But if the size is larger than buffer size, Unit uses a file buffer instead. To note this point since this design has a lot to do with the challenge.

Is the current approach good enough for that, or do you think this approach is so flawed that you would block merging?

The patch I posted last week is for the demo asked Thursday, and it didn't solve the above challenges.

We have an internal opportunity with a constraint that requires landing functional (but not perfect) support for Transfer-Encoding: chunked in master this calendar month.

I'll try my best to provide a patch that can work with application processes, but I'm afraid it's not ready to be reviewed or merged into upstream. It needs to take more development time.

ac000 commented 1 month ago

Hi @hongzhidao good job!

I see this is still marked as a draft, but I'll do a quick once over your latest patch while I notice some minor things...

One small other nit, perhaps you could reword the subject to something like

Support chunked request bodies

hongzhidao commented 1 month ago

One small other nit, perhaps you could reword the subject to something like Support chunked request bodies

Good suggestion, let me know if body needs to be replaced with bodies.

I see this is still marked as a draft

Yep, it's for the demo.

We need to land something ASAP, and should act as though the deadline to have a mergeable / testable branch is the end of this week.

Internally, I transferred the chunked request to content-length request since the application process can't handle chunked request now.

The design should be rewritten to make more sense. But it's not clear yet. Will do it after the demo.

ac000 commented 1 month ago

One small other nit, perhaps you could reword the subject to something like Support chunked request bodies

Good suggestion, let me know if body needs to be replaced with bodies.

Yes, I think the plural bodes makes more sense here...

I see this is still marked as a draft

Yep, it's for the demo.

We need to land something ASAP, and should act as though the deadline to have a mergeable / testable branch is the end of this week.

Internally, I transferred the chunked request to content-length request since the application process can't handle chunked request now.

The design should be rewritten to make more sense. But it's not clear yet. Will do it after the demo.

Sure...

hongzhidao commented 3 weeks ago

Hi,

Ideally, we should send the chunked body to application process for the chunked request. Unfortunately, the application process only process content-length body. It's not a small change to support it. The imperfect way is to transfer the chunked body to content-length body before sending to application process, and it means we need to receive the entire body in this way.

I'm trying to get the application process to parse chunked request body.

callahad commented 3 weeks ago

@hongzhidao For the first release of this, let's ignore sending chunked data to application processes. It is acceptable to buffer and transform from chunked bodies to content-length for now.

hongzhidao commented 2 weeks ago

Hi all, welcome to review, but the max_body_size check for the chunked body is not supported yet. I'd suggest checking with @andrey-zelenkov's tests to understand how it is used. https://github.com/nginx/unit/pull/1307

ac000 commented 2 weeks ago

Ignore the Fedora Rawhide failures, Rawhide seems broken at the moment...

ac000 commented 2 weeks ago

Hi @hongzhidao

You can add my

Reviewed-by: Andrew Clayton <a.clayton@nginx.com>

to

http: Move chunked buffer pos pointer while parsing

hongzhidao commented 2 weeks ago

Hi @hongzhidao

You can add my

Reviewed-by: Andrew Clayton <a.clayton@nginx.com>

to

http: Move chunked buffer pos pointer while parsing

Added, and I wonder if this means the review on the commit has been done when the Reviewed-by tag is asked to be added? If yes, I'll add the same tag on the other commits after they are reviewed.

ac000 commented 2 weeks ago

Why remove the link to the chunked issue?\

EDIT: Hmm, I guess because this won't be enabled by default...

ac000 commented 2 weeks ago

Hi @hongzhidao You can add my
Reviewed-by: Andrew Clayton <a.clayton@nginx.com>
to
http: Move chunked buffer pos pointer while parsing
Added, and I wonder if this means the review on the commit has been done when the Reviewed-by tag is asked to be added? If yes, I'll add the same tag on the other commits after they are reviewed.

It means I have reviewed that particular patch. I'll notify you the same for the others...

hongzhidao commented 2 weeks ago

Why remove the link to the chunked issue?\

EDIT: Hmm, I guess because this won't be enabled by default...

I have no idea why it's removed, but I switched the PR status from draft to open.

ac000 commented 1 week ago

OK @hongzhidao good stuff!

I think you can add my

Reviewed-by: Andrew Clayton <a.clayton@nginx.com>

to that last commit now...

nginx / unit

Support chunked request bodies #1262

Test