tokio-rs / axum

Ergonomic and modular web framework built with Tokio, Tower, and Hyper

Weird behavior when returning response without consuming multipart body [REPRO] #2850

Open fs-99 opened 1 month ago

fs-99 commented 1 month ago

A bit more explanation and instructions on how to see what I mean directly can be found in this reproduction: https://github.com/fs-99/axum-multipart-bug-repro

The problem I'm facing appears when enforcing size limits on very large file uploads with axum. The body/request size limit itself holds, but if the body is not consumed completely beforehand, the frontend does not receive a "413 Payload Too Large" error, just a generic "network error". This might be because browsers do not handle the server closing the stream mid-upload, in which case this is a browser issue. Postman does not have this problem; it correctly receives the 413 from the server.

As Discussed in https://github.com/tokio-rs/axum/discussions/2445

Originally posted by **momozahara** December 25, 2023

(there was another one)

As Discussed in https://github.com/tokio-rs/axum/discussions/1650

CmdrMoozy commented 1 month ago

If it helps, I can post the code I wrote to work around the issue when I ran into it in #2445. It's a trivial function, feel free to use it as public domain / with no restrictions.

In my case I only cared about handling file uploads, I wasn't worried about enforcing size limits on other kinds of requests. So I ended up not using the 'built-in' axum middleware for enforcing size limits at all, and wrote the following upload handler:

// Context assumed by this snippet (not shown in the original comment);
// requires axum's "multipart" feature.
use anyhow::{anyhow, Result};
use axum::extract;
use bytes::{BufMut, Bytes, BytesMut};

// Example limit; pick whatever your application needs.
const MAX_UPLOAD_BYTES: usize = 8 * 1024 * 1024;

struct Upload {
    field_name: Option<String>,
    filename: Option<String>,
    data: Bytes,
}

async fn receive_upload(mut multipart: extract::Multipart) -> Result<Vec<Upload>> {
    let mut uploads = Vec::new();
    let mut err = None;
    let mut total_upload_bytes = 0;

    while let Some(mut field) = multipart.next_field().await? {
        let mut buf = BytesMut::new();
        while let Some(chunk) = field.chunk().await? {
            if total_upload_bytes + chunk.len() > MAX_UPLOAD_BYTES {
                err = Some(anyhow!(
                    "file upload too large, limit is {} bytes",
                    MAX_UPLOAD_BYTES
                ));
            }

            // Once over the limit, keep consuming chunks without buffering them.
            if err.is_some() {
                continue;
            }

            total_upload_bytes += chunk.len();
            buf.put(chunk);
        }

        // Don't push this field's (partial) data once the limit was hit.
        if err.is_some() {
            continue;
        }

        uploads.push(Upload {
            field_name: field.name().map(|s| s.to_owned()),
            filename: field.file_name().map(|s| s.to_owned()),
            data: buf.freeze(),
        });
    }

    if let Some(err) = err {
        return Err(err);
    }

    Ok(uploads)
}

The basic idea is to go ahead and 'consume' all of the chunks / fields, even though we don't actually do anything with them once the error condition has been triggered. Then at the very end, send an error response if appropriate.
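That pattern (buffer, but keep consuming once over the limit, and only surface the error at the end) can be sketched as a std-only model; `recv_chunks` and its slice-of-chunks input are hypothetical stand-ins for the multipart stream:

```rust
// Std-only model of the handler above: consume every chunk, but stop
// buffering (and remember an error) once the running total exceeds the limit.
fn recv_chunks(chunks: &[&[u8]], max_bytes: usize) -> Result<Vec<u8>, String> {
    let mut buf = Vec::new();
    let mut total = 0usize;
    let mut err = None;

    for chunk in chunks {
        if total + chunk.len() > max_bytes {
            err = Some(format!("upload too large, limit is {max_bytes} bytes"));
        }
        if err.is_some() {
            // Keep iterating (i.e. keep "reading" the stream) without buffering.
            continue;
        }
        total += chunk.len();
        buf.extend_from_slice(chunk);
    }

    // Only after the whole stream has been consumed do we report the error.
    match err {
        Some(e) => Err(e),
        None => Ok(buf),
    }
}
```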

As discussed in #2445 this appeases (some?) browsers, and gives the right behavior. Other clients like cURL don't mind receiving a response before they are done sending their request, and so either approach works.

fs-99 commented 1 month ago

@CmdrMoozy thank you, that's a good workaround in "userspace". I experimented a bit and think I did much the same thing as a patch in limited.rs (http-body-util's Limited body):

impl<B> Body for Limited<B>
where
    B: Body,
    B::Error: Into<Box<dyn Error + Send + Sync>>,
{
    type Data = B::Data;
    type Error = Box<dyn Error + Send + Sync>;

    fn poll_frame(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Result<Frame<Self::Data>, Self::Error>>> {
        let mut this = self.project();
        let res = match this.inner.as_mut().poll_frame(cx) {
            Poll::Pending => return Poll::Pending,
            Poll::Ready(None) => None,
            Poll::Ready(Some(Ok(frame))) => {
                if let Some(data) = frame.data_ref() {
                    if data.remaining() > *this.remaining {
                        *this.remaining = 0;
                        // >>>>>>>>> CHANGES START HERE <<<<<<<<<
                        // NOTE: returning Pending directly here does not work, since we need to error on Ready(None)
                        // println!("PATCH: polling remaining frames!");
                        match this.inner.as_mut().poll_frame(cx) {
                            Poll::Pending => {
                                print!("pending ");
                                return Poll::Pending
                            },
                            Poll::Ready(None) => {
                                println!("ready none ");
                                return Poll::Ready(Some(Err(LengthLimitError.into())))
                            }
                            // continue polling until None reached
                            // NOTE: this codepath was never triggered with my test file
                            Poll::Ready(Some(Ok(_frame))) => {
                                print!("frame ");
                                // Re-wake so the runtime polls us again; returning
                                // Pending without a wake could stall the task.
                                cx.waker().wake_by_ref();
                                return Poll::Pending;
                            },
                            // NOTE: this codepath was never triggered with my test file
                            Poll::Ready(Some(Err(err))) => {
                                let err = err.into();
                                println!("some err: {err}");
                                return Poll::Ready(Some(Err(err)))
                            },
                        }
                    } else {
                        *this.remaining -= data.remaining();
                        Some(Ok(frame))
                    }
                } else {
                    Some(Ok(frame))
                }
            }
            Poll::Ready(Some(Err(err))) => Some(Err(err.into())),
        };

        Poll::Ready(res)
    }
}

It just polls the inner body again the moment it finds a frame that exceeds the limit, then repeats the cycle. It ends up in Pending a bunch of times and then reaches Ready(None) (which means "end of stream"), at which point it returns the LengthLimitError. I'm only learning about this by pattern matching in my head and navigating this code a bit, so there might be a cleaner way to do it.
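For what it's worth, the drain-then-error control flow of that patch can be modeled synchronously, with a plain iterator standing in for the `Frame`/`Poll` machinery (`limited_collect` is a hypothetical name, not part of http-body-util):

```rust
// Synchronous model of the patched Limited body: once a chunk would exceed
// the remaining budget, drain the rest of the stream and only then report
// the length-limit error (mirroring the error returned on Ready(None) above).
fn limited_collect<I>(frames: I, mut remaining: usize) -> Result<Vec<u8>, &'static str>
where
    I: IntoIterator<Item = Vec<u8>>,
{
    let mut out = Vec::new();
    let mut frames = frames.into_iter();

    while let Some(frame) = frames.next() {
        if frame.len() > remaining {
            // Over budget: keep pulling frames until the stream ends...
            while frames.next().is_some() {}
            // ...and only then surface the error, so the sender has finished.
            return Err("length limit exceeded");
        }
        remaining -= frame.len();
        out.extend_from_slice(&frame);
    }

    Ok(out)
}
```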