withastro / astro

The web framework for content-driven websites. https://astro.build

Performance bug when uploading large amounts of data #7525

Closed hbgl closed 1 year ago

hbgl commented 1 year ago

What version of astro are you using?

2.7.2

Are you using an SSR adapter? If so, which one?

Node

What package manager are you using?

npm

What operating system are you using?

Windows

What browser are you using?

Chrome

Describe the Bug

There exists a performance bug when Astro has to handle large request bodies. The problem is caused by this piece of code:

https://github.com/withastro/astro/blob/5df4853bd19cab85cd33cc006662669db815e7f8/packages/astro/src/core/app/node.ts#L64-L75

On line 67, Buffer.concat is called for every chunk of the request body. Each call returns a newly allocated buffer, which puts a lot of pressure on the GC and spikes memory usage. With small payloads this effect is not noticeable, but with payloads in the megabytes it adds significant overhead. The chunk size on my machine is 64 KiB, so a lot of buffers are created and destroyed when uploading anything large.

I want my Astro app to handle file uploads so this is a real problem for me.
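To illustrate the cost difference (a minimal sketch of the pattern, not Astro's actual code): repeated Buffer.concat re-copies every byte accumulated so far on each call, while collecting the chunks and concatenating once copies each byte a single time.

```typescript
import { Buffer } from 'node:buffer';

// Sketch of the problematic pattern: each Buffer.concat call allocates a new
// buffer and re-copies everything accumulated so far, so N chunks cost
// O(N^2) bytes copied and leave N-1 dead buffers for the GC.
function collectByRepeatedConcat(chunks: Buffer[]): Buffer {
  let body = Buffer.alloc(0);
  for (const chunk of chunks) {
    body = Buffer.concat([body, chunk]); // fresh allocation every iteration
  }
  return body;
}

// Collect references first and copy once at the end: O(N) bytes copied,
// one final allocation.
function collectOnce(chunks: Buffer[]): Buffer {
  return Buffer.concat(chunks);
}
```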

What's the expected result?

For me at least, the expected result would be to not buffer the body into memory at all. The Request constructor also accepts a ReadableStream for the body. That would be my preferred solution because for file uploads that will be saved on disk, there is no reason to load them into memory first.

Alternatively, instead of allocating a new Buffer for every chunk, use a "growing" buffer, e.g. double in size when full.
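The growing-buffer alternative could look roughly like this (a minimal sketch, not Astro code; the 64 KiB starting capacity simply mirrors the chunk size mentioned above):

```typescript
import { Buffer } from 'node:buffer';

// A minimal growable buffer: double the backing allocation whenever it is
// full, so appending N bytes costs amortized O(N) copies instead of the
// O(N^2) of concatenating on every chunk.
class GrowableBuffer {
  private buf = Buffer.alloc(64 * 1024); // start at one typical chunk size
  private length = 0;

  append(chunk: Buffer): void {
    let capacity = this.buf.length;
    while (this.length + chunk.length > capacity) capacity *= 2;
    if (capacity !== this.buf.length) {
      const next = Buffer.alloc(capacity);
      this.buf.copy(next, 0, 0, this.length);
      this.buf = next;
    }
    chunk.copy(this.buf, this.length);
    this.length += chunk.length;
  }

  toBuffer(): Buffer {
    // View of the written region; no extra copy.
    return this.buf.subarray(0, this.length);
  }
}
```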

Link to Minimal Reproducible Example

https://stackblitz.com/edit/github-mx5ncy-g1ui5u


natemoo-re commented 1 year ago

Thanks for opening an issue! Since you've already identified the culprit, any interest in submitting a PR to fix this? Switching to a ReadableStream seems like a great solution.

hbgl commented 1 year ago

I'm not sure I can find the time to do a proper implementation of this central piece of infrastructure. It's sadly not that straightforward to convert a node Readable to a ReadableStream. A suitable function has been available since Node 17 as an experimental feature. Astro still supports Node 16, so it would need to be polyfilled (node source).
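For reference, the Node 17 function referred to here appears to be stream.Readable.toWeb (still marked experimental); a minimal sketch, assuming Node 17 or later:

```typescript
import { Readable } from 'node:stream';
import { Buffer } from 'node:buffer';

// Readable.toWeb (experimental since Node 17) adapts a node Readable into a
// WHATWG ReadableStream; on Node 16 this is the piece that would need a
// polyfill.
const nodeStream = Readable.from([Buffer.from('hello '), Buffer.from('world')]);
const webStream = Readable.toWeb(nodeStream);
```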

natemoo-re commented 1 year ago

Totally understandable, I appreciate the links and context! We'll try to get working on this as soon as we can.

hbgl commented 1 year ago

> It's sadly not that straightforward to convert a node Readable to a ReadableStream. A suitable function has been available since Node 17 as an experimental feature. Astro still supports Node 16, so it would need to be polyfilled (node source).

@natemoo-re I want to make a correction here. Node uses the undici library for the implementation of the Request class (see node source). Undici's Request body also accepts async iterables, even though this is non-standard (see remark).

So in practice, you do not need an adapter to convert from an http.IncomingMessage to a ReadableStream, since http.IncomingMessage is a Readable, which is itself an async iterable.

This should just work:

```ts
import { IncomingMessage } from 'node:http';

function makeRequest(req: IncomingMessage) {
    return new Request('http://example.com', {
        method: 'post',
        body: req,
        duplex: 'half',
    });
}
```
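A hedged usage sketch of the same idea (assuming Node 18+, where the global Request comes from undici; the URL and payload are illustrative only):

```typescript
import { Readable } from 'node:stream';
import { Buffer } from 'node:buffer';

// Assumes Node >= 18: the global Request is undici's, which accepts async
// iterables (such as a node Readable) as the body when duplex is 'half'.
// The `as any` works around TypeScript's RequestInit type, which models
// neither stream bodies nor the duplex option.
const upload = new Request('http://example.com/upload', {
  method: 'POST',
  body: Readable.from([Buffer.from('hello '), Buffer.from('world')]),
  duplex: 'half',
} as any);
```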
hbgl commented 1 year ago

I was trying to write a fix (see commit), but making it PR-ready would also involve refactoring the tests. The tests currently use node-mocks-http to create mock requests; however, the mocked request does not support the async iterable interface that a real request has.
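One possible direction for the test refactor (a sketch under the assumption that the code under test only iterates the body; mockRequestBody and the fields it attaches are hypothetical helpers, not node-mocks-http API): build the mock from a real Readable, which already implements the async iterable protocol.

```typescript
import { Readable } from 'node:stream';
import { Buffer } from 'node:buffer';

// Hypothetical test helper: a real Readable already supports for-await
// iteration, unlike node-mocks-http's mock request. The attached fields are
// just the ones a handler under test might read.
function mockRequestBody(body: string) {
  const stream = Readable.from([Buffer.from(body)]);
  return Object.assign(stream, {
    method: 'POST',
    url: '/upload',
    headers: {} as Record<string, string>,
  });
}
```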