nodejs / undici

An HTTP/1.1 client, written from scratch for Node.js
https://nodejs.github.io/undici
MIT License

Undici's Fetch's Body takes a long time to load compared to just request #2014

Open cakedan opened 1 year ago

cakedan commented 1 year ago

Bug Description

I've noticed that when the body of a fetch response is large (say, above 50 MB), reading the body contents takes a very long time. I compared it against undici's plain request function and node-fetch's fetch. For a 130 MB file, node-fetch took about 8.5 to 9.4 seconds, undici's request took 2.3 to 2.6 seconds, and undici's fetch took 95.5 to 104.2 seconds.

Reproducible By

undici fetch, 95.5 seconds, 18,605 buffers

await (async () => {
    const { fetch } = require('undici');

    const response = await fetch('https://alpha.notsobot.com/api/undici/test');
    let now = Date.now();
    const buffers = [];
    for await (let x of response.body) {
        buffers.push(x);
    }
    console.log(Date.now() - now);
    now = Date.now();
    JSON.parse(Buffer.concat(buffers));
    console.log(Date.now() - now);
})();

undici request, 2.6 seconds, 9,794 buffers

await (async () => {
    const { request } = require('undici');

    const response = await request('https://alpha.notsobot.com/api/undici/test');
    let now = Date.now();
    const buffers = [];
    for await (let x of response.body) {
        buffers.push(x);
    }
    console.log(Date.now() - now);
    now = Date.now();
    JSON.parse(Buffer.concat(buffers));
    console.log(Date.now() - now);
})();

node-fetch fetch, 9.4 seconds, 28,909 buffers

await (async () => {
    const fetch = require('node-fetch');
    const response = await fetch('https://alpha.notsobot.com/api/undici/test');
    let now = Date.now();
    const buffers = [];
    for await (let x of response.body) {
        buffers.push(x);
    }
    console.log(Date.now() - now);
    now = Date.now();
    JSON.parse(Buffer.concat(buffers));
    console.log(Date.now() - now);
})();

(Also, reading the body with .arrayBuffer() instead yields the same results.)
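
A minimal sketch of that variant, timing the single .arrayBuffer() call against the same endpoint:

await (async () => {
    const { fetch } = require('undici');

    const response = await fetch('https://alpha.notsobot.com/api/undici/test');
    const now = Date.now();
    // Read the entire body in one call instead of iterating chunks.
    const buffer = await response.arrayBuffer();
    console.log(Date.now() - now, buffer.byteLength);
})();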

Expected Behavior

Similar timing to undici's request function

Logs & Screenshots

Environment


Ubuntu 22.10, Node v19.8.1 and v18.7.0, undici v5.2.0 and v5.21.0

Additional context

KhafraDev commented 1 year ago

This is unfortunately an issue with web streams vs. Node streams. Node streams currently perform much better than web streams.
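
One way to look for the gap in isolation, independent of undici (a rough sketch with arbitrary sizes, using Readable.toWeb to consume the same chunks through the whatwg wrapper):

const { Readable } = require('node:stream');

// ~128 MiB of data, delivered as 16 KiB chunks.
const chunk = Buffer.alloc(16 * 1024);
const chunks = Array.from({ length: 8192 }, () => chunk);

async function drain (label, iterable) {
    const start = Date.now();
    for await (const _ of iterable) {}
    console.log(label, Date.now() - start, 'ms');
}

(async () => {
    await drain('node stream', Readable.from(chunks));
    // Same chunks, consumed through the whatwg ReadableStream wrapper.
    await drain('web stream', Readable.toWeb(Readable.from(chunks)));
})();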

KhafraDev commented 1 year ago

If we don't decompress the response, it takes 4.4 seconds. If we do (using zlib.createBrotliDecompress()) it takes ~70 seconds.
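
The decompression cost can be measured on its own; a sketch with a hypothetical payload, using the same pipeline-into-a-Transform shape that fetch uses for a content-encoding: br response:

const zlib = require('node:zlib');
const { pipeline, Readable } = require('node:stream');

// Compress a large (highly compressible) payload once up front.
const compressed = zlib.brotliCompressSync(Buffer.alloc(64 * 1024 * 1024, 'a'));

(async () => {
    const start = Date.now();
    // pipeline() returns its last stream, so the decoder can be
    // iterated directly once the source is piped through it.
    const decoded = pipeline(
        Readable.from([compressed]),
        zlib.createBrotliDecompress(),
        () => {}
    );
    for await (const _ of decoded) {}
    console.log('decompression took', Date.now() - start, 'ms');
})();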

KhafraDev commented 1 year ago

... and if we re-assign this.body to the result of pipeline, it fixes the issue, but causes issues with invalid gzip/brotli-compressed bodies.

@ronag you know much more about streams than I do; do you have any idea?

ronag commented 1 year ago

> ... and if we re-assign this.body to the result of pipeline, it fixes the issue, but causes issues with invalid gzip/brotli-compressed bodies.

Not sure I follow.

KhafraDev commented 1 year ago

So for example, this diff fixes the issue:

diff --git a/lib/fetch/index.js b/lib/fetch/index.js
index 0b2e3394..14e84b29 100644
--- a/lib/fetch/index.js
+++ b/lib/fetch/index.js
@@ -2023,7 +2023,7 @@ async function httpNetworkFetch (
             status,
             statusText,
             headersList: headers[kHeadersList],
-            body: decoders.length
+            body: this.body = decoders.length
               ? pipeline(this.body, ...decoders, () => { })
               : this.body.on('error', () => {})
           })

but then the zlib/brotli tests fail (I've only run the node-fetch tests, but it's safe to assume others will fail too)

ronag commented 1 year ago

I think it just removes the decompression...?

KhafraDev commented 1 year ago

Yeah, it does. But what I don't understand is why it causes an issue here and not with node-fetch; node-fetch uses pipeline and zlib too.

ronag commented 1 year ago

Web streams...

KhafraDev commented 1 year ago

I thought so too (I made an issue in the performance repo), but considering that removing the decompression fixes the issue...?

ronag commented 1 year ago

It could be that web streams deliver smaller chunks, which makes the decompression much less efficient... or something.
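
That theory is testable in isolation; a sketch (the chunk sizes are arbitrary) that feeds the decoder the same compressed buffer in large vs. small chunks:

const zlib = require('node:zlib');
const { pipeline, Readable } = require('node:stream');

const compressed = zlib.brotliCompressSync(Buffer.alloc(32 * 1024 * 1024, 'a'));

// Split one buffer into fixed-size chunks.
function * split (buf, size) {
    for (let i = 0; i < buf.length; i += size) {
        yield buf.subarray(i, i + size);
    }
}

async function timeDecode (label, chunkSize) {
    const start = Date.now();
    const decoded = pipeline(
        Readable.from(split(compressed, chunkSize)),
        zlib.createBrotliDecompress(),
        () => {}
    );
    for await (const _ of decoded) {}
    console.log(label, Date.now() - start, 'ms');
}

(async () => {
    await timeDecode('64 KiB chunks', 64 * 1024);
    await timeDecode('1 KiB chunks', 1024);
})();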

ronag commented 1 year ago

I don't think this is something we can solve in undici.

KhafraDev commented 1 year ago

> web streams have smaller chunks

No, I don't think so? In the OP, node-fetch has 10k more chunks than undici's fetch does. Adjusting the highWaterMark/size didn't make much difference, if I'm remembering correctly.
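
For context, the knob in question is the web stream's queuing strategy. On a hand-built stream it would be set roughly like this (illustrative only; undici constructs its response stream internally):

const { ReadableStream, ByteLengthQueuingStrategy } = require('node:stream/web');

let sent = 0;
const stream = new ReadableStream({
    pull (controller) {
        // Illustrative source: 8192 chunks of 16 KiB, then close.
        if (sent++ === 8192) return controller.close();
        controller.enqueue(new Uint8Array(16 * 1024));
    }
}, new ByteLengthQueuingStrategy({ highWaterMark: 1024 * 1024 }));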

ronag commented 1 year ago

🤷‍♂️

KhafraDev commented 1 year ago

@cakedan do you have a repro that runs locally, without the external server? I can't seem to replicate the issue.
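
For anyone trying to reproduce locally, a hypothetical stand-in for the external endpoint could look like this, with the snippets from the OP pointed at http://localhost:3000 instead:

const http = require('node:http');
const zlib = require('node:zlib');

// A large JSON payload, brotli-compressed once at startup.
const body = zlib.brotliCompressSync(
    JSON.stringify({ data: 'x'.repeat(130 * 1024 * 1024) }),
    // Low quality keeps startup fast; only the decode side matters here.
    { params: { [zlib.constants.BROTLI_PARAM_QUALITY]: 5 } }
);

http.createServer((req, res) => {
    res.writeHead(200, {
        'content-type': 'application/json',
        'content-encoding': 'br'
    });
    res.end(body);
}).listen(3000, () => {
    console.log('listening on http://localhost:3000');
});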