nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
107.98k stars 29.79k forks source link

Fetch sends incorrect number of chunks when given a ReadableStream with a Passthrough as body #55910

Open j-nolan opened 1 week ago

j-nolan commented 1 week ago

Version

Repro confirmed on at least v23.2.0, v20.14.0

Platform

Repro both on Linux and Windows

Linux <redacted> 6.8.0-48-generic #48~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct  7 11:24:13 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Microsoft Windows NT 10.0.22631.0 x64

Subsystem

No response

What steps will reproduce the bug?

Run a basic HTTP server supporting HTTP chunked requests (python3 server.py):

# server.py

from http.server import HTTPServer, BaseHTTPRequestHandler
import os

class ChunkedHTTPRequestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.headers.get('Content-Type') != 'text/plain':
            self.send_error(400, "Invalid Content-Type. Expected 'text/plain'")
            return

        filename = 'uploaded-file.txt'
        with open(filename, 'wb') as f:
            while True:
                chunk_size_line = self.rfile.readline().strip()
                chunk_size = int(chunk_size_line, 16)

                if chunk_size == 0:
                    break

                chunk = self.rfile.read(chunk_size)
                f.write(chunk)
                print(f"Wrote chunk of length {chunk_size}")

                # Read and discard the CRLF at the end of the chunk
                self.rfile.read(2)

        # Send response
        self.send_response(200)
        self.end_headers()
        self.wfile.write(f"File '{filename}' uploaded successfully".encode('utf-8'))

def run_server(port=8000):
    server_address = ('', port)
    httpd = HTTPServer(server_address, ChunkedHTTPRequestHandler)
    print(f"Server running on port {port}")
    httpd.serve_forever()

if __name__ == '__main__':
    run_server()

Write a long text in my-file.txt. The file must be large enough to be chunkable (~0.5MB). (Here is a sample file for your convenience).

Run this NodeJS code (node index.js):

// index.js

const fs = require("fs");
const { PassThrough } = require("stream");

async function run() {
  const readStream = fs.createReadStream("my-file.txt"); // you will need a file large enough (>0.5MB)

  const passThrough = new PassThrough();

  passThrough.on("data", (e) => {
    console.log("chunk length", e.length);
  });

  readStream.pipe(passThrough);

  const url = `http://localhost:8000`;
  const response = await fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "text/plain",
    },
    body: passThrough,
    duplex: "half",
  });

  console.log("response status:", response.status);
}

run();

How often does it reproduce? Is there a required condition?

This only happens when the ReadableStream passed as a body to the fetch function is piped into a Passthrough.

What is the expected behavior? Why is that the expected behavior?

The number of chunks read by the Pipethrough is identical to the number of chunks received by the HTTP server.

What do you see instead?

The server consistently receives fewer chunks than were sent from the client:

Client (7 chunks):

> node index.js
chunk length 65536                      
chunk length 65536                      
chunk length 65536                      
chunk length 65536                      
chunk length 65536                      
chunk length 65536                      
chunk length 19566                      
response status: 200

Server (only 5 chunks):

127.0.0.1 - - [18/Nov/2024 22:05:42] "POST / HTTP/1.1" 200 -
Wrote chunk of length 65536
Wrote chunk of length 65536
Wrote chunk of length 65536
Wrote chunk of length 65536
Wrote chunk of length 19566

The number of chunks effectively sent varies and seems somewhat random (sometimes 4, sometimes 5).

The issue does not occur when the ReadableStream is not piped through Pipethrough. This implies that the Pipethrough somehow impacts how fetch consumes the ReadableStream.

Additional information

Apologies if I'm missing something. As far as I can tell, this is not the expected behavior of Pipethrough

j-nolan commented 1 week ago

Update: my theory is that something goes wrong in ReadableStreamFrom(), which is called because a Passthrough is not an instanceof ReadableStream. For some reason, this function drops the 2-3 first chunks. If I wrap the Passthrough into a ReadableStream.from(...), the problem disappears.

This is just a workaround for now. I'm trying to understand the issue with ReadableStreamFrom().

// index.js

const fs = require("fs");
const { PassThrough } = require("stream");

async function run() {
  const readStream = fs.createReadStream("my-file.txt"); // you will need a file large enough (>0.5MB)

  const passThrough = new PassThrough();

  passThrough.on("data", (e) => {
    console.log("chunk length", e.length);
  });

  readStream.pipe(passThrough);

  const url = `http://localhost:8000`;
  const response = await fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "text/plain",
    },
    body: ReadableStream.from(passThrough), // this is the workaround
    duplex: "half",
  });

  console.log("response status:", response.status);
}

run();
j-nolan commented 1 week ago

Update: The issue was not in ReadableStreamFrom()

After more digging, here is the problematic sequence:

  1. const readStream = fs.createReadStream("my-file.txt"); creates a ReadableStream in pause mode
  2. passThrough.on("data", ...) switches the Passthrough to flowing mode
  3. readStream.pipe(passThrough); switches readStream to flowing mode, because it is in flowing mode itself
  4. At this point, chunks start to flow, but are not read by anything, effectively discarding them
  5. By the time fetch reads from passThrough, many ticks later, the first few chunks have been missed

A better work-around than suggested above is therefore to ensure passThrough is not in flowing mode when readStream is piped to it. This can be done with pause():

  const passThrough = new PassThrough();

  passThrough.on("data", (e) => {
    console.log("chunk length", e.length);
  });

  passThrough.pause();

  readStream.pipe(passThrough);

Full fixed snippet:

const fs = require("fs");
const { PassThrough } = require("stream");

async function run() {
  const readStream = fs.createReadStream("my-file.txt");
  const passThrough = new PassThrough();

  passThrough.on("data", (e) => {
    console.log("chunk length", e.length);
  });

  passThrough.pause();
  readStream.pipe(passThrough);

  const url = `http://localhost:8000`;
  const response = await fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "text/plain",
    },
    body: passThrough,
    duplex: "half",
  });

  console.log("response status:", response.status);
}

run();

I think this issue can be closed, but I'll leave it open for a couple days in case someone is willing to proof-check my reasoning.