nodejs / node

Node.js JavaScript runtime βœ¨πŸ’πŸš€βœ¨
https://nodejs.org
Other
107.66k stars 29.63k forks source link

filehandle.readableWebStream() chunks return incompatible `ArrayBuffer` instead of `Uint8Array` #54041

Open karlhorky opened 3 months ago

karlhorky commented 3 months ago

Version

v20.12.0

Platform

Linux ljysjk 6.1.43 #1 SMP PREEMPT_DYNAMIC Sun Aug  6 20:05:33 UTC 2023 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

  1. Open the CodeSandbox (see code below)
  2. Observe the ERR_INVALID_ARG_TYPE error below

CodeSandbox demo: https://codesandbox.io/p/devbox/filehandle-readablewebstream-with-new-response-ljysjk?file=%2Findex.js

import fs from "node:fs/promises";
import http from "node:http";
import path from "node:path";
import { Readable } from "node:stream";

const filePath = "./image.jpg";

const server = http.createServer(async (nodeRequest, nodeResponse) => {
  const stats = await fs.stat(filePath);
  const fileHandle = await fs.open(filePath);
  const webStream = fileHandle.readableWebStream();

  nodeResponse.writeHead(200, {
    "Content-Disposition": `attachment; filename=${path.basename(filePath)}`,
    "Content-Type": "image/jpeg",
    "Content-Length": String(stats.size),
  });

  const nodeStream = Readable.fromWeb(webStream);
  nodeStream.pipe(nodeResponse);
});

const port = 3000;
server.listen(port, () => {
  console.log(`Server running at http://localhost:${port}/`);
});

Error:

Server running at http://localhost:3000/
node:events:496
      throw er; // Unhandled 'error' event
      ^

TypeError [ERR_INVALID_ARG_TYPE]: The "chunk" argument must be of type string or an instance of Buffer or Uint8Array. Received an instance of ArrayBuffer
    at readableAddChunkPushByteMode (node:internal/streams/readable:480:28)
    at Readable.push (node:internal/streams/readable:390:5)
    at node:internal/webstreams/adapters:517:22
Emitted 'error' event on Readable instance at:
    at emitErrorNT (node:internal/streams/destroy:169:8)
    at emitErrorCloseNT (node:internal/streams/destroy:128:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  code: 'ERR_INVALID_ARG_TYPE'
}

Node.js v20.12.0
Failed running 'index.js'

Screenshot 2024-07-25 at 12 52 56

Alternative version with new Response():

import fs from "node:fs/promises";
import http from "node:http";
import path from "node:path";

const filePath = "./image.jpg";

const server = http.createServer(async (nodeRequest, nodeResponse) => {
  const stats = await fs.stat(filePath);
  const fileHandle = await fs.open(filePath);
  const webStream = fileHandle.readableWebStream();

  // Version with new Response()
  const webResponse = new Response(webStream, {
    status: 200,
    headers: new Headers({
      "content-disposition": `attachment; filename=${path.basename(filePath)}`,
      "content-type": "image/jpeg",
      "content-length": String(stats.size),
    }),
  });

  nodeResponse.writeHead(
    webResponse.status,
    Object.fromEntries(webResponse.headers.entries())
  );

  webResponse.body.pipeTo(
    new WritableStream({
      write(chunk) {
        nodeResponse.write(chunk);
      },
      close() {
        nodeResponse.end();
      },
    })
  );
});

const port = 3000;
server.listen(port, () => {
  console.log(`Server running at http://localhost:${port}/`);
});

How often does it reproduce? Is there a required condition?

Always

What is the expected behavior? Why is that the expected behavior?

Web streams created by fileHandle.readableWebStream() should be compatible with Readable.fromWeb()

What do you see instead?

Web streams created by fileHandle.readableWebStream() should are incompatible with Readable.fromWeb()

Additional information

Also reported over here:

cc @jasnell

Ethan-Arrowood commented 3 months ago

Taking a look πŸ‘€

Ethan-Arrowood commented 3 months ago

My understanding is that the readableWebStream method returns a ReadableStream in bytes mode (https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/ReadableStream#type) by default because files can contain any kind of data. As the error suggests, ArrayBuffer is not directly compatible with Node.js streams.

The best solution I could recommend right now is to maybe use a TransformStream and convert each chunk of the ReadableStream into a Node.js Buffer using Buffer.from(chunk) (which should correctly handle ArrayBuffer).

Also, I assume the code you provided is just for reproduction purposes, but it's pretty inefficient to go from Node Stream to Web Stream back to Node stream. Keep it all as a Node stream, or convert it to Web stream and keep it as that.

Ethan-Arrowood commented 3 months ago

Roughly:

const t = new TransformStream({
    transform(chunk, controller) {
        controller.enqueue(Buffer.from(chunk))
    },
})

const nodeStream = Readable.fromWeb(webStream.pipeThrough(t));

πŸ€·β€β™‚οΈ

Ethan-Arrowood commented 3 months ago

Regarding an actual fix to Node.js, I'd say we have to improve Readable.fromWeb to infer this and maybe do this transform automatically. I don't immediately see an issue with that. WDYT @jasnell ?

karlhorky commented 3 months ago

My understanding is that the readableWebStream method returns a ReadableStream in bytes mode (developer.mozilla.org/en-US/docs/Web/API/ReadableStream/ReadableStream#type) by default because files can contain any kind of data.

I'd say we have to improve Readable.fromWeb to infer this and maybe do this transform automatically

Oh ok, interesting - or should there a better default for file.readableWebStream()?

Also, I assume the code you provided is just for reproduction purposes, but it's pretty inefficient to go from Node Stream to Web Stream back to Node stream. Keep it all as a Node stream, or convert it to Web stream and keep it as that.

Right, I can see how converting back and forth multiple times with this code would seem unrealistic or add overhead.

But this may not be so unrealistic if working with a Node.js framework that expects Web streams in user code - eg. Route Handlers in Next.js:

eric-burel/demo-readableWebStream in @eric-burel's demo repo

import fs from "fs/promises"
import path from "path"

export const dynamic = 'force-dynamic'

export async function GET(request: Request) {
    const filePath = path.resolve("./public/image.jpg")
    const stats = await fs.stat(filePath);
    const fileHandle = await fs.open(filePath)
    const stream = fileHandle.readableWebStream()
    return new Response(stream, {
        status: 200,
        headers: new Headers({
            "content-disposition": `attachment; filename=${path.basename(filePath)}`,
            "content-type": "image/jpeg",
            "content-length": stats.size + "",
        })
    })

...where in the background, there may be code like this to convert it to a Node.js stream:

Keep it all as a Node stream, or convert it to Web stream and keep it as that

If there was a mode / way of working to instead keep everything as a Web stream, and also serve up Web streams from a Node.js response with node:http or similar, then that would I guess be an option for frameworks to migrate to... πŸ€”

Ethan-Arrowood commented 3 months ago

Oh ok, interesting - or should there a better default for file.readableWebStream()?

Maybe, I'm trying to think through where to best "fix" this. Because both side's limitation make sense to me.

On one hand, you don't really want file.readableWebStream() to muck the underlying data. On the other, Readable.fromWeb should definitely handle this case. So the question comes down to how?

jasnell commented 3 months ago

FWIW, The readableWebStream() does not return a bytes stream by default... If you call const readable = fileHandle.readableWebStream({ type: 'bytes' }) then you'll get a proper byte-oriented stream that appears to work correctly with new Response(readable) ... I think maybe the change to make here is that readableWebStream() should return a byte-oriented stream by default.

Ethan-Arrowood commented 3 months ago

Ah I misunderstood the code here https://github.com/nodejs/node/blob/2d1b4a8cf7ce875f0c458941bb40ba6fbdccd42f/lib/internal/fs/promises.js#L284-L335

Do you foresee any issue with that change in default?

jasnell commented 3 months ago

I wouldn't imagine any issues but it would need to be a semvver-major change that should be called out in notable changes.

jasnell commented 3 months ago

@karlhorky ... can you verify quickly if const stream = fileHandle.readableWebStream({ type: 'bytes' }); works for your case?

karlhorky commented 3 months ago

Should be pretty easy to change the CodeSandbox linked above yep.

Although probably I'll need to check that out tomorrow

Ethan-Arrowood commented 3 months ago

The method is still labeled as experimental in the docs, so would it really be a major change? I think a fix is appropriate

karlhorky commented 3 months ago

@karlhorky ... can you verify quickly if const stream = fileHandle.readableWebStream({ type: 'bytes' }); works for your case?

Seems to work, yes! πŸ‘

CodeSandbox: https://codesandbox.io/p/devbox/filehandle-readablewebstream-with-new-response-forked-mszlgw?file=%2Findex.js

Changes to original sandbox from PR description:

  1. I forked the sandbox just now
  2. I added { type: 'bytes' } as an argument to fileHandle.readableWebStream()
  3. I removed the "Content-Disposition": `attachment; filename=${path.basename(filePath)}`, (which also worked to download the file by visiting the page directly, but didn't allow for a very nice demo)
  4. I added fileHandle.close(), to avoid these warnings:
    (node:6079) Warning: Closing file descriptor 23 on garbage collection
    (Use `node --trace-warnings ...` to show where the warning was created)
    (node:6079) [DEP0137] DeprecationWarning: Closing a FileHandle object on garbage collection is deprecated. Please close FileHandle objects explicitly using FileHandle.prototype.close(). In the future, an error will be thrown if a file descriptor is closed during garbage collection.

Screenshot 2024-07-27 at 13 05 19

cc @eric-burel

eric-burel commented 3 months ago

@karlhorky absolutely wonderful, I'll update my resources on the topic to match this API. Last tiny issue, I hit some TypeScript issues in the context of Next.js:

Argument of type 'ReadableStream<any>' is not assignable to parameter of type 'BodyInit | null | undefined'.
  Type 'import("stream/web").ReadableStream<any>' is not assignable to type 'ReadableStream<any>'.
    Types of property 'pipeThrough' are incompatible.

When passing the stream to Response. But maybe an issue on Next.js side.

Note that you can't close a file after the Response is sent in Next.js, so no way to properly close the file (closing it before the stream is sent will trigger an error as expected).

Ethan-Arrowood commented 3 months ago

@eric-burel that is an issue on the Next.js side. I believe it is a mismatch between Node.js Web Stream types and the Web Stream types from TypeScript's standard library (lib).

isker commented 3 weeks ago

@jasnell

I think maybe the change to make here is that readableWebStream() should return a byte-oriented stream by default.

I'm looking into this problem and I'm not sure why the stream should ever be allowed to not be byte-oriented. It's not clear to me why this should be an option in the API at all.

For example, as far as I know, the Node stream APIs in fs only deal in bytes (unless you use encoding to get strings; but this is done with something like TextDecoderStream in web streams).

karlhorky commented 2 days ago

@jasnell @Ethan-Arrowood PR has been opened by @isker over here: