nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
107.57k stars 29.58k forks source link

`Readable.fromWeb` is never finished #49233

Closed jan-tosovsky-cz closed 1 month ago

jan-tosovsky-cz commented 1 year ago

Version

v18.17.1

Platform

Win64

Subsystem

No response

What steps will reproduce the bug?

Run this script:

const fs = require("fs")
const path = require("path")
const { Readable } = require("stream")
const { finished } = require("stream/promises")

async function saveURLToFile(url, path) {
    try {
        const response = await fetch(url)
        const fileStream = fs.createWriteStream(path)
        await finished(Readable.fromWeb(response.body).pipe(fileStream))

    } catch (err) {
        console.log(err)
    }
}

saveURLToFile("http://www.milicovice.cz/", path.resolve(__dirname, "output.html"))

How often does it reproduce? Is there a required condition?

Always

What is the expected behavior? Why is that the expected behavior?

The content (HTML page) is stored to the file.

What do you see instead?

The node crashes.

Additional information

This method works for other sites. It is unclear why this one is problematic. That error is apparently uncaught by the catch block. This site belongs to the municipality of a small village. Please minimize the traffic (during testing) to a minimum. Thanks.

bnoordhuis commented 1 year ago

Define "crashes"? I'd like to see the stack trace or whatever you get.

jan-tosovsky-cz commented 1 year ago

Node.js exits during executing this line: await finished(Readable.fromWeb(response.body).pipe(fileStream)) No exception is thrown, hence it is not caught. When trying to capture the error this way, that err.out stays empty: node script.js 1> log.out 2> err.out However, when looking into Windows Event Viewer, no crash was reported for Node.js, so it may not be a typical app crash.

So I think that site has something unexpected (SSL certificate, server settings, ...) which triggers an exception not handled correctly and node.js exits abruptly.

Is there any option to enable some node.js logging to inspect what is going on?

bnoordhuis commented 1 year ago

Ah, okay. I think your report is a duplicate of #48466 and one or two other similar reports (that I can't find right now but I remember commenting on them. They're in the bug tracker... somewhere.)

If you agree it's the same bug, then please go ahead and close this issue.

jan-tosovsky-cz commented 1 year ago

The linked issue, if I understand correctly, causes an exiting the Node.js when the stream is consumed, but it is interrupted before calling close() method. Ok, this should be fixed. But there is another issue here. Why that stream interruption is not detected and the corresponding exception thrown, so it could be processed in my catch block?

When accessing different URLs I can see many errors (certificate problems, timeouts) so I would expect to handle such interruption events in the same way.

From the linked issue I deduce here the server closes the connection before sending the response. If so, it should be handled gracefully. But I can be wrong. I am not an experienced Node.js user to be able to dig into it more deeply.

aduh95 commented 1 year ago

Why that stream interruption is not detected and the corresponding exception thrown, so it could be processed in my catch block?

Most likely because the promise returned by finished is never rejected (nor resolved), instead the process exits because nothing is keeping it alive. If you were using ESM (a .mjs file), you could use top-level await, and you would see an exit code of 13.

ESM version ```js import fs from "node:fs"; import { Readable } from "node:stream"; import { finished } from "node:stream/promises"; async function saveURLToFile(url, path) { try { const response = await fetch(url); const fileStream = fs.createWriteStream(path); await finished(Readable.fromWeb(response.body).pipe(fileStream)); } catch (err) { console.log(err); } } await saveURLToFile( "http://www.milicovice.cz/", new URL("./output.html", import.meta.url) ); ``` ```console $ node entry.mjs $ echo $? 13 ```
aduh95 commented 1 year ago

I've rename the issue because there's no crash.

@nodejs/whatwg-stream can you comment wether the current behavior is expected or it should be fixed?

mcollina commented 1 year ago

I'm not sure what is the actual issue. The following code works correctly on my machine (it prints finished):

const fs = require("fs")
const path = require("path")
const { Readable } = require("stream")
const { finished } = require("stream/promises")

async function saveURLToFile(url, path) {
    try {
        const response = await fetch(url)
        const fileStream = fs.createWriteStream(path)
        await finished(Readable.fromWeb(response.body).pipe(fileStream))
        console.log('finished')
    } catch (err) {
        console.log(err)
    }
}

saveURLToFile("http://www.milicovice.cz/", path.resolve(__dirname, "output.html"))

@nodejs/undici I think I saw this problem already but I can't find an issue/pr.

jan-tosovsky-cz commented 1 year ago

@mcollina That code still doesn't work on my laptop in Nodejs v18.17.1 on Win64 Pro (10.0.22621 Build 22621). But now this stacktrace is printed:

C:\Program Files\nodejs\node.exe .\save-url-to-file.js
Uncaught TypeError TypeError: terminated
    at onAborted (internal/deps/undici/undici:11442:53)
    at emit (events:526:35)
    at terminate (internal/deps/undici/undici:10695:14)
    at fetchParams.controller.resume (internal/deps/undici/undici:11421:36)
    at processTicksAndRejections (internal/process/task_queues:95:5)
    --- TickObject ---
    at init (internal/inspector_async_hook:25:19)
    at emitInitNative (internal/async_hooks:202:43)
    at emitInitScript (internal/async_hooks:504:3)
    at nextTick (internal/process/task_queues:132:5)
    at onDestroy (internal/streams/destroy:103:15)
    at done (internal/webstreams/adapters:526:11)
    at destroy (internal/webstreams/adapters:544:7)
    at _destroy (internal/streams/destroy:109:10)
    at destroy (internal/streams/destroy:71:5)
    at destroyer (internal/streams/destroy:319:12)
    at <anonymous> (internal/webstreams/adapters:555:7)
    at processTicksAndRejections (internal/process/task_queues:95:5)
    --- Promise.then ---
    at newStreamReadableFromReadableStream (internal/webstreams/adapters:548:3)
    at Readable.fromWeb (internal/streams/readable:1403:27)
    at saveURLToFile (c:\test\save-url-to-file.js:10:33)
    at processTicksAndRejections (internal/process/task_queues:95:5)
    --- await ---
    at <anonymous> (c:\test\save-url-to-file.js:17:1)
    at Module._compile (internal/modules/cjs/loader:1256:14)
    at Module._extensions..js (internal/modules/cjs/loader:1310:10)
    at Module.load (internal/modules/cjs/loader:1119:32)
    at Module._load (internal/modules/cjs/loader:960:12)
    at executeUserEntryPoint (internal/modules/run_main:81:12)
    at <anonymous> (internal/main/run_main_module:23:47)
Process exited with code 1
jan-tosovsky-cz commented 1 year ago

Exactly the same stracktrace I am getting also when saving the content for http://www.hornimarsov.cz/ so this issue is not isolated for the above single server URL.

mcollina commented 1 year ago

Something that would be really useful to debug this is to have a complete reproduction, i.e. a local Node.js server that can trigger this problem in the client.

The bug might even be Windows specific.

metcoder95 commented 1 year ago

https://github.com/orgs/nodejs/teams/undici I think I saw this problem already but I can't find an issue/pr.

Couldn't find it, the closest one was https://github.com/nodejs/undici/issues/2026

aduh95 commented 1 year ago

FWIW when I run the snippet in https://github.com/nodejs/node/issues/49233#issuecomment-1704984163 using the latest main, I am getting exit code 13 and 'finished' is never logged. Or sometimes I'm getting an error, it seems to be flaky:

node:events:492
      throw er; // Unhandled 'error' event
      ^

TypeError: terminated
    at Fetch.onAborted (node:internal/deps/undici/undici:11463:53)
    at Fetch.emit (node:events:526:35)
    at Fetch.terminate (node:internal/deps/undici/undici:10717:14)
    at fetchParams.controller.resume (node:internal/deps/undici/undici:11442:36)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Emitted 'error' event on Readable instance at:
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  [cause]: Error: incorrect header check
      at Zlib.zlibOnError [as onerror] (node:zlib:189:17) {
    errno: -3,
    code: 'Z_DATA_ERROR'
  }
}

Node.js v21.0.0-pre
mcollina commented 1 year ago

Funny insight: right now I can reproduce your results @aduh95.

My understanding is that our decompression algorithm is failing to decompress that payload: this is likely some kind of incompatibility with the meaning of deflate compression. Then because promises are resolved in the next tick and there is nothing to keep that event loop busy, the value or error are never emitted.

jan-tosovsky-cz commented 1 year ago

Wappalyzer plugin + Dev Tools reports this set of server details

http://www.milicovice.cz/ Kentico CMS (written in C#) powered by ASP.NET 4.0.30319 running on IIS 10.0 on Windows Server. The response is compressed using deflate. Keep-alive is not specified.

http://www.hornimarsov.cz/ PHP 5.5.38 running on Apache, encoded using gzip. Keep-alive timeout=5, max=100.

Each server's root cause will most likely differ as server configurations differ significantly.

Something about compression on IIS is here https://learn.microsoft.com/en-us/iis/extensions/iis-compression/iis-compression-overview There is stated two libs are available for the deflate compression, gzip.dll and iiszlib.dll. The second is not OOTB, but is more efficient and has better dynamic compression support. But it could explain incompatibilities.

mcollina commented 1 year ago

I think this is due to https://github.com/nodejs/undici/issues/2128.

huseyinacacak-janea commented 1 month ago

I've investigated this issue with Node v18.20.2, v20.16.0 and v22.9.0. Both v20 and v22 worked as expected. @jan-tosovsky-cz Do you think this can be closed now?

RedYetiDev commented 1 month ago

Per the comment above. See https://github.com/nodejs/undici/pull/2650

jan-tosovsky-cz commented 1 month ago

Thanks a lot for managing the fix. I downloaded 20.17.0 LTS (Win MSI installer) and ran the script again. The task is now completed without any issues.

Making this functionality more robust (i.e. supporting even obscure old sites) will eliminate headaches for many future developers. Kudos to all!