Stream finished does not always work with http incoming message

misos1 commented 3 years ago

Version: v15.12.0
Platform: Darwin Kernel Version 20.3.0
Subsystem: http

What steps will reproduce the bug?

The stream.finished never resolves or rejects when applied onto a destroyed incoming message like in example below. It finishes for example when applied on a destroyed file stream. Also it finishes when the line with await new Promise(r => setTimeout(r, 1000)); is commented. This looks really inconsistent.

let http = require("http");
let { finished } = require("stream/promises");

let server = http.createServer(async function(req, res)
{
    for await (let chunk of req) break;
    await new Promise(r => setTimeout(r, 1000));
    console.log("waiting");
    await finished(req);
    console.log("sending");
    res.end();
});

(async function()
{
    await new Promise(resolve => server.listen(resolve));
    let req = http.request({ port: server.address().port, method: "post" }).end("abc");
    try
    {
        let res = await new Promise((resolve, reject) => req.on("response", resolve).on("error", reject));
        await finished(res.resume());
    }
    catch(e)
    {
        console.log(e);
    }
}());

How often does it reproduce? Is there a required condition?

Always.

What is the expected behavior?

Error: socket hang up
waiting
sending

What do you see instead?

Error: socket hang up
waiting

Additional information

Linkgoron commented 3 years ago

This works correctly for me on Node 16.0+, but not on Node 15.0+.

Linkgoron commented 3 years ago

Changing finished to the unpromisified version, and just calling destroy on the request, this doesn't work correctly on Node 12.22.1 and 14.17 as well.

let http = require("http");
let { finished } = require("stream");

let server = http.createServer(async function(req, res)
{
    req.destroy()
    console.log("waiting");
        await new Promise(res => setTimeout(res,1));
    await new Promise((res) => finished(req, res));
    console.log("sending");
    res.end();
});

(async function()
{
    await new Promise(resolve => server.listen(resolve));
    let req = http.request({ port: server.address().port, method: "post" }).end("abc");
    try
    {
        let res = await new Promise((resolve, reject) => req.on("response", resolve).on("error", reject));
        await finished(res.resume());
    }
    catch(e)
    {
        console.log(e);
    }
}());

misos1 commented 3 years ago

Seems on nodejs 16 it is working correctly.

Linkgoron commented 3 years ago

Going over the code, I think that this was fixed by https://github.com/nodejs/node/pull/33035 as the above works correctly in 15.5 and doesn't work in 15.4 and 15.6 (as https://github.com/nodejs/node/pull/33035 was reverted).

mcollina commented 3 years ago

cc @nodejs/http @nodejs/streams

mcollina commented 3 years ago

FYI: Node v15 is going out of maintenance in a bit more than a month, it is unlikely to receive a fix for this.

A few notes:

do not use async/await in a event handler or a callback. This will lead to unintended results and broken behaviors.
don't write this code. If you call req.destroy(), you need to assume that the response has been destroyed as well: stop all operations for that request.
we do have bug somewhere, I'm not sure the fix is backportable.

@dnlup @ronag do you think it could be possible to fix this in v12 and v14 in a non-breaking way?

spazmodius commented 3 years ago

do not use async/await in a event handler or a callback

Wait, what? Can you point to some elaboration on this, please?

benjamingr commented 3 years ago

Wait, what? Can you point to some elaboration on this, please?

Instead of doing:

emitter.once('foo', async () => {
  // no one is awaiting this, some stuff has best effort cleanup
  await someOtherStuffOrCleanup();
});

vs.

await once(emitter, 'foo');
await someOtherStuffOrCleanup(); // now everything is listened for and is in sequence.

spazmodius commented 3 years ago

@benjamingr I understand your point about orphaning a Promise, but I'm missing how that relates to the examples leading up to @mcollina 's comment.

I surmise that the "event handler or callback" he was referring to is the createServer() callback. To "not use async/await" in that callback I can only interpret in these possible ways, neither of which is tenable:

The callback must call res.end() before it returns.
res.end() must be buried in more callbacks, not after await.

What am I not getting?

ronag commented 3 years ago

If you throw inside the async function it will become an unhandled exception. await finished can throw. Even it if might be fine in certain callbacks, it sometimes can cause weird ordering issues in edge cases and it's difficult to determine when this is a problem or not. It's better as a general rule to avoid async/await in callbacks.

misos1 commented 3 years ago

I could create an async generator from createServer. Btw is there a plan to have a promise/generator based http?

spazmodius commented 3 years ago

@ronag

If you throw inside the async function it will become an unhandled exception

I disagree. Throwing in an async function is obviously catchable.

Maybe you mean "throwing in an event handler will be unhandled"? But then, that is true whether it is sync or async.

mcollina commented 3 years ago

I mean literally: do not mix promises and callbacks. This is a good talk on the topic from @jasnell: https://youtu.be/XV-u_Ow47s0.

I would recommend you to use Koa, Hapi or Fastify (and others), as they provide a higher level API that is promise-friendly. There are a few tricky bits in using the low-level APIs with async await that you do not want to deal with.

The code you are proving is something that I would not recommend anybody to write. You are deliberately destroying something and then checking for it to be destroyed. Note that when you destroy the request you also destroy the response - there is no reason to continue.

misos1 commented 3 years ago

For the purpose of this example it is ok when it aborts and the program terminates in case finished throws. I used just builtin nodejs modules as is stated when posting bug report "Enter details about your bug, preferably a simple code snippet that can be run using node directly without installing third-party dependencies.". Btw finished behaves differently when an incoming message is destroyed during waiting it will throw an error (premature close) but not if it is already destroyed.

You are deliberately destroying something and then checking for it to be destroyed.

Yes it is a minimal test case. There are situations when in some part of code one does not know whether the incoming message is already destroyed or not. There is just need to wait until it is finished. If it was already destroyed then res.end will be no-op so I deem it unnecessary to check whether the request was destroyed before calling it. There may be other processing after finished among res.end.

dnlup commented 3 years ago

FYI: Node v15 is going out of maintenance in a bit more than a month, it is unlikely to receive a fix for this.

A few notes:

do not use async/await in a event handler or a callback. This will lead to unintended results and broken behaviors.

don't write this code. If you call req.destroy(), you need to assume that the response has been destroyed as well: stop all operations for that request.

we do have bug somewhere, I'm not sure the fix is backportable.

@dnlup @ronag do you think it could be possible to fix this in v12 and v14 in a non-breaking way?

I am not sure yet, I think the problem could be how the finished check is handled in streams or how the IncomingMessage is destroyed. I remember having to check those things when we merged the autodestroy feat in IncomingMessage, which should have been a non-breaking change but turned out not to be one.

mcollina commented 3 years ago

Yes it is a minimal test case. There are situations when in some part of code one does not know whether the incoming message is already destroyed or not. There is just need to wait until it is finished. If it was already destroyed then res.end will be no-op so I deem it unnecessary to check whether the request was destroyed before calling it. There may be other processing after finished among res.end.

This is exactly what I recommend not to do. Don't do anything after you have sent a response to the user (minus observability requirements/logging etc).

misos1 commented 3 years ago

This is exactly what I recommend not to do. Don't do anything after you have sent a response to the user (minus observability requirements/logging etc).

I think this is rather a question of practicality - maybe there are not many practical things which one could do after a response is sent but I do not see any reason why one could not do anything he/she wants. For example there could be an api call where the user sends some data and the server stores it in db and responds "ok" then starts some long lasting background job which does some processing of this data and there can be another api call which queries whether this job is done and retrieves results. Why would be this a bad idea?

mcollina commented 3 years ago

Having some background processing is ok. Having a race between two code paths to send an HTTP response is an anti-pattern and the main source of your problem. When moving the processing on the background, no reference to the req/res should be kept around.

bizob2828 commented 3 years ago

It might be worth noting that the apollo-server-hapi plugin exhibits this behavior in node 16. I'm not sure the cause but I do know that I was seeing readable streams(http requests) get destroyed during the execution of the hapi middleware chain. hapi has a listener on close which eventually just sets the request context to null. Not that this is a viable workaround but if I commented out this event handler in hapi core, it no longer crashed the hapi server.

You can see the issue if you run this code: https://github.com/bizob2828/apollo-hapi-nr-test

and make this curl request:

curl --location --request POST 'http://localhost:6000/graphql' \
--header 'Content-Type: application/json' \
--data-raw '{"query":"query GetRoids {\n  asteroids(date: \"2020-08-12\") {\n    id,\n    name,\n    miss_distance_km,\n    close_approach_date,\n    relative_velocity_km_per_hour,\n  }\n}","variables":{}}'

misos1 commented 3 years ago

Having a race between two code paths to send an HTTP response is an anti-pattern and the main source of your problem.

No I did not mean anything like that. It would not send another response after processing is done. I mentioned another different api call for retrieving status and result (if finished) of that background processing.

When moving the processing on the background, no reference to the req/res should be kept around.

Of course.

misos1 commented 3 years ago

So this is probably no longer true in nodejs 16: https://github.com/nodejs/node/blob/d798de1c653efa5ec0015d44f266db6dda197bf4/lib/_http_incoming.js#L184-L189

nodejs / node