nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
106.97k stars 29.22k forks source link

http: client inconsistent behavior with content length header #35958

Closed delvedor closed 1 year ago

delvedor commented 3 years ago

Hey folks! I've encountered a weird bug. Every now and then it happens that the underlying connection between the client and the server breaks (not related to node, but to the infrastructure) while collecting the body. The easiest way to detect this is by checking the content-length header, as (as far as I know) there is no event we can listen to for detecting the broken connection while reading the body stream. While testing the behavior of the client with the content-length header but with a body with an incorrect length, I've noticed that there is a big difference in how this is handled based on the node version. Following you can find the snippet for reproducing the issue and the behavior I've encountered based on the node version, the presence of the http agent, and the content-length header.

What steps will reproduce the bug?

The code for reproducing the bug can be found in the following gist: https://gist.github.com/delvedor/0d15a34259fd2d2a21055f6e82a67380#file-http-client-bug-js

How often does it reproduce? Is there a required condition?

Consistently based on the versions of Node.

What is the expected behavior?

I'm not really sure what should be the right behavior, but I would expect:

What do you see instead?

The behavior changes between node versions. You can find the order of the events in the same gist: https://gist.github.com/delvedor/0d15a34259fd2d2a21055f6e82a67380#file-events-md

/cc @mcollina @ronag

ronag commented 3 years ago

I think an error/aborted event should be emitted and no end event.

Sirius79 commented 3 years ago

Hi @ronag would this be a good first issue to work on?

ronag commented 3 years ago

No, I’m unsure if and what the problem here is. Behavior seems ok in v15 and fixing earlier versions is probably semver major

delvedor commented 3 years ago

@ronag I think at least node v14 should be fixed and aligned to node v15, at the moment (in node v14) if the request has a content-length header bigger than the actual body, the response never ends.

mcollina commented 3 years ago

@delvedor could you please articulate the difference between the various Node.js release lines?

ronag commented 3 years ago

@mcollina I think it's pretty well described in https://gist.github.com/delvedor/0d15a34259fd2d2a21055f6e82a67380#file-events-md

mcollina commented 3 years ago

I think v14 should align either with v12 or v15.

delvedor commented 3 years ago

I don't know what's the difference between v12 and v14 there, but I think it would be better to align v14 to v15, as v14 will stick around for a while. Better to adopt the right/new behavior sooner rather than later. What do you think?

mmomtchev commented 3 years ago

I think an error/aborted event should be emitted and no end event.

@delvedor how does the connection 'break'? Sudden loss of connectivity that ends with a timeout? This should be an 'error' event.

delvedor commented 3 years ago

how does the connection 'break'?

I don't have an answer to this, you can see the discussion that originated this issue here.

This should be an 'error' event.

You will get an error event if there is a connectivity error as soon as you try to send the request, but if the connection breaks while reading the body, no error event is emitted.

A way of reproducing this scenario was to set a content-length header bigger than the body, which causes a different behavior based on the version of node.

mmomtchev commented 3 years ago

Maybe there is really a an issue with the content length, I will try digging, but I don't think it is a reliable way to verify that the HTTP request was successful - absence of 'error' should be the right way to do it And there should definitely be an 'error' event if your connection is reset How are you getting that ECONNRESET if it is not with an 'error' event? For me the absence of 'error' event when the connection was reset is the far more serious problem

mmomtchev commented 3 years ago

@delvedor, I just went through your gist, I confirm that you get different behavior from different versions of Node - which is not very good - but this is the very obscure case of a server that responds with a content-length that is longer than the actual body and then proceeds to cleanly close the connection - IMHO this is not a real-world scenario, it is more like an HTTP compliance test. I don't think this is what actually happens in your case The ECONNRESET you see here is an artificial ECONNRESET sent by the Node HTTP client to indicate a premature close - it is not a real network error

mcollina commented 3 years ago

@delvedor, I just went through your gist, I confirm that you get different behavior from different versions of Node - which is not very good - but this is the very obscure case of a server that responds with a content-length that is longer than the actual body and then proceeds to cleanly close the connection - IMHO this is not a real-world scenario, it is more like an HTTP compliance test.

I have seen a few situation where this actually happens. Most of those cases are bugs either in the server or a proxy middleman that for some reason truncates the response. You are right that it is about HTTP compliance and somewhat an edge case to consider. Nevertheless, I think the behavior of v14 regarding to this is not correct and we should do something about it.

@ronag is there a PR of yours that would need to be backported? Do you know which ones of the PRs changed this, as we can also revert that.

mmomtchev commented 3 years ago

I agree that Node's event behavior should be consistent across versions, but I wonder if this gist is representative of his real problem. @delvedor are you positive that you get a truncated body with no network error? This is a serious issue, far more serious than Node's inconsistent behavior on invalid content-length. Relying on the content-length is an ugly kludge at best. From what I understand, this happens on GCP, so you cannot run network sniffers? Is this HTTP server accessible from the outside world and can you try running your HTTP client outside GCP so that you can trace what happens on the network level?

That last message on your discussion mentions an ECONNRESET on read - this is a real networking issue and it is not the same as the one produced by your gist. And I see a JS stack trace - so you must be getting an 'error' event somewhere.

mmomtchev commented 3 years ago

@delvedor, check https://github.com/nodejs/node/issues/35824 - @lpinca 's remark and my last comment, maybe this is the 'error' you are getting

mmomtchev commented 3 years ago

@mcollina @delvedor @ronag Node 12 and Node 14 both emit an 'aborted' message on the response object in this case - normal close before the end of the HTTP response - which is not considered an error and won't have any effect if it is not explicitly handled - but can be listened to (just make sure you are listening on the response object) Node 15 will also emit an 'error' - and only if there is an active handler - in order to be compatible with all the existing user code that was not handling this error https://github.com/nodejs/node/blob/c575aec2a7da78dee7326ea37c93830716292b37/lib/_http_client.js#L429 This is all that needs changing if you want to reproduce Node 15 behavior

ronag commented 3 years ago

ah yes, @delvedor you need to listen to res.'aborted' in pre node v15. This might actually not be a problem then.

mmomtchev commented 3 years ago

The only case that still needs some work is the 'keepAlive: true` case, but this is a nightmarish case - just look at the situation - the remote server announces a content-length, then sends some of the data, then stops without closing the connection and then keeps it alive with no errors. This is currently handled in all versions (ie Node recovers) but the behavior is not very consistent across different versions. My opinion is that the benefit of consistency doesn't justify the risk of juggling around all that legacy code. @delvedor just one tip, there are network proxies that can simulate packet errors on the wire.

delvedor commented 3 years ago

I was not aware of the response.on('aborted') event, from my test, it looks that solves this issue very nicely in any Node.js version (from 10 to 15). @ronag in Node v15 the 'aborted' event gets emitted in any case :)

I'll do some additional tests, before confirming it, but at this point, I'm positive the 'aborted' is what I need, thanks!

delvedor commented 3 years ago

Do you recommend to call incomingMessage.destroy in case of the aborted event? For example:

incomingMessage.on('aborted', () => {
  incomingMessage.destroy()
  handleError(new Error('kaboom'))
})
mmomtchev commented 3 years ago

Just be aware of https://github.com/nodejs/node/issues/35923 which is probably staying like this. What is your HTTP server software? Is it Node or is it something else?

delvedor commented 3 years ago

@mmomtchev In this case Elasticsearch, but the error happens when there is a lot of infrastructure the middle.

Thanks for the linked issue. I'm not sure if I should call or not response.destroy (without an error, I'm handling the aborted event in a different way). In my test there is no difference if I call it or not, so I would say that I can, just to be sure that everything will be cleaned up.

mmomtchev commented 3 years ago

@delvedor I prefer to leave it to @ronag or @mcollina for the official way to do it

I was asking for your infrastructure, because we just found an ECONNRESET problem with Node servers running on Windows Try sniffing the network to see what is happening, or if this is not possible, try to locally induce ECONNRESET errors - you can do it by fiddling with iptables or pf and there are specifically designed network proxies

marco-ippolito commented 1 year ago

this was probably solved with https://github.com/nodejs/node/pull/46601, feel free to reopen if it's not the case