nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
106.12k stars 28.87k forks source link

Implement parallel Happy Eyeballs as described in RFC 8305 #48145

Open tniessen opened 1 year ago

tniessen commented 1 year ago

What is the problem this feature will solve?

autoSelectFamily is enabled by default in Node.js 20. However, it does not properly implement parallel connection attempts, which are an integral part of the Happy Eyeballs algorithm. This can cause timeouts to occur that did not occur in versions prior to Node.js 20. See, for example, this discussion.

What is the feature you are proposing to solve the problem?

Implement the algorithm as described in Section 5 of RFC 8305:

Starting a new connection attempt does not affect previous attempts, as multiple connection attempts may occur in parallel.

What alternatives have you considered?

Disabling autoSelectFamily and restoring the pre-20 behavior, see https://github.com/nodejs/node/issues/47644#issuecomment-1554440759.

ShogunPanda commented 1 year ago

I originally went for the proper parallel approach as states in RFC8305. I chose not to implement it as I had issues with the code.

Consider that implementing proper RFC8305 will imply that all of sudden Node.js starts opening multiple connections instead of one in case of multiple IP returned by DNS, which is dangerous.

The unwanted timeouts are not properly handled in https://github.com/nodejs/node/pull/47860, which will be available in Node.js 20 once I fix https://github.com/nodejs/node/issues/48000 and https://github.com/nodejs/node/issues/47822.

We can of course disable the feature completely by default, but it was my understanding that this was longly awaited to help beginners struggling with poorly configured dual stack networks.

tniessen commented 1 year ago

Consider that implementing proper RFC8305 will imply that all of sudden Node.js starts opening multiple connections instead of one in case of multiple IP returned by DNS, which is dangerous.

Is that a fact? Is there absolutely no way to prevent responding to more than a single address with a TCP ACK packet after having received TCP SYN-ACK?

ShogunPanda commented 1 year ago

Nope, we don't control the connection at that level, AFAIK.

@nodejs/libuv Am I wrong?

silverwind commented 1 year ago

I think it should be considered disabling this autoSelectFamily by default. It's breaking stuff all over the place. Stabilize first behind a --experimental-auto-select-family or something, but keep this broken stuff out of the upcoming LTS please.

ShogunPanda commented 1 year ago

After 20.3.0 all issues should be fixed. Can we please wait for this release before rushing into decisions?

silverwind commented 1 year ago

The ENETUNREACH issue mentioned here is still unresolved, right? As I understand it, it can only be fixed with parallel connection attempts.

ShogunPanda commented 1 year ago

Not really. We can add a new feature to opt-out A or AAAA in the list of addresses to try when the machine has no IPv4 and IPv6 connectivity respectively. Note that ENETUNREACH will happen even in previous versions if the DNS only returns AAAA records.

Additionally, to mitigate the problem, a simple fix would be to raise the minimum attempt timeout to 500ms to reduce the possibility that IPv4 fails.

HinataKah0 commented 1 year ago

I was interested so I tried reading the RFC as well. Just giving my 2 cents of what I understand since I saw the discussion...

TBH I was also confused at first how you can make both "non-simultaneous" and "may occur in parallel" connections. The first paragraph seems conflicting. 😅

But the key is in 2nd paragraph:

A simple implementation can have a fixed delay for how long to wait before starting the next connection attempt.

So, it's something like this: image

Which is "non-simultaneous" but also "may occur in parallel". An alternative/a more sophisticated delay is by using exponential backoff with min & max delay (described in paragraph 3).

bnoordhuis commented 1 year ago

Is there absolutely no way to prevent responding to more than a single address with a TCP ACK packet after having received TCP SYN-ACK?

Not in a portable fashion. I'm not even sure you could do it reliably if all you cared about was Linux.

The only way I'm aware of is polling getsockopt(TCP_INFO) and checking the tcp_info.tcpi_state field but that's a) timing sensitive, and b) not a viable approach for an asynchronous runtime like Node.js.

ShogunPanda commented 1 year ago

@HinataKah0 Yes, I interpreted in that way as well. But, as @bnoordhuis said, since we cannot really control the socket once is started, I chose to implement in completely non-overlapped way so that we try to avoid issues and race conditions.

github-actions[bot] commented 8 months ago

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

tniessen commented 1 month ago

On a side note, the implementation in Node.js 20.13.1 also appears to break some node-gyp builds that now fail to download headers from our very own domain nodejs.org (https://github.com/tniessen/node-pqclean/issues/3#issuecomment-2219921250):

gyp http GET https://nodejs.org/download/release/v20.13.1/node-v20.13.1-headers.tar.gz
gyp http fetch GET https://nodejs.org/download/release/v20.13.1/node-v20.13.1-headers.tar.gz attempt 1 failed with ETIMEDOUT
gyp WARN install got an error, rolling back install
gyp ERR! configure error
gyp ERR! stack FetchError: request to https://nodejs.org/download/release/v20.13.1/node-v20.13.1-headers.tar.gz failed, reason:
gyp ERR! stack at ClientRequest.<anonymous> (/usr/lib/node_modules/npm/node_modules/minipass-fetch/lib/index.js:130:14)
gyp ERR! stack at ClientRequest.emit (node:events:519:28)
gyp ERR! stack at _destroy (node:_http_client:880:13)
gyp ERR! stack at onSocketNT (node:_http_client:900:5)
gyp ERR! stack at process.processTicksAndRejections (node:internal/process/task_queues:83:21)

Setting --no-enable-network-family-autoselection seems to fix node-gyp builds.

ShogunPanda commented 1 month ago

Rather than disabling the Happy Eyeballs altogether, you could use --network-family-autoselection-attempt-timeout and see if the situation improves.