nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

SIGNAL USERS READ THIS FIRST: code: 'ERR_INTERNAL_ASSERTION' in internalConnectMultiple #47644

Closed yuki12321 closed 1 year ago

yuki12321 commented 1 year ago

Version

20.0.0

Platform

Darwin MacBookPro.local 22.3.0 Darwin Kernel Version 22.3.0: Mon Jan 30 20:42:11 PST 2023; root:xnu-8792.81.3~2/RELEASE_X86_64 x86_64

Subsystem

No response

What steps will reproduce the bug?

I ran the Storybook installation on Next.js 13.3.0 and this is what I see.

How often does it reproduce? Is there a required condition?

No response

What is the expected behavior? Why is that the expected behavior?

No response

What do you see instead?

$ npx sb init

 storybook init - the simplest way to add a Storybook to your project. 

 • Detecting project type. ✓
 • Adding Storybook support to your "Next" app
node:internal/assert:14
    throw new ERR_INTERNAL_ASSERTION(message);
    ^

Error [ERR_INTERNAL_ASSERTION]: This is caused by either a bug in Node.js or incorrect usage of Node.js internals.
Please open an issue with this stack trace at https://github.com/nodejs/node/issues

    at new NodeError (node:internal/errors:399:5)
    at assert (node:internal/assert:14:11)
    at internalConnectMultiple (node:net:1106:3)
    at Timeout.internalConnectMultipleTimeout (node:net:1637:3)
    at listOnTimeout (node:internal/timers:575:11)
    at process.processTimers (node:internal/timers:514:7) {
  code: 'ERR_INTERNAL_ASSERTION'
}

Node.js v20.0.0

Additional information

No response

tniessen commented 1 year ago

Based on the stack trace, it looks similar to https://github.com/nodejs/node/issues/46669 and https://github.com/nodejs/node/issues/46670, so possibly related to https://github.com/nodejs/node/pull/44731, https://github.com/nodejs/node/pull/46587. cc @ShogunPanda

ShogunPanda commented 1 year ago

@tniessen It seems so. I'll take a look soon

kamagatos commented 1 year ago

I'm also seeing a similar issue with ssh2 after upgrading to Node 20.

Repro steps:

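  1. Connect to an invalid host
  2. Add an error event handler

// Setup assumed from the ssh2 README; it was not part of the original snippet.
const { Client } = require('ssh2');
const client = new Client();

client.connect({
  host: 'yahoo.com',
  port: 22,
  username: 'bob',
  password: 'secret'
})

// Adding this line crashes the process.
client.on('error', () => {})

asp3 commented 1 year ago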

Seeing this issue in production as well:

Error [ERR_INTERNAL_ASSERTION]: This is caused by either a bug in Node.js or incorrect usage of Node.js internals.

2023-04-29T13:32:31.670-04:00 | Please open an issue with this stack trace at https://github.com/nodejs/node/issues
2023-04-29T13:32:31.670-04:00 | at new NodeError (node:internal/errors:399:5)
2023-04-29T13:32:31.670-04:00 | at assert (node:internal/assert:14:11)
2023-04-29T13:32:31.670-04:00 | at internalConnectMultiple (node:net:1106:3)
2023-04-29T13:32:31.670-04:00 | at Timeout.internalConnectMultipleTimeout (node:net:1637:3)
2023-04-29T13:32:31.670-04:00 | at listOnTimeout (node:internal/timers:575:11)
2023-04-29T13:32:31.670-04:00 | at process.processTimers (node:internal/timers:514:7)

Our old build from 10 days ago still works, but all new builds seem to run into this issue.

MikeRalphson commented 1 year ago

Just a note, subscribing as still present in v20.1.0:

In a net and async/await heavy program:

node:internal/assert:14
    throw new ERR_INTERNAL_ASSERTION(message);
    ^

Error [ERR_INTERNAL_ASSERTION]: This is caused by either a bug in Node.js or incorrect usage of Node.js internals.
Please open an issue with this stack trace at https://github.com/nodejs/node/issues

    at new NodeError (node:internal/errors:399:5)
    at assert (node:internal/assert:14:11)
    at internalConnectMultiple (node:net:1107:3)
    at Timeout.internalConnectMultipleTimeout (node:net:1638:3)
    at listOnTimeout (node:internal/timers:575:11)
    at process.processTimers (node:internal/timers:514:7) {
  code: 'ERR_INTERNAL_ASSERTION'
}

Node.js v20.1.0

ShogunPanda commented 1 year ago

This might be fixed by https://github.com/nodejs/node/pull/47860. Will keep you posted on this.

ShogunPanda commented 1 year ago

@kamagatos @yuki12321 The PR above has landed in master. If you can compile Node locally, do you mind checking whether it solves your issues as well?

MikeRalphson commented 1 year ago

I now see the following on master:

/Users/mikeralphson/c/node/node[73535]: ../../src/crypto/crypto_tls.cc:1233:static void node::crypto::TLSWrap::GetServername(const FunctionCallbackInfo<v8::Value> &): Assertion `(wrap->ssl_) != nullptr' failed.
 1: 0x100bc1cf0 node::Abort() [/Users/mikeralphson/c/node/out/Release/node]
 2: 0x100bc1a30 node::PrintCaughtException(v8::Isolate*, v8::Local<v8::Context>, v8::TryCatch const&) [/Users/mikeralphson/c/node/out/Release/node]
 3: 0x100d14700 node::crypto::TLSWrap::GetServername(v8::FunctionCallbackInfo<v8::Value> const&) [/Users/mikeralphson/c/node/out/Release/node]
 4: 0x100dca0d4 v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, unsigned long*, int) [/Users/mikeralphson/c/node/out/Release/node]
 5: 0x100dc9928 v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [/Users/mikeralphson/c/node/out/Release/node]
 6: 0x10164cb24 Builtins_CEntry_Return1_ArgvOnStack_BuiltinExit [/Users/mikeralphson/c/node/out/Release/node]
 7: 0x1015c43e4 Builtins_InterpreterEntryTrampoline [/Users/mikeralphson/c/node/out/Release/node]
 8: 0x10664d2cc
 9: 0x1015c250c Builtins_JSEntryTrampoline [/Users/mikeralphson/c/node/out/Release/node]
10: 0x1015c21f4 Builtins_JSEntry [/Users/mikeralphson/c/node/out/Release/node]
11: 0x100eb082c v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [/Users/mikeralphson/c/node/out/Release/node]
12: 0x100eb00a8 v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) [/Users/mikeralphson/c/node/out/Release/node]
13: 0x100d77e80 v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) [/Users/mikeralphson/c/node/out/Release/node]
14: 0x100afdf84 node::InternalMakeCallback(node::Environment*, v8::Local<v8::Object>, v8::Local<v8::Object>, v8::Local<v8::Function>, int, v8::Local<v8::Value>*, node::async_context) [/Users/mikeralphson/c/node/out/Release/node]
15: 0x100b120b8 node::AsyncWrap::MakeCallback(v8::Local<v8::Function>, int, v8::Local<v8::Value>*) [/Users/mikeralphson/c/node/out/Release/node]
16: 0x100b292dc node::ConnectionWrap<node::TCPWrap, uv_tcp_s>::AfterConnect(uv_connect_s*, int) [/Users/mikeralphson/c/node/out/Release/node]
17: 0x100c917b0 node::MakeLibuvRequestCallback<uv_connect_s, void (*)(uv_connect_s*, int)>::Wrapper(uv_connect_s*, int) [/Users/mikeralphson/c/node/out/Release/node]
18: 0x1015acf94 uv__stream_io [/Users/mikeralphson/c/node/out/Release/node]
19: 0x1015b5650 uv__io_poll [/Users/mikeralphson/c/node/out/Release/node]
20: 0x1015a2e94 uv_run [/Users/mikeralphson/c/node/out/Release/node]
21: 0x100afe7d0 node::SpinEventLoopInternal(node::Environment*) [/Users/mikeralphson/c/node/out/Release/node]
22: 0x100c03504 node::NodeMainInstance::Run() [/Users/mikeralphson/c/node/out/Release/node]
23: 0x100b899d0 node::LoadSnapshotDataAndRun(node::SnapshotData const**, node::InitializationResultImpl const*) [/Users/mikeralphson/c/node/out/Release/node]
24: 0x100b89c24 node::Start(int, char**) [/Users/mikeralphson/c/node/out/Release/node]
25: 0x1a2ad3f28 start [/usr/lib/dyld]
zsh: abort

ShogunPanda commented 1 year ago

@MikeRalphson Can you please provide a repro file, along with the dig query of all the hosts you are trying to reach?

tniessen commented 1 year ago

The same internal assertion is causing CI to fail on Fedora machines, see https://github.com/nodejs/node/issues/48000. However, it is not the ERR_INTERNAL_ASSERTION that this issue was about originally.

Side note: the main branch is called main. We intentionally abandoned the previous branch name.

bradleybeighton commented 1 year ago

This randomly started happening on my production build after a deployment. I'm using the lts-alpine image in my Dockerfile. Has there been a recent change that could be causing this?

ShogunPanda commented 1 year ago

@brandon-beacher It's the network family autoselection, which was enabled by default in 20.0.0. I fixed this issue (already merged in main), and once I fix some other problems it should go out in 20.2.0 or 20.3.0.
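For code that opens its own sockets, autoselection can also be turned off per connection. The sketch below assumes the documented `autoSelectFamily` option of `net.connect()` / `socket.connect()`; the helper names `optionsWithoutAutoselection` and `connectWithoutAutoselection` are hypothetical and only illustrate the idea. It does not help when a third-party library creates the socket for you.

```javascript
const net = require('node:net');

// Hypothetical helper: connect options that opt one socket out of
// network family autoselection.
function optionsWithoutAutoselection(host, port) {
  return { host, port, autoSelectFamily: false };
}

// Hypothetical helper: open a TCP connection using those options.
function connectWithoutAutoselection(host, port) {
  const socket = net.connect(optionsWithoutAutoselection(host, port));
  // Always attach an error handler so a failed connect is reported, not thrown.
  socket.on('error', (err) => console.error('connect failed:', err.code));
  return socket;
}

// Usage (placeholder host and port):
// connectWithoutAutoselection('example.com', 443);
```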

tniessen commented 1 year ago

This is still an issue in Node.js 20.2.0. As a workaround, you can try restoring the more predictable pre-20 behavior by setting this environment variable:

NODE_OPTIONS="--no-network-family-autoselection"

arllop commented 1 year ago

This happened when I opened the Signal app on my Mac laptop. Has this been fixed? Thanks.

tniessen commented 1 year ago

@arllop If Signal does not override NODE_OPTIONS, the workaround in https://github.com/nodejs/node/issues/47644#issuecomment-1554440759 might work.

bnoordhuis commented 1 year ago

I've pinned this issue in the (possibly in vain) hope it'll stem the tide of duplicate bug reports.

Shnub commented 1 year ago

Also got this just now from Signal (6.20.0) after waking macOS from sleep with Signal running:

Unhandled Error

Error [ERR_INTERNAL_ASSERTION]: This is caused by either a bug in Node.js or incorrect usage of Node.js internals.
Please open an issue with this stack trace at https://github.com/nodejs/node/issues

    at new NodeError (node:internal/errors:399:5)
    at assert (node:internal/assert:14:11)
    at internalConnectMultiple (node:net:1077:3)
    at afterConnectMultiple (node:net:1532:5)

bnoordhuis commented 1 year ago

Vain hope indeed, dear god. I've changed the title, let's hope it's enough for Signal users to put two and two together 🤦

Shnub commented 1 year ago

While I see how this is annoying for developers here, do kindly keep in mind that this error message is shown to Signal end users, very few of whom are developers. And the message explicitly asks to open an issue here, so that's what folks are doing. With the error message itself being basically gibberish to laypeople, it's little wonder some end up opening redundant issues with little more than good intentions.

joyeecheung commented 1 year ago

I wonder if we should just disable network_family_autoselection by default again until the bug is fixed. The assert() is there under the assumption that it should rarely get hit, and that when it does get hit we want to know about it. In this case it gets hit so often that the assert() has become more annoying than helpful.

tniessen commented 1 year ago

@joyeecheung According to @ShogunPanda, this will be fully resolved by 20.3.0. However, even then, the implementation does not conform to the Happy Eyeballs RFC and may result in timeouts that did not occur in previous versions of Node.js, so I wouldn't be opposed to disabling it by default.
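For readers unfamiliar with the RFC mentioned above: Happy Eyeballs (RFC 8305) tries resolved addresses one after another, alternating between IPv6 and IPv4 so that a broken family does not block the connection. The following is only an illustrative sketch of that interleaving step. The function name `interleaveByFamily` is hypothetical, and the code is not taken from Node.js's `lib/net.js`.

```javascript
// Illustrative only: alternate address families, starting with the family of
// the first resolved address. Input items look like dns.lookup() results:
// { address: string, family: 4 | 6 }.
function interleaveByFamily(addresses) {
  const firstFamily = addresses.length > 0 ? addresses[0].family : 6;
  const primary = addresses.filter((a) => a.family === firstFamily);
  const secondary = addresses.filter((a) => a.family !== firstFamily);

  const result = [];
  const longest = Math.max(primary.length, secondary.length);
  for (let i = 0; i < longest; i++) {
    if (i < primary.length) result.push(primary[i]);
    if (i < secondary.length) result.push(secondary[i]);
  }
  return result;
}
```

Each address in the resulting list would then be attempted in turn, moving to the next one when an attempt fails or times out.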

ShogunPanda commented 1 year ago

I can confirm I fixed all variants of this bug so after 20.3.0 all should be good. About deviating from RFC, well, we never committed to have a compliant implementation of that, but just a loose one.

Disabling it again by default will cause harm to people with a broken IPv6 stack.

The best course of action, IMVHO, would be to wait for 20.3.0 to settle a little bit and see if bug reports stop. If not we disable it. WDYT?

RBCK commented 1 year ago

Unhandled Error

Error [ERR_INTERNAL_ASSERTION]: This is caused by either a bug in Node.js or incorrect usage of Node.js internals. Please open an issue with this stack trace at https://github.com/nodejs/node/issues

at new NodeError (node:internal/errors:399:5)
at assert (node:internal/assert:14:11)
at internalConnectMultiple (node:net:1077:3)
at afterConnectMultiple (node:net:1532:5)

Veercodeprog commented 1 year ago

You can include the NODE_OPTIONS environment variable in the script command to avoid this error. For example, in package.json:

"scripts": { "start": "NODE_OPTIONS=--no-network-family-autoselection node your-app.js" }

ShogunPanda commented 1 year ago

@Veercodeprog Confirmed. This temporarily works around the problem until 20.3.0 is released.
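Where setting NODE_OPTIONS is not practical, a similar effect can probably be achieved from inside the application. This is a minimal sketch, assuming the documented `net.setDefaultAutoSelectFamily()` API (not available on older Node.js releases, hence the guard). The wrapper name `disableFamilyAutoselection` is hypothetical. It has to run before any connection is opened.

```javascript
const net = require('node:net');

// Hypothetical wrapper: turn off network family autoselection for all sockets
// created afterwards in this process. Returns true if the setting was applied.
function disableFamilyAutoselection() {
  if (typeof net.setDefaultAutoSelectFamily !== 'function') {
    // Older Node.js release without this API: nothing to disable here.
    return false;
  }
  net.setDefaultAutoSelectFamily(false);
  return true;
}

disableFamilyAutoselection();
```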

Veercodeprog commented 1 year ago

What is the correct solution, then? I can see this mainly arises while trying to upload images from the server to Cloudinary.

ShogunPanda commented 1 year ago

There's nothing else required on your side. This bug was in node and I already fixed it. It will be publicly available once 20.3.0 is released.

Once you update to 20.3.0 everything will work as it was before.

Veercodeprog commented 1 year ago

Do we get this error in stable versions too?

brian6932 commented 1 year ago

There's nothing else required on your side. This bug was in node and I already fixed it. It will be publicly available once 20.3.0 is released.

Once you update to 20.3.0 everything will work as it was before.

I've updated to 20.3.0, but I still experience this issue. It does seem to have fixed some IPv6 timeout issues for me, though.

ShogunPanda commented 1 year ago

@brian6932 Can you post here for which hosts this failed and the dig query for them? Thanks!

brian6932 commented 1 year ago

@ShogunPanda I don't have an issue resolving a domain; connecting with net/tls works fine for me, but a library I use exhibits the issue.

ShogunPanda commented 1 year ago

I see. Can you please name which library and how to reproduce the issue?

brian6932 commented 1 year ago

@ShogunPanda https://github.com/KararTY/dank-twitch-irc/issues/13

Veercodeprog commented 1 year ago

Maybe a way of node to suggest code improvement.

Veercodeprog commented 1 year ago

It can be fixed without changing any Node settings or packages if we improve our code.

ShogunPanda commented 1 year ago

Hello. The fix is in PR #48464. I'll let you know once it lands so you can try this again with a nightly build.

bnoordhuis commented 1 year ago

@indutny Fedor, can you let me know when Signal picks up the fix so I can unpin this issue again? Cheers.

krousseau commented 1 year ago

This is still occurring for me with node 20.4:

Error [ERR_INTERNAL_ASSERTION]: This is caused by either a bug in Node.js or incorrect usage of Node.js internals.
Please open an issue with this stack trace at https://github.com/nodejs/node/issues

    at new NodeError (node:internal/errors:405:5)
    at assert (node:internal/assert:14:11)
    at internalConnectMultiple (node:net:1115:3)
    at Timeout.internalConnectMultipleTimeout (node:net:1683:3)
    at listOnTimeout (node:internal/timers:575:11)
    at process.processTimers (node:internal/timers:514:7)

tniessen commented 1 year ago

@krousseau Could you please confirm that this does indeed happen with Node.js 20.4.0? Is there any additional information you could provide? cc @ShogunPanda

dmcr commented 1 year ago

Have been redirected here as per my closed duplicated ticket.

So as I understand it, this is still an active issue in v20.4 and is related to the network area, and the current workarounds are to apply NODE_OPTIONS="--no-network-family-autoselection" or to use node ^v19.

I have been redirected here, but this issue is also closed, so presumably the best place to follow when ^v20 becomes usable is now #48763 or https://github.com/request/request/issues/3458

I can confirm that going back to v19 resolves this issue in production for me.

ShogunPanda commented 1 year ago

@dmcr Can you please provide a repro repo, or at least tell me which host you are trying to connect to and how your DNS resolves that host?

Rand0mF commented 1 year ago

Issue still occurring for node 20.5.1 with the same stacktrace. Randomly starts occurring and keeps happening for multiple hours after which it randomly stopped. Node is running as a kubernetes service and not using workers or websockets (like https://github.com/nodejs/node/issues/48763). Using the docker image node:20.5.1

ShogunPanda commented 1 year ago

@Rand0mF In order to address this, I need to know which host your node app is connecting to and how your local (or Kubernetes system) is resolving it. Can you provide me such info?
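One way to collect the requested resolution data from inside the application is sketched below. It assumes `dns.promises.lookup()` with `{ all: true }`, which goes through the same system resolver that `net.connect()` uses by default, so it complements `dig` rather than replacing it. The names `summarizeAddresses` and `logResolution` are hypothetical.

```javascript
const dns = require('node:dns');

// Pure helper: count addresses per family in a lookup({ all: true }) result.
function summarizeAddresses(addresses) {
  const summary = { ipv4: 0, ipv6: 0 };
  for (const { family } of addresses) {
    if (family === 4) summary.ipv4 += 1;
    else if (family === 6) summary.ipv6 += 1;
  }
  return summary;
}

// Resolve a host through the system resolver and log every address returned.
async function logResolution(host) {
  const addresses = await dns.promises.lookup(host, { all: true });
  console.log(host, JSON.stringify(addresses), summarizeAddresses(addresses));
  return addresses;
}

// Usage (placeholder host):
// logResolution('example.com').catch(console.error);
```

Calling `logResolution()` just before each outgoing connection would show whether a host resolves to both IPv4 and IPv6 addresses, which is the case autoselection acts on.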

Rand0mF commented 1 year ago

@ShogunPanda Unfortunately I don't know.. The app is doing many queries to all kinds of hosts. The error logs don't provide any useful information for that, that's the only thing which is printed. Reverse engineering is difficult as it seems to occur completely at random, I didn't yet find a way to reproduce it.

Error [ERR_INTERNAL_ASSERTION]: This is caused by either a bug in Node.js or incorrect usage of Node.js internals. Please open an issue with this stack trace at https://github.com/nodejs/node/issues
    at new NodeError (node:internal/errors:405:5)
    at assert (node:internal/assert:14:11)
    at internalConnectMultiple (node:net:1118:3)
    at Timeout.internalConnectMultipleTimeout (node:net:1687:3)
    at listOnTimeout (node:internal/timers:575:11)
    at process.processTimers (node:internal/timers:514:7)

ainsleyclark commented 1 year ago

Very sporadic error which seems to be difficult to debug locally as @Rand0mF mentioned. export NODE_OPTIONS=--no-network-family-autoselection works. All v20.x versions are producing the same error.
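Because the error is sporadic, it can be worth confirming at startup that the workaround actually reached the process. A minimal sketch, assuming the flag arrives via NODE_OPTIONS or the command line and that `net.getDefaultAutoSelectFamily()` is available (it is guarded because older releases lack it). The helper names are hypothetical.

```javascript
const net = require('node:net');

const FLAG = '--no-network-family-autoselection';

// Returns true when a NODE_OPTIONS-style string contains the workaround flag.
function hasNoAutoselectionFlag(nodeOptions = '') {
  return nodeOptions.split(/\s+/).includes(FLAG);
}

// Collects where the flag was seen and what the process-wide default is.
function reportAutoselection() {
  return {
    viaEnv: hasNoAutoselectionFlag(process.env.NODE_OPTIONS),
    viaArgv: process.execArgv.includes(FLAG),
    // undefined means this Node.js release does not expose the getter.
    effectiveDefault: typeof net.getDefaultAutoSelectFamily === 'function'
      ? net.getDefaultAutoSelectFamily()
      : undefined,
  };
}

console.log('autoselection status:', reportAutoselection());
```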

ShogunPanda commented 1 year ago

@ainsleyclark Is it the same for you? Can you provide a list of the hosts the production system is connecting to and how these are resolved by the production system's DNS servers?

pimterry commented 1 year ago

I'm seeing lots of these errors too, on Node v20.8.0. Similar to other reports, it seems to be very intermittent and only happening under load. I can't confirm the addresses involved either unfortunately (software is running on end user machines, I just see exception reports).
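When only exception reports are available, attaching some context to this specific error may help narrow it down. The sketch below uses the `uncaughtExceptionMonitor` process event, which observes the exception without changing the default crash behavior. The matching heuristic and the name `isConnectMultipleAssertion` are assumptions based on the stack traces posted in this thread.

```javascript
// Heuristic: does this error look like the assertion reported in this thread?
function isConnectMultipleAssertion(err) {
  return Boolean(err)
    && err.code === 'ERR_INTERNAL_ASSERTION'
    && typeof err.stack === 'string'
    && err.stack.includes('internalConnectMultiple');
}

// Log extra context just before the process crashes with this error.
process.on('uncaughtExceptionMonitor', (err) => {
  if (isConnectMultipleAssertion(err)) {
    console.error('network family autoselection assertion hit', {
      nodeVersion: process.version,
      nodeOptions: process.env.NODE_OPTIONS,
    });
  }
});
```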

Rand0mF commented 1 year ago

@ShogunPanda list of hosts will still be difficult, here's the dig output from within the container for

cluster-external hosts
```
dig google.com

; <<>> DiG 9.18.19 <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7886
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com. IN A

;; ANSWER SECTION:
google.com. 30 IN A 142.250.185.142

;; Query time: 2 msec
;; SERVER: 10.251.0.10#53(10.251.0.10) (UDP)
;; WHEN: Thu Nov 16 14:03:26 UTC 2023
;; MSG SIZE rcvd: 55
```
cluster-internal hosts
```
dig hello-world.default.svc.cluster.local

; <<>> DiG 9.18.19 <<>> hello-world.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13182
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;hello-world.default.svc.cluster.local. IN A

;; ANSWER SECTION:
hello-world.default.svc.cluster.local. 30 IN A 10.251.3.203

;; Query time: 8 msec
;; SERVER: 10.251.0.10#53(10.251.0.10) (UDP)
;; WHEN: Thu Nov 16 14:08:01 UTC 2023
;; MSG SIZE rcvd: 71
```

any other flags which would be helpful?

ShogunPanda commented 11 months ago

Nope, that should be it. I will keep you posted.

stefanrows commented 11 months ago

Still experiencing the issue with Node 20^. Switching to Node 18 solves the problem for me.