nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
106.98k stars 29.24k forks source link

Assertion `(expected_utf16_length) == (utf16_length)' failed #47457

Closed danpeixoto closed 12 months ago

danpeixoto commented 1 year ago

Version

v19.8.1

Platform

Microsoft Windows NT 10.0.22621.0 x64

Subsystem

No response

What steps will reproduce the bug?

  1. node
  2. // start typing new Date()
  3. new Date
  4. throw error and close cli

How often does it reproduce? Is there a required condition?

Every time I try this in cli.

What is the expected behavior? Why is that the expected behavior?

Display current date. Date Thu Apr 06 2023 21:06:02 GMT-0300

What do you see instead?

new DateWindows PowerShell[5012]: c:\ws\src\inspector\node_string.cc:39: Assertion `(expected_utf16_length) == (utf16_length)' failed.
 1: 00007FF7DD50234F node_api_throw_syntax_error+179983
 2: 00007FF7DD486986 v8::internal::MicrotaskQueue::GetMicrotasksScopeDepth+61942
 3: 00007FF7DD486D62 v8::internal::MicrotaskQueue::GetMicrotasksScopeDepth+62930
 4: 00007FF7DD34EB55 std::basic_ios<char,std::char_traits<char> >::init+597
 5: 00007FF7DD367E8A v8::internal::IncrementalMarking::IsMarking+29514
 6: 00007FF7DD365796 v8::internal::IncrementalMarking::IsMarking+19542
 7: 00007FF7DD359CA7 std::basic_ostream<char,std::char_traits<char> >::put+45383
 8: 00007FF7DDF7E81E v8::internal::Builtins::name+314382
 9: 00007FF7DDF7E418 v8::internal::Builtins::name+313352
10: 00007FF7DDF7E6F9 v8::internal::Builtins::name+314089
11: 00007FF7DDF7E550 v8::internal::Builtins::name+313664
12: 00007FF7DE06FB61 v8::internal::SetupIsolateDelegate::SetupHeap+606961
13: 00007FF75E1D7EF7

Additional information

No response

mscdex commented 1 year ago

Out of curiosity, can you test with node v19.6.1 and see if the same problem exists?

danpeixoto commented 1 year ago

No error.

Welcome to Node.js v19.6.1.
Type ".help" for more information.
> new Date()
2023-04-07T14:00:29.886Z
mscdex commented 1 year ago

/cc @anonrig This seems to be related to simdutf.

anonrig commented 1 year ago

cc @lemire

soumyadeepdutta commented 1 year ago
Welcome to Node.js v19.8.1.
Type ".help" for more information.
> new Date()
2023-04-07T17:45:36.846Z
> new Date
2023-04-07T17:45:52.221Z
> os.release()
'10.0.22621'

Seems to be working 🙄

danpeixoto commented 1 year ago

Can it be a Windows 11 problem? It don't even let me finish the command image

There is no problem if I call it inside a function

Welcome to Node.js v19.8.1.
Type ".help" for more information.
> console.log(new Date())
2023-04-07T16:07:54.389Z
undefined
anonrig commented 1 year ago

I don't use Windows, but on macOS, it does not crash. @lemire, do you see any particular reason why would this fail?

lemire commented 1 year ago

This could fail if the input is not valid UTF-8, in which case utf16_length would be zero.

lemire commented 1 year ago

It works for me... on a freshly installed node...

image
lemire commented 1 year ago

Works with PowerShell as well.

image
lemire commented 1 year ago

I have opened a PR that might help with this issue: https://github.com/nodejs/node/pull/47471

This being said, it is not clear to me where the error comes from. Something in Node expects UTF-8 and tries to convert it to UTF-16 but the UTF-8 is not valid. Where does the invalid UTF-8 comes from? Does it come from Node itself?

danpeixoto commented 1 year ago

Could it be associated with the result preview in the CLI?

lemire commented 1 year ago

Could it be associated with the result preview in the CLI?

Almost certainly but I see the previews in my terminals as well. It is weird because Date() will result in pure ASCII. That's not the kind of string that should be problematic. There might be an issue elsewhere in Node.js.

If you use a different terminal, does the same problem persists?

Could you try Windows Terminal?

danpeixoto commented 1 year ago

Yes, on all the ones I have installed. I tried CMD, git bash, and power shell. I will try to instal WSL and test in there

danpeixoto commented 1 year ago

It works in Ubuntu wsl

peixoto@:~$ uname -r
5.15.90.1-microsoft-standard-WSL2
peixoto@:~$ node
Welcome to Node.js v19.8.1.
Type ".help" for more information.
> new Date()
2023-04-08T13:37:49.706Z
>
HeikoTheissen commented 1 year ago

The issue also occurs in Node 18.16.0 on Windows 11, but not in Node 18.15.0.

dami-i commented 1 year ago

Same problem here. Same circumstances and conditions.

Using Windows 11 with Node v18.16.1.

It happens under both PowerShell and Command Prompt. It also happens within Node REPL.

It doesn't happen when running inside WSL. Downgrading to v18.15.0 with nvm-windows made the problem go away.

Same message:

> node --inspect
Debugger listening on ws://127.0.0.1:9229/c8338e46-617a-4eaa-ad7e-7ffaa12f4158
For help, see: https://nodejs.org/en/docs/inspector
Welcome to Node.js v18.16.1.
Type ".help" for more information.
> new DateC:\Program Files\PowerShell\7\pwsh.exe[2520]: c:\ws\src\inspector\node_string.cc:39: Assertion `(expected_utf16_length) == (utf16_length)' failed.
 1: 00007FF68280D54F node_api_throw_syntax_error+175743
 2: 00007FF682792E96 DSA_meth_get_flags+59654
 3: 00007FF682793272 DSA_meth_get_flags+60642
 4: 00007FF682661A35 std::basic_ios<char,std::char_traits<char> >::init+597
 5: 00007FF68267A75A std::basic_ostream<char,std::char_traits<char> >::put+101658
 6: 00007FF682678306 std::basic_ostream<char,std::char_traits<char> >::put+92358
 7: 00007FF68266CB62 std::basic_ostream<char,std::char_traits<char> >::put+45346
 8: 00007FF6831F326D v8::internal::Builtins::code+248237
 9: 00007FF6831F2E79 v8::internal::Builtins::code+247225
10: 00007FF6831F313C v8::internal::Builtins::code+247932
11: 00007FF6831F2FA0 v8::internal::Builtins::code+247520
12: 00007FF6832D7D81 v8::internal::SetupIsolateDelegate::SetupHeap+558449
13: 00007FF60341CD37
lemire commented 1 year ago

@dami-i Can you check with Node 20.0 or better?

I proposed a PR https://github.com/nodejs/node/pull/47471 that would allow invalid strings (they get turned to empty strings). I suspect that my PR fixed this issue you are encountering.

dami-i commented 1 year ago

@dami-i Can you check with Node 20.0 or better?

I proposed a PR #47471 that would allow invalid strings (they get turned to empty strings). I suspect that my PR fixed this issue you are encountering.

@lemire Your suspicions are right. I've just tried out with 20.3.1 and had no issue whatsoever. 🚀

Do you know if there is any chance to have this bugfix implemented in v18?

lemire commented 1 year ago

@dami-i That's a good question.

@anonrig : what do you think? Can we backport https://github.com/nodejs/node/pull/47471 to node 18?

107815684 commented 1 year ago

Captura de tela 2023-07-10 202524 alguém me ajuda

107815684 commented 1 year ago

Welcome to Node.js v20.3.0 versão

wiggisser commented 1 year ago

Can confirm this issue for

node18 17

Downgrading to node v18.15 resolved the issue for me at the moment

node18 15

HeikoTheissen commented 1 year ago

Can confirm this issue for

  • node v18.17

And also for v18.17.1 and Windows 11.

HeikoTheissen commented 1 year ago

The issue also occurs for Windows 11 and Node.js v20.6.1 (works with v20.5.1).

HeikoTheissen commented 1 year ago

The error can also be provoked on Windows 11 and Node v20.5.1 by typing "𬉼 into the REPL:

Welcome to Node.js v20.5.1.
Type ".help" for more information.
> "𬉼Command Prompt - node[12224]: c:\ws\src\inspector\node_string.cc:80: Assertion `utf8_length == 0 || utf8_length == expected_utf8_length' failed.
 1: 00007FF6D17ABCDF node_api_throw_syntax_error+203199
 2: 00007FF6D1728226 v8::base::CPU::num_virtual_address_bits+74518
 3: 00007FF6D1728632 v8::base::CPU::num_virtual_address_bits+75554
 4: 00007FF6D15CB0FD uv_loop_size+30349
 5: 00007FF6D15EBF14 uv_loop_size+165028
 6: 00007FF6D15E9496 uv_loop_size+154150
 7: 00007FF6D15DCEE7 uv_loop_size+103543
 8: 00007FF6D21487FE v8::SharedValueConveyor::SharedValueConveyor+416270
 9: 00007FF6D21483FA v8::SharedValueConveyor::SharedValueConveyor+415242
10: 00007FF6D21486BF v8::SharedValueConveyor::SharedValueConveyor+415951
11: 00007FF6D2148530 v8::SharedValueConveyor::SharedValueConveyor+415552
12: 00007FF6D2242F5E v8::PropertyDescriptor::writable+674286
13: 00007FF6D21B4600 v8::PropertyDescriptor::writable+90256
14: 00007FF6D21B4600 v8::PropertyDescriptor::writable+90256
15: 00007FF6D21B4600 v8::PropertyDescriptor::writable+90256
16: 00007FF6D21B4600 v8::PropertyDescriptor::writable+90256
17: 00007FF6D21B4600 v8::PropertyDescriptor::writable+90256
18: 00007FF6D21B4600 v8::PropertyDescriptor::writable+90256
19: 00007FF6D21B4600 v8::PropertyDescriptor::writable+90256
20: 00007FF6D21B4600 v8::PropertyDescriptor::writable+90256
21: 00007FF6D21B4600 v8::PropertyDescriptor::writable+90256
22: 00007FF6D21B4600 v8::PropertyDescriptor::writable+90256
23: 00007FF6D21EFAB1 v8::PropertyDescriptor::writable+333121
24: 00007FF6D21B4600 v8::PropertyDescriptor::writable+90256
25: 00007FF6D21B4600 v8::PropertyDescriptor::writable+90256
26: 00007FF6D21B4600 v8::PropertyDescriptor::writable+90256
27: 00007FF6D21B4600 v8::PropertyDescriptor::writable+90256
28: 00007FF6D21B4600 v8::PropertyDescriptor::writable+90256
29: 00007FF6D21B4600 v8::PropertyDescriptor::writable+90256
30: 00007FF6D21B4600 v8::PropertyDescriptor::writable+90256
31: 00007FF6D21B4600 v8::PropertyDescriptor::writable+90256
32: 00007FF6D21B29C0 v8::PropertyDescriptor::writable+83024
33: 00007FF6D21B25BB v8::PropertyDescriptor::writable+81995
34: 00007FF6D205D5F8 v8::StackTrace::GetFrameCount+3944
35: 00007FF6D205CEF3 v8::StackTrace::GetFrameCount+2147
36: 00007FF6D217159B v8::Function::Call+571
37: 00007FF6D17E1948 node::CallbackScope::~CallbackScope+1544
38: 00007FF6D17D9318 v8::base::CPU::has_osxsave+146744
39: 00007FF6D1606D6D RSA_meth_get_flags+17293
40: 00007FF6D1607A3B RSA_meth_get_flags+20571
41: 00007FF6D1602F34 RSA_meth_get_flags+1364
42: 00007FF6D1601E42 v8::CTypeInfoBuilder<bool>::Build+29234
43: 00007FF6D17F6518 uv_udp_set_ttl+17128
44: 00007FF6D180FF7E uv_run+478
45: 00007FF6D17E0C65 node::SpinEventLoop+405
46: 00007FF6D16DA1CB X509_STORE_get_cleanup+151147
47: 00007FF6D17659CE node::InitializeOncePerProcess+2974
48: 00007FF6D1767DB3 node::Start+3779
49: 00007FF6D1766F20 node::Start+48
50: 00007FF6D154E91C AES_cbc_encrypt+151372
51: 00007FF6D297ABAC inflateValidate+19260
52: 00007FF8F6FB257D BaseThreadInitThunk+29
53: 00007FF8F81EAA68 RtlUserThreadStart+40

The offending character is \u{2c27c}.

lemire commented 1 year ago

I can verify the problem with Node 20.7.0 under Windows 11. In fact, it appears that typing any non-ASCII character in the Node shell as the first character seems to trigger the check and lead to an abort. Just 'é' will suffice.

image

The problem only occurs while typing directly in the Node shell. If you execute a script, it is fine.

If I build Node from source under Windows, using the current Node code, and simply executing vcbuild.bat, and running the generated node.exe, I cannot reproduce the problem.

image

So the bug is real, but it seems to have to do with how Node is built for windows as part of the release process. I would normally consider the possibility that the problem was recently fixed in the code, but Node 20.7.0 is a very recent release, and I find it unlikely that something happened in the last three days to the code to fix this issue.

anonrig commented 1 year ago

cc @nodejs/platform-windows @nodejs/platform-windows-arm @nodejs/build

HeikoTheissen commented 1 year ago

In fact, it appears that typing any non-ASCII character in the Node shell as the first character seems to trigger the check and lead to an abort. Just 'é' will suffice.

Not any non-ASCII character: Typing é crashes, but typing 𬉼 as the first character does not:

Welcome to Node.js v20.5.1.
Type ".help" for more information.
> 𬉼
Uncaught ReferenceError: 𬉼 is not defined
lemire commented 1 year ago

@HeikoTheissen

Interesting.

Again, if I build Node from source (Visual Studio 2022), I don’t see this effect. It would be important to know how the releases for Windows are built.

richardlau commented 1 year ago

It would be important to know how the releases for Windows are built.

Currently on Windows 2012r2 with Visual Studio 2019: https://github.com/nodejs/node/blob/main/BUILDING.md#official-binary-platforms-and-toolchains although there are plans to migrate to newer versions of Windows/Visual Studio: https://github.com/nodejs/build/issues/3373

lemire commented 1 year ago

@richardlau When trying to build for vs2019, I hit https://github.com/nodejs/node/issues/48987

That is, doing vcbuild.bat vs2019 fails with the error described in https://github.com/nodejs/node/issues/48987

Is there some documentation on how the code is built for releases? Precisely? In a way that we can reproduce locally?

Building for vs2022, I don't have the problem.

lemire commented 1 year ago

As for what is happening... every time you press a key in the REPL, a JSON document is generated of the form...

{"id":1,"method":"Runtime.evaluate","params":{"expression":"v","throwOnSideEffect":true,"timeout":333}}

E.g., that is what is generated if I type the letter v. That seems a bit insane, but that's what is happening.

Importantly, the JSON is encoding in UTF-8, as it should.

Now, this gets parsed in std::unique_ptr<Value> parseJSON(const std::string_view string). But to parse it, we somehow have to convert it to UTF-16 first. And that's where the problem occurs (it seems).

An easy way to debug this problem without any sophistication would be to patch the Node code like so...

index 0f780f46c8..8cbe6ce05f 100644
--- a/src/inspector/node_string.cc
+++ b/src/inspector/node_string.cc
@@ -34,16 +34,27 @@ void builderAppendQuotedString(StringBuilder& builder,
 }

 std::unique_ptr<Value> parseJSON(const std::string_view string) {
+  printf("parsejson\n");
+    printf(" input: '%.*s'\n",(int)string.size(), string.data());
+    printf(" input: ");
+  for(size_t i = 0; i < string.size(); i++) {
+    printf(" %02x ", (unsigned)string[i]&0xff);
+  }
+  printf(" (%zu words)\n", string.size());
   if (string.empty())
     return nullptr;
   size_t expected_utf16_length =
       simdutf::utf16_length_from_utf8(string.data(), string.length());
+    printf(" expected_utf16_length %zu \n", expected_utf16_length);
+
   MaybeStackBuffer<char16_t> buffer(expected_utf16_length);
   // simdutf::convert_utf8_to_utf16 returns zero in case of error.
   size_t utf16_length = simdutf::convert_utf8_to_utf16(
       string.data(), string.length(), buffer.out());
   // We have that utf16_length == expected_utf16_length if and only
   // if the input was a valid UTF-8 string.
+    printf(" utf16_length %zu \n", utf16_length);

This would print out the offending input and we could figure things out from there.

richardlau commented 1 year ago

Is there some documentation on how the code is built for releases? Precisely? In a way that we can reproduce locally?

Precisely, no, but the Jenkins job that does the release builds executes vcbuild.bat build-release %ARCH% upload where %ARCH% is e.g. x64. Trying to run that locally will attempt to sign the binaries/installer and upload to the staging server but those are attempted after compilation.

@StefanStojanovic added some documentation for creating local custom Windows VMs (e.g. VirtualBox) that reuse much of the Ansible setup we use to set up the CI (both test and release) machines but I've not personally run these myself https://github.com/nodejs/build/blob/main/doc/create-windows-custom-vm.md

lemire commented 1 year ago

Ok, so I have dug more into what is happening in this code.

  1. You hit a key within the REPL.
  2. This calls dispatchMessageFromFrontend.
  3. It calls dispatchProtocolMessage. It converts the UTF-16 payload to UTF-8 using StringUtil::StringViewToUtf8. This calls simdutf::convert_utf16_to_utf8 which converts the provided UTF-16 input (passed by a string view) into a local buffer (MaybeStackBuffer<char>) which then gets copied into a new string return String(buffer.out(), utf8_length). This string is then local to dispatchProtocolMessage. As a string view, we pass it to protocol::StringUtil::parseMessage as a UTF-8 string view. This calls parseJSON(message) which takes the UTF-8 payload and converts it back to UTF-16 using simdutf::convert_utf8_to_utf16.

Seemingly, this is where things go bad under the released binaries under Windows but I cannot reproduce the problem with the binaries that I build.

lemire commented 1 year ago

I have created PR https://github.com/nodejs/node/pull/49780 where I attempt to remove this back and forth between UTF-8 and UTF-16 at each key press in the REPL.

lemire commented 1 year ago

PR https://github.com/nodejs/node/pull/49780 seems to pass all tests (thanks @anonrig) and might effectively fix this current bug. Though I cannot reproduce the bug when building Node, PR https://github.com/nodejs/node/pull/49780 removes the offending code (by simplifying the processing).

deepak1556 commented 12 months ago

@lemire I might have some additional debug info that would related to this bug,

We started seeing consistent failure in our CI for test-repl-history-navigation.js on Windows x64 when building using VS 2019 toolchain. Build arguments are pretty much .\vcbuild build-release nosign x64 and VS version is https://github.com/actions/runner-images/blob/win19/20230924.1/images/win/Windows2019-Readme.md#visual-studio-enterprise-2019

=== release test-repl-history-navigation ===
Path: parallel/test-repl-history-navigation
npm[7988]: D:\a\_work\1\s\src\inspector\node_string.cc:66: Assertion `(expected_utf8_length) == (utf8_length)' failed.
 1: 00007FF6BF57D59F node::DumpBacktrace+143 [D:\a\_work\1\s\src\debug_utils.cc]:L310
 2: 00007FF6BF502EA6 node::Abort+22 [D:\a\_work\1\s\src\node_errors.cc]:L328
 3: 00007FF6BF503282 node::Assert+146 [D:\a\_work\1\s\src\node_errors.cc]:L344
 4: 00007FF6BF3CAF68 node::inspector::protocol::StringUtil::StringViewToUtf8+280 [D:\a\_work\1\s\src\inspector\node_string.cc]:L66
 5: 00007FF6BF3EA710 node::inspector::`anonymous namespace'::ChannelImpl::dispatchProtocolMessage+80 [D:\a\_work\1\s\src\inspector_agent.cc]:L248
 6: 00007FF6BF3E82E6 node::inspector::`anonymous namespace'::SameThreadInspectorSession::Dispatch+118 [D:\a\_work\1\s\src\inspector_agent.cc]:L1003
 7: 00007FF6BF3DCB52 node::inspector::`anonymous namespace'::JSBindingsConnection<node::inspector::`anonymous namespace'::LocalConnection>::Dispatch+290 [D:\a\_work\1\s\src\inspector_js_api.cc]:L143
 8: 00007FF6BFF632ED v8::internal::FunctionCallbackArguments::Call+221 [D:\a\_work\1\s\deps\v8\src\api\api-arguments-inl.h]:L148
 9: 00007FF6BFF62EF9 v8::internal::`anonymous namespace'::HandleApiCallHelper<0>+617 [D:\a\_work\1\s\deps\v8\src\builtins\builtins-api.cc]:L114
10: 00007FF6BFF631BC v8::internal::Builtin_Impl_HandleApiCall+268 [D:\a\_work\1\s\deps\v8\src\builtins\builtins-api.cc]:L142
11: 00007FF6BFF63020 v8::internal::Builtin_HandleApiCall+32 [D:\a\_work\1\s\deps\v8\src\builtins\builtins-api.cc]:L130
12: 00007FF6C0047DD1 Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit+61 [D:\a\_work\1\s\out\Release\obj\v8_snapshot\embedded.S]:L19682
13: 00007FF6401C74B7

The failure started after updating our branch from 18.15.0 -> 18.16.1 . I then backported https://github.com/nodejs/node/commit/63ee335eb01ae36f9cccce07cea7801a3b3f8d55 and https://github.com/nodejs/node/commit/2990390359d3bb61c7f5301a28e4e91b8d054054 which now changed the error to the following

=== release test-repl-history-navigation ===
Path: parallel/test-repl-history-navigation
npm[8072]: D:\a\_work\1\s\src\inspector\node_string.cc:79: Assertion `utf8_length == 0 || utf8_length == expected_utf8_length' failed.
 1: 00007FF78CEAD66F node::DumpBacktrace+143 [D:\a\_work\1\s\src\debug_utils.cc]:L310
 2: 00007FF78CE32F76 node::Abort+22 [D:\a\_work\1\s\src\node_errors.cc]:L328
 3: 00007FF78CE33352 node::Assert+146 [D:\a\_work\1\s\src\node_errors.cc]:L344
 4: 00007FF78CCFB1ED node::inspector::protocol::StringUtil::StringViewToUtf8+285 [D:\a\_work\1\s\src\inspector\node_string.cc]:L79
 5: 00007FF78CD1A7F0 node::inspector::`anonymous namespace'::ChannelImpl::dispatchProtocolMessage+80 [D:\a\_work\1\s\src\inspector_agent.cc]:L248
 6: 00007FF78CD183C6 node::inspector::`anonymous namespace'::SameThreadInspectorSession::Dispatch+118 [D:\a\_work\1\s\src\inspector_agent.cc]:L1003
 7: 00007FF78CD0CC32 node::inspector::`anonymous namespace'::JSBindingsConnection<node::inspector::`anonymous namespace'::LocalConnection>::Dispatch+290 [D:\a\_work\1\s\src\inspector_js_api.cc]:L143
 8: 00007FF78D8933BD v8::internal::FunctionCallbackArguments::Call+221 [D:\a\_work\1\s\deps\v8\src\api\api-arguments-inl.h]:L148
 9: 00007FF78D892FC9 v8::internal::`anonymous namespace'::HandleApiCallHelper<0>+617 [D:\a\_work\1\s\deps\v8\src\builtins\builtins-api.cc]:L114
10: 00007FF78D89328C v8::internal::Builtin_Impl_HandleApiCall+268 [D:\a\_work\1\s\deps\v8\src\builtins\builtins-api.cc]:L142
11: 00007FF78D8930F0 v8::internal::Builtin_HandleApiCall+32 [D:\a\_work\1\s\deps\v8\src\builtins\builtins-api.cc]:L130
12: 00007FF78D977EA1 Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit+61 [D:\a\_work\1\s\out\Release\obj\v8_snapshot\embedded.S]:L19682
13: 00007FF70DB074B7

As a next step, I tried to repro the failure locally and I was able to confirm the failure with VS 2019 while test passes when building Node.js with VS 2022. I added the debug logs to https://github.com/nodejs/node/blob/fce8fbadcd52372ab2676a3e310c2ec5c22b4121/src/inspector/node_string.cc#L61 and found the first expression that triggers the error was https://github.com/nodejs/node/blob/fce8fbadcd52372ab2676a3e310c2ec5c22b4121/test/parallel/test-repl-history-navigation.js#L172

With this data, I tried to create a minimal repro outside Node.js and ended up with the following test case

#include <memory>
#include <uchar.h>
#include <iostream>
#include "simdutf.h"

int main() {
    const uint16_t buf[] = {123,34,105,100,34,58,49,44,34,109,101,116,104,111,100,34,58,34,82,117,110,116,105,109,101,46,101,118,97,108,117,97,116,101,34,44,34,112,97,114,97,109,115,34,58,123,34,101,120,112,114,101,115,115,105,111,110,34,58,34,99,111,110,115,116,32,120,50,32,61,32,39,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,55357,56341,39,34,44,34,116,104,114,111,119,79,110,83,105,100,101,69,102,102,101,99,116,34,58,116,114,117,101,44,34,116,105,109,101,111,117,116,34,58,51,51,51,125,125};
    const char16_t* source =
      reinterpret_cast<const char16_t*>(buf);
    const size_t length = std::size(buf);
    size_t expected_size =
        simdutf::utf8_length_from_utf16(source, length);
    std::unique_ptr<char[]>buffer(new char[expected_size]);
    size_t out_size =
        simdutf::convert_utf16_to_utf8(source, length, buffer.get());
    std::string final_string(buffer.get(), out_size);
    std::cout << final_string << std::endl;
    printf("%zu : %zu\n", expected_size, out_size);
    assert(expected_size == out_size);
    return 0;
}

The uin16_t[] corresponds to the following string {"id":1,"method":"Runtime.evaluate","params":{"expression":"const x2=...',"thrrowOnSideEffect":true,"timeout":333}} obtained from the failing Node.js test. Now the interesting part of the bug starts, I first built the test case with both VS 2019 and 2022 using the following arguments

& 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30133\bin\Hostx64\x64\cl.exe' simd_test.cpp C:\Users\demohan\github\vscode-node\deps\simdutf\simdutf.cpp /I C:\Users\demohan\github\vscode-node\deps\simdutf /EHsc /Zc:__cplusplus -std:c++17 /W3 /link /MACHINE:x64 /LIBPATH:'C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30133\lib\x64' /LIBPATH:'C:\Program Files (x86)\Windows Kits\10\lib\10.0.22621.0\um\x64' /LIBPATH:'C:\Program Files (x86)\Windows Kits\10\lib\10.0.22621.0\ucrt\x64'

& 'C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.37.32822\bin\Hostx64\x64\cl.exe' simd_test.cpp C:\Users\demohan\github\vscode-node\deps\simdutf\simdutf.cpp /EHsc /I C:\Users\demohan\github\vscode-node\deps\simdutf /FC /Zc:__cplusplus -std:c++17 /W3 /link /MACHINE:x64 /LIBPATH:'C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.37.32822\lib\x64' /LIBPATH:'C:\Program Files (x86)\Windows Kits\10\lib\10.0.22621.0\um\x64' /LIBPATH:'C:\Program Files (x86)\Windows Kits\10\lib\10.0.22621.0\ucrt\x64'

Results from VS 2019:

> simd_test.exe
611 : 611

Results from VS 2022:

> simd_test.exe
611 : 611

The length are as expected in both cases, so clearly a build flag is triggering the differences. I then copied the flags Node.js uses to build simdtuf.lib which can be found in out/Release/obj/simdtuf/simdtuf.tlog/CL.command1.tlog and narrowed it down to the following flag /Ox, so now building the above example again with the optimization flag included leads to the following results

Results from VS 2019:

> simd_test.exe
611 : 575
Assertion failed: expected_size == out_size, file simd_test.cpp, line 19

Results from VS 2022:

> simd_test.exe
611 : 611

Additional data, between 18.15.0 -> 18.16.1 there was simdtuf version bump from 3.1.0 -> 3.2.2 and the relevant change seems to be https://github.com/simdutf/simdutf/commit/ee8971d75a886b7a5300d1501220ee2fa13d76d4. I am not sure how the optimization flag affects this codepath but I hope this is useful data to identify the root cause.

lemire commented 12 months ago

@deepak1556 That's a fantastic analysis.

I'll investigate soon and, hopefully, I'll find a bug fix or workaround. Should be reasonably easy now that you have done all this work.

lemire commented 12 months ago

@deepak1556 I reproduce the issue, but only on the icelake (AVX-512) kernel and only with Visual Studio 2019.

Is the hardware that you are testing this on capable of AVX-512? E.g., Zen 4 or Ice Lake, something of the sort?

Could you run the modified version of your code that looks like this :

#include "simdutf.h"
#include <iostream>
#include <memory>
#include <uchar.h>

int main() {
  const uint16_t buf[] = {
      123,   34,    105,   100,   34,    58,    49,    44,    34,    109,
      101,   116,   104,   111,   100,   34,    58,    34,    82,    117,
      110,   116,   105,   109,   101,   46,    101,   118,   97,    108,
      117,   97,    116,   101,   34,    44,    34,    112,   97,    114,
      97,    109,   115,   34,    58,    123,   34,    101,   120,   112,
      114,   101,   115,   115,   105,   111,   110,   34,    58,    34,
      99,    111,   110,   115,   116,   32,    120,   50,    32,    61,
      32,    39,    55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      39,    34,    44,    34,    116,   104,   114,   111,   119,   79,
      110,   83,    105,   100,   101,   69,    102,   102,   101,   99,
      116,   34,    58,    116,   114,   117,   101,   44,    34,    116,
      105,   109,   101,   111,   117,   116,   34,    58,    51,    51,
      51,    125,   125};
  const char16_t *source = reinterpret_cast<const char16_t *>(buf);
  const size_t length = std::size(buf);
  for (auto &e : simdutf::get_available_implementations()) {
    if (!e->supported_by_runtime_system()) {
      continue;
    }
    std::cout << "testing " << e->name() << std::endl;

    size_t expected_size = simdutf::utf8_length_from_utf16(source, length);
    std::unique_ptr<char[]> buffer(new char[expected_size]);
    size_t out_size =
        simdutf::convert_utf16_to_utf8(source, length, buffer.get());
    std::string final_string(buffer.get(), out_size);
    std::cout << final_string << std::endl;
    printf("%zu : %zu\n", expected_size, out_size);
  }
  return 0;
}
lemire commented 12 months ago

@anonrig I expect that bumping simdutf to 3.2.18 might fix this. Note that 3.2.18 is a patch release that merely disables the icelake kernel under Visual Studio 2019. The rationale is in the release notes.

https://github.com/simdutf/simdutf/releases/tag/v3.2.18

anonrig commented 12 months ago

@lemire opened https://github.com/nodejs/node/pull/50091

lemire commented 12 months ago

@anonrig :heart:

deepak1556 commented 12 months ago

@lemire yes the hardware supports some of the AVX-512 instructions, here is the trimmed output from coreinfo utility

11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Intel64 Family 6 Model 140 Stepping 1, GenuineIntel

AVX-512-F       *       Supports AVX-512 Foundation instructions
AVX-512-DQ      *       Supports AVX-512 double and quadword instructions
AVX-512-IFAMA   *       Supports AVX-512 integer Fused multiply-add instructions
AVX-512-PF      -       Supports AVX-512 prefetch instructions
AVX-512-ER      -       Supports AVX-512 exponential and reciprocal instructions
AVX-512-CD      *       Supports AVX-512 conflict detection instructions
AVX-512-BW      *       Supports AVX-512 byte and word instructions
AVX-512-VL      *       Supports AVX-512 vector length instructions

Also, the output from the modified test file with VS2019

testing icelake
611 : 583
testing haswell
611 : 583
testing westmere
611 : 583
testing fallback
611 : 583

I was able to confirm that change from https://github.com/simdutf/simdutf/pull/328 fixes the test case and building it with node also makes the test-repl-history-navigation.js test pass again with VS2019. Thanks for the quick turnaround!

lemire commented 12 months ago

@deepak1556 Thanks. So I think we nailed it down.

@anonrig has already pushed the simdutf update to node, so the fix should be part of future releases.

It is less important right now, but my code was in error, this is the code I wanted you to try...

#include "simdutf.h"
#include <iostream>
#include <memory>
#include <uchar.h>

int main() {
  const uint16_t buf[] = {
      123,   34,    105,   100,   34,    58,    49,    44,    34,    109,
      101,   116,   104,   111,   100,   34,    58,    34,    82,    117,
      110,   116,   105,   109,   101,   46,    101,   118,   97,    108,
      117,   97,    116,   101,   34,    44,    34,    112,   97,    114,
      97,    109,   115,   34,    58,    123,   34,    101,   120,   112,
      114,   101,   115,   115,   105,   111,   110,   34,    58,    34,
      99,    111,   110,   115,   116,   32,    120,   50,    32,    61,
      32,    39,    55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341, 55357, 56341,
      39,    34,    44,    34,    116,   104,   114,   111,   119,   79,
      110,   83,    105,   100,   101,   69,    102,   102,   101,   99,
      116,   34,    58,    116,   114,   117,   101,   44,    34,    116,
      105,   109,   101,   111,   117,   116,   34,    58,    51,    51,
      51,    125,   125};
  const char16_t *source = reinterpret_cast<const char16_t *>(buf);
  const size_t length = std::size(buf);
  for (auto &e : simdutf::get_available_implementations()) {
    if (!e->supported_by_runtime_system()) {
      continue;
    }
    std::cout << "testing " << e->name() << std::endl;

    size_t expected_size = e->utf8_length_from_utf16(source, length);
    std::unique_ptr<char[]> buffer(new char[expected_size]);
    size_t out_size =
        e->convert_utf16_to_utf8(source, length, buffer.get());
    std::string final_string(buffer.get(), out_size);
    std::cout << final_string << std::endl;
    printf("%zu : %zu\n", expected_size, out_size);
  }
  return 0;
}
lemire commented 12 months ago

For people reading this, here is the gist of the story so far. In recent versions of Node, we use the library simdutf. This library will use fast code routines when it detects that the processor has some advanced instructions (AVX-512). This work well to speed up processing. Unfortunately, the code that transcodes UTF-16 characters to UTF-8 has some error that only occurs under Visual Studio 2019 (in release mode). I was unable to identify the issue exactly, but my suspicion is that there is a bug in Visual Studio 2019 with respect to AVX-512 code generation. The fix that we are pushing forward is to disable the AVX-512 code when Node is built under Visual Studio 2019. This sounds bad, but, at the same time, Node is migrating to Visual Studio 2022 so that future Windows releases will be built with Visual Studio 2022. So the Windows users that update to Node 21 will get back the AVX-512 fast paths. Special thanks to @deepak1556 for effectively going to the bottom of this mystery (seemingly so).

I anticipate that we will be closing this issue soon.

deepak1556 commented 12 months ago

Filed https://developercommunity.visualstudio.com/t/Incorrect-results-with-AVX-512-instructi/10487673 as a follow-up.