simdutf / simdutf

Unicode routines (UTF8, UTF16, UTF32) and Base64: billions of characters per second using SSE2, AVX2, NEON, AVX-512, RISC-V Vector Extension. Part of Node.js, WebKit/Safari, Ladybird, Cloudflare Workers and Bun.
https://simdutf.github.io/simdutf/
Apache License 2.0

SIGILL on convert_utf8_to_utf16le #242

Closed dagorander closed 1 year ago

dagorander commented 1 year ago

After updating from node 18.15 to 18.16 on OpenBSD 7.3-current, I'm seeing node crashing with illegal instruction. Analysis of core dump indicates the issue happens in simdutf::icelake::implementation::convert_utf8_to_utf16le.

Issue might be specific to this hardware (misidentified CPU leading to incorrect instructions used?), as the issue does not replicate on an older gen Intel system running the same snapshot.

dmesg:

cpu0 at mainbus0: apid 0 (boot processor)
cpu0: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz, 4689.19 MHz, 06-8c-01
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,AVX512IFMA,CLFLUSHOPT,CLWB,PT,AVX512CD,SHA,AVX512BW,AVX512VL,AVX512VBMI,UMIP,PKU,WAITPKG,SRBDS_CTRL,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 20-way L2 cache, 12MB 64b/line 12-way L3 cache
cpu0: smt 0, core 0, package 0
cpu0: apic clock running at 38MHz
cpu0: mwait min=64, max=64, C-substates=0.2.0.1.2.1.1.1, IBE
acpicpu0 at acpi0: C3(200@1048 mwait.1@0x60), C2(200@253 mwait.1@0x31), C1(1000@1 mwait.1), PSS
cpu0: Enhanced SpeedStep 4689 MHz: speeds: 2701, 2700, 2600, 2500, 2300, 2100, 1900, 1700, 1600, 1400, 1200, 1100, 900, 700, 600, 400 MHz

Core trace:

Reading symbols from node...
(No debugging symbols found in node)
[New process 300704]
[New process 209515]
[New process 241646]
[New process 409671]
[New process 393412]
[New process 593723]

warning: .dynamic section for "/usr/lib/libc.so.97.0" is not at the expected address (wrong library or version mismatch?)
Core was generated by `node'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x00000582031329b4 in simdutf::icelake::implementation::convert_utf8_to_utf16le(char const*, unsigned long, char16_t*) const ()
[Current thread is 1 (process 300704)]

GDB disassembler dump: node_core_disassembled.txt

clausecker commented 1 year ago

Your CPU does not have AVX512VBMI2 and AVX512BITALG, which are required to run this kernel. Something must be up with our CPU capability detection routines.

This is interesting though; in theory, your CPU should have the required instructions. They don't show up in the dmesg log though.
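Since dmesg may simply not print every feature flag, one way to settle whether the CPU advertises these two extensions is to read CPUID leaf 7 directly. The following is a minimal sketch, assuming GCC or Clang on x86 (it relies on `<cpuid.h>`); the helper names are hypothetical and this is not simdutf's own detection code.

```cpp
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang wrapper around the CPUID instruction
#endif

// Bit positions in CPUID leaf 7, subleaf 0, register ECX (per the Intel SDM).
constexpr std::uint32_t kAvx512Vbmi2Bit = 1u << 6;
constexpr std::uint32_t kAvx512BitalgBit = 1u << 12;

// Hypothetical helper: true when an ECX value advertises both extensions.
constexpr bool ecx_has_vbmi2_and_bitalg(std::uint32_t ecx) {
  return (ecx & kAvx512Vbmi2Bit) != 0 && (ecx & kAvx512BitalgBit) != 0;
}

// Hypothetical helper: reads leaf 7 ECX from the running CPU.
// Returns 0 when the leaf is unavailable or the target is not x86.
inline std::uint32_t read_cpuid_leaf7_ecx() {
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
    return ecx;
  }
#endif
  return 0;
}

// Combines the two helpers above for the running machine.
inline bool cpu_advertises_vbmi2_and_bitalg() {
  return ecx_has_vbmi2_and_bitalg(read_cpuid_leaf7_ecx());
}
```

Calling `cpu_advertises_vbmi2_and_bitalg()` on the affected machine would show what CPUID reports regardless of what dmesg prints. Note that this only says what the hardware offers, not whether the OS allows the instructions to run.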

clausecker commented 1 year ago

The disassembly is unfortunately fairly useless as your gdb is too old to support AVX-512 instructions.

dagorander commented 1 year ago

That is my bad: I accidentally used the gdb command (the old version 6.3 shipped with the system itself) instead of egdb. A proper disassembly is attached below. I have also updated my initial report, including the core trace, to use the correct version of gdb. Apologies for that.

node_core_disassembled.txt

lemire commented 1 year ago

It is a Tiger Lake processor, which supports AVX-512. We have a recent OpenBSD.

I am puzzled.

clausecker commented 1 year ago

This is quite interesting. The code crashes on the first AVX-512 instruction. Does perhaps OpenBSD not support AVX-512? Or is AVX-512 disabled in the BIOS?

@lemire Do you check if AVX-512 is actually supported by the OS or only if it is present in cpuid? The former can be done by checking with xgetbv if the OS has set up saving/restoring AVX-512 registers on context switch.
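The xgetbv check mentioned above could look roughly like the sketch below. It assumes GCC or Clang on x86 (inline assembly and `<cpuid.h>`), the function names are hypothetical, and it is not the code simdutf ended up shipping. The idea is that the OS sets bits in XCR0 for each register state it saves and restores on context switch; AVX-512 is only safe to use when the opmask and both ZMM components are enabled along with SSE and AVX state.

```cpp
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang wrapper around the CPUID instruction
#endif

// XCR0 state-component bits (per the Intel SDM).
constexpr std::uint64_t kXcr0Sse = 1u << 1;       // XMM registers
constexpr std::uint64_t kXcr0Avx = 1u << 2;       // upper halves of YMM
constexpr std::uint64_t kXcr0Opmask = 1u << 5;    // k0-k7 mask registers
constexpr std::uint64_t kXcr0ZmmHi256 = 1u << 6;  // upper halves of ZMM0-15
constexpr std::uint64_t kXcr0Hi16Zmm = 1u << 7;   // ZMM16-31
constexpr std::uint64_t kXcr0Avx512Mask =
    kXcr0Sse | kXcr0Avx | kXcr0Opmask | kXcr0ZmmHi256 | kXcr0Hi16Zmm;

// True when an XCR0 value shows the OS managing all AVX-512 state.
constexpr bool xcr0_enables_avx512(std::uint64_t xcr0) {
  return (xcr0 & kXcr0Avx512Mask) == kXcr0Avx512Mask;
}

// Reads XCR0 via XGETBV. Returns 0 when the OS has not enabled XSAVE
// (XGETBV would fault in that case) or when the target is not x86.
inline std::uint64_t read_xcr0() {
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return 0;
  if ((ecx & (1u << 27)) == 0) return 0;  // CPUID.1:ECX.OSXSAVE is clear
  std::uint32_t lo = 0, hi = 0;
  __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
  return 0;
#endif
}

// Hypothetical entry point: does the OS let us use AVX-512 registers?
inline bool os_supports_avx512() { return xcr0_enables_avx512(read_xcr0()); }
```

On a system where the OS only saves SSE and AVX state, XCR0 would typically read 0x7 and `xcr0_enables_avx512` returns false even though CPUID lists the AVX-512 flags, which matches the symptom in this report.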

dagorander commented 1 year ago

For completeness, I took a dive into the UEFI on the laptop (it is a Framework), looking to see if there might be some setting somewhere that disables certain instructions. I could not find any such.

The current install began as 7.2 Release, then went to 7.2-current, then 7.3-current. So there shouldn't be much baggage in the install.

lemire commented 1 year ago

> Your CPU does not have AVX512VBMI2 and AVX512BITALG

The CPU does have this support.

clausecker commented 1 year ago

@lemire I wrote that because the extensions don't show up in the dmesg output. Seems like OpenBSD didn't put the relevant bits in their dmesg parser yet.

lemire commented 1 year ago

@dagorander Can you do something for us? Make sure you have a recent cmake as well as a recent C++ compiler.

git clone https://github.com/simdutf/simdutf
cd simdutf
cmake -B build
cmake --build build
ctest   --test-dir build --verbose

What do you see?

lemire commented 1 year ago

@clausecker Is there any way to run OpenBSD on AWS?

Or to launch OpenBSD inside a Linux server?

dagorander commented 1 year ago

@lemire It looks like the system uses clang v13, so that's from '21. If that's not recent enough for this exercise, that might be an interesting data point itself, perhaps?

As a side note as well, I have sent a notice to the maintainer of the OpenBSD node package, just in case they might have input or relevant domain knowledge.

The steps you indicate result in this:


0% tests passed, 52 tests failed out of 52

Total Test time (real) =   0.47 sec

The following tests FAILED:
      1 - amalgamation_demo (ILLEGAL)
      2 - random_fuzzer (ILLEGAL)
      3 - special_tests (ILLEGAL)
      4 - validate_ascii_basic_tests (ILLEGAL)
      5 - validate_ascii_with_errors_tests (ILLEGAL)
      6 - bele_tests (ILLEGAL)
      7 - validate_utf8_basic_tests (ILLEGAL)
      8 - select_implementation (ILLEGAL)
      9 - validate_utf8_brute_force_tests (ILLEGAL)
     10 - validate_utf8_puzzler_tests (ILLEGAL)
     11 - validate_utf8_with_errors_tests (ILLEGAL)
     12 - validate_utf16le_basic_tests (ILLEGAL)
     13 - validate_utf16be_basic_tests (ILLEGAL)
     14 - validate_utf16le_with_errors_tests (ILLEGAL)
     15 - validate_utf16be_with_errors_tests (ILLEGAL)
     16 - validate_utf32_basic_tests (ILLEGAL)
     17 - validate_utf32_with_errors_tests (ILLEGAL)
     18 - convert_valid_utf8_to_utf16le_tests (ILLEGAL)
     19 - convert_valid_utf8_to_utf16be_tests (ILLEGAL)
     20 - convert_valid_utf8_to_utf32_tests (ILLEGAL)
     21 - convert_utf8_to_utf16le_tests (ILLEGAL)
     22 - convert_utf8_to_utf16be_tests (ILLEGAL)
     23 - convert_utf8_to_utf16le_with_errors_tests (ILLEGAL)
     24 - convert_utf8_to_utf16be_with_errors_tests (ILLEGAL)
     25 - convert_utf8_to_utf32_tests (ILLEGAL)
     26 - convert_utf8_to_utf32_with_errors_tests (ILLEGAL)
     27 - convert_utf16le_to_utf8_tests (ILLEGAL)
     28 - convert_utf16be_to_utf8_tests (ILLEGAL)
     29 - convert_utf16le_to_utf8_with_errors_tests (ILLEGAL)
     30 - convert_utf16be_to_utf8_with_errors_tests (ILLEGAL)
     31 - convert_utf32_to_utf8_tests (ILLEGAL)
     32 - convert_utf32_to_utf8_with_errors_tests (ILLEGAL)
     33 - convert_utf32_to_utf16le_tests (ILLEGAL)
     34 - convert_utf32_to_utf16be_tests (ILLEGAL)
     35 - convert_utf32_to_utf16le_with_errors_tests (ILLEGAL)
     36 - convert_utf32_to_utf16be_with_errors_tests (ILLEGAL)
     37 - convert_valid_utf16le_to_utf8_tests (ILLEGAL)
     38 - convert_valid_utf16be_to_utf8_tests (ILLEGAL)
     39 - convert_valid_utf32_to_utf8_tests (ILLEGAL)
     40 - convert_valid_utf32_to_utf16le_tests (ILLEGAL)
     41 - convert_valid_utf32_to_utf16be_tests (ILLEGAL)
     42 - convert_utf16le_to_utf32_tests (ILLEGAL)
     43 - convert_utf16be_to_utf32_tests (ILLEGAL)
     44 - convert_utf16le_to_utf32_with_errors_tests (ILLEGAL)
     45 - convert_utf16be_to_utf32_with_errors_tests (ILLEGAL)
     46 - convert_valid_utf16le_to_utf32_tests (ILLEGAL)
     47 - convert_valid_utf16be_to_utf32_tests (ILLEGAL)
     48 - count_utf8 (ILLEGAL)
     49 - count_utf16le (ILLEGAL)
     50 - count_utf16be (ILLEGAL)
     51 - detect_encodings_tests (ILLEGAL)
     52 - basic_fuzzer (ILLEGAL)
Errors while running CTest
Output from these tests are in: /home/daniel/Temp/simdutf/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

lemire commented 1 year ago

No. I don't think that it is a case where your software is too old.

clausecker commented 1 year ago

@lemire I don't know, I don't use AWS much. It does install just fine in standard VM software; perhaps you can just run it in a Linux box under KVM. There's also this:

https://chrispinnock.com/2021/05/18/openbsd-on-aws/

I think the problem is that OpenBSD doesn't have AVX-512 support and we don't check for this.

clausecker commented 1 year ago

See this answer for how to detect if the OS supports AVX-512: https://stackoverflow.com/a/72523150/417501

lemire commented 1 year ago

@dagorander Can you do one more for me...

git clone https://github.com/easyaspi314/simdutf altsimdutf
cd altsimdutf
git checkout allow_nehalem
cmake -B build
cmake --build build
ctest   --test-dir build --verbose

lemire commented 1 year ago

@clausecker We have a PR from @easyaspi314 that did additional checks. It broke simdutf for some users, but it might contain the necessary fix. If so, I could bring it back minus the components that caused problems.

That's why I am asking @dagorander to run it.

dagorander commented 1 year ago

@lemire I think I only need the last bit of this run. :)

100% tests passed, 0 tests failed out of 52

Total Test time (real) = 148.59 sec

Indeed, it does seem that the PR from @easyaspi314 contains a fix for this issue.

clausecker commented 1 year ago

Here's confirmation that OpenBSD does not support AVX-512: http://cvsweb.openbsd.org/src/sys/arch/amd64/amd64/mds.S?rev=1.4&content-type=text/x-cvsweb-markup

/* we don't support AVX512 yet */

lemire commented 1 year ago

@dagorander Great. We will create a new release soon.

cc @anonrig

lemire commented 1 year ago

Patch release upcoming.

dagorander commented 1 year ago

Impressive turnaround time by literally any metric, thank you all for the effort.

(And as a personal note: this report constitutes my first contribution to a FOSS project in general and also first contribution to making OpenBSD better. So extra thanks for that. :) )

anonrig commented 1 year ago

> @dagorander Great. We will create a new release soon.
>
> cc @anonrig

Upon new version, I'll update Node.js as well.

easyaspi314 commented 1 year ago

If the only issue is macOS, then removing all the #ifdef __APPLE__ should fix things. However, because of how I properly check for AVX-512, macOS will default to AVX2.

However, that will be 10000 times better than OpenBSD crashing.
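The trade-off described here can be illustrated with a small selection routine: a kernel is only chosen when both the CPU advertises it and the OS has enabled the matching register state, otherwise the next tier down is used. This is a hedged sketch with hypothetical names (`Kernel`, `CpuState`, `select_kernel`), not the dispatch code from the PR.

```cpp
// Hypothetical tiers, ordered from most to least capable.
enum class Kernel { Avx512, Avx2, Generic };

// Hypothetical snapshot of what the CPU advertises and what the OS enables.
struct CpuState {
  bool cpuid_avx512;  // CPUID lists the AVX-512 subsets the kernel needs
  bool os_avx512;     // XCR0 shows opmask and ZMM state being saved
  bool cpuid_avx2;    // CPUID lists AVX2
  bool os_avx;        // XCR0 shows YMM state being saved
};

// Picks the best kernel that is safe to execute. A CPU flag alone is not
// enough: without OS support the first instruction raises SIGILL.
inline Kernel select_kernel(const CpuState& s) {
  if (s.cpuid_avx512 && s.os_avx512) return Kernel::Avx512;
  if (s.cpuid_avx2 && s.os_avx) return Kernel::Avx2;
  return Kernel::Generic;
}
```

With this shape, a system that reports AVX-512 in CPUID but does not enable its state lands on the AVX2 tier instead of crashing, which is the behavior described for both macOS and OpenBSD in this thread.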

lemire commented 1 year ago

@easyaspi314 Right. So I have brought back your code:

https://github.com/simdutf/simdutf/pull/243

As I stress in the PR, you get your signature on the commit, for fairness.

lemire commented 1 year ago

This has been released.

@dagorander Your help was appreciated.

lemire commented 1 year ago

@anonrig is pushing an update to Node.js.