Closed dimenus closed 1 year ago
going to put this here for posterity:
STATUS | CL.EXE | VS | CMAKE | INLINE |
---|---|---|---|---|
✓ | 19.21.27702 | 2019 16.0 | Release |
/Ob2 |
❌ | 19.22.27905 | 2019 16.2 | Release |
/Ob2 |
✓ | MinSizeRel |
/Ob1 |
||
✓ | Release |
/Ob1 (hacked) |
||
✓ | 19.23.28105.4 | 2019 16.3.0-pre.4.0 | MinSizeRel |
/Ob1 |
❌ | Release |
/Ob2 |
||
✓ | 2019 16.3.2 | MinSizeRel |
/Ob1 |
|
❌ | Release |
/Ob2 |
||
✓ | 19.24.28117.0 | 2019 16.4.0-pre.1.0 | MinSizeRel |
/Ob1 |
❌ | Release |
/Ob2 |
||
✓ | 19.24.28314.0 | 2019 16.4.2 | MinSizeRel |
/Ob1 |
❌ | Release |
/Ob2 |
||
✓ | 2019 16.5.0-pre.1.0 | MinSizeRel |
/Ob1 |
|
❌ | Release |
/Ob2 |
||
✓ | 19.24.28316.0 | 2019 16.4.4 | Release |
/Ob2 |
✓ | 19.25.28508.3 | 2019 16.5.0-pre.2.0 | Release |
/Ob2 |
Can it be related to the optimizations? In radare2 we met with memmove optimization bug: https://developercommunity.visualstudio.com/content/problem/583227/vs-2019-cl-1921277022-memmove-instrinsic-optimizat.html
Based on @XVilka's link, that could very well be our issue. Microsoft has acknowledged the bug and so eventually Azure will be updated with a patched MSVC. Until then I am testing the MinSizeRel workaround.
The CI is using MinSizeRel to work around this issue. Thanks @mikdusan.
I'm not actually able to reproduce it locally. Let's keep this issue open until the fix is widely available.
I investigated this with
$ cl.exe
Microsoft (R) C/C++ Optimizing Compiler Version 19.23.28105.4 for x64
at 17f2af10b56dd508ebd4a22834fd594cb5a49864
.
The root cause is that building the global builtin function table corrupts memory while adding entries. They will then later not be available when trying to access them.
The bug only occurs with CMake's Release
target, so in order to get debug symbols I had to hack a bit using https://gitlab.kitware.com/cmake/community/wikis/FAQ#make-override-files. In my case I used a RelWithDebInfo
and overriding flags to be sure to have /O2 /Ob2
.
set(CMAKE_USER_MAKE_RULES_OVERRIDE_CXX
${CMAKE_CURRENT_SOURCE_DIR}/cxx_flag_overrides.cmake)
if(MSVC)
set(CMAKE_CXX_FLAGS_RELWITHDEBINFO_INIT "/MD /Zi /O2 /Ob2 /D NDEBUG")
endif()
Building target libuserland
(with the command zig0.exe build --override-lib-dir <path>/<to>/zig/lib libuserland install -Doutput-dir=<path>/<to>/zig/build -Drelease=true -Dlib-files-only --prefix <path>/<to>/Install/zig
) will then fail.
So I launched it via the debugger. Now, while running define_builtin_fns
from codegen.cpp
, we can observe g->builtin_fn_table._entries
being corrupted (it's a sparse array in memory) when executing create_builtin_fn(..., "errorReturnTrace");
.
The disasm is
create_builtin_fn(g, BuiltinFnIdErrorReturnTrace, "errorReturnTrace", 0);
00007FF625EE5219 lea edx,[r14+28h]
00007FF625EE521D lea ecx,[rdx-27h]
00007FF625EE5220 call qword ptr [__imp_calloc (07FF62C995030h)]
00007FF625EE5226 mov rsi,rax
00007FF625EE5229 test rax,rax
00007FF625EE522C je define_builtin_fns+2999h (07FF625EE5B59h)
00007FF625EE5232 lea rbx,[rax+8]
00007FF625EE5236 mov qword ptr [rbp+28h],rax
00007FF625EE523A mov rcx,rbx
00007FF625EE523D lea r8d,[r14+10h]
00007FF625EE5241 lea rdx,[string "errorReturnTrace" (07FF62A907A28h)]
00007FF625EE5248 call buf_init_from_mem (07FF625EDCE10h)
00007FF625EE524D lea r8,[rbp+28h]
00007FF625EE5251 mov qword ptr [rbp+30h],rbx
00007FF625EE5255 lea rdx,[rbp+30h]
00007FF625EE5259 mov rcx,rdi
00007FF625EE525C mov dword ptr [rax],65h
00007FF625EE5262 mov qword ptr [rax+20h],r14
00007FF625EE5266 call HashMap<Buf * __ptr64,BuiltinFnEntry * __ptr64,&buf_hash,&buf_eql_buf>::put (07FF625DA17F2h)
This is basically create_builtin_fn
inlined, see we're creating the builtin entry, its name and inserting it into the hashmap.
Near the end, at mov dword ptr [rax],65h
the memory pointed to by builtin_fn->name.list.items
(rax
) is stomped and so the key for the hashmap is garbage.
If you look at the asm generated for the previous builtins, the part after the call to buf_init_from_mem
is nearly identical, and then changes slightly starting from "errorReturnTrace"
.
I'm pretty sure this is a codegen bug, but maybe not the one linked above. An easy workaround is to surround one of these two functions with #pragma optimize("", off/on)
and the bug will disappear. I guess splitting defined_builtin_fns
in two might also help. Anyways we should probably file a bug report to Microsoft.
Tested this again with
set(CMAKE_CXX_FLAGS_RELEASE_INIT
"/MT /O2 /Ob2 /D NDEBUG")
and
set(CMAKE_CXX_FLAGS_RELEASE_INIT
"/MT /O2 /Ob3 /D NDEBUG")
(Ob3 is a new flag in VS 2019) on compilers 19.24.28314
and 19.24.28316
without walking into the issue anymore. Maybe we can close this (I'll make a PR to reset the optimization flags to default) ?
@Sahnvour,
compilers
19.24.28314
and19.24.28316
Something odd here. At my last update I tested 19.24.28314
and the issue remained. This was building against the same LLVM binary bundle used by CI.
I speculated that maybe this issue can be resolved by rebuilding LLVM with new compiler. Unfortunately I ran into difficulties rebuilding LLVM on my system and left that unresolved.
Are you using your own LLVM dependency?
update: 19.24.28316.0
and 19.25.28508.3
work for me with Release
and /Ob2
so assuming CI is running minimal version we should be fine going back to /Ob2
.
This issue no longer plagues us since we use the wasm kernel for building from source rather than C++ code.
Related to #2740
With the upgrade from 19.21 (VS 2017) to 19.22, Microsoft seems to have broken something related to our FNV hash implementation. Building libuserland leads to this error:
invalid builtin function: 'hasDecl'
The workaround is to use VS 2017 or build tools older than 19.22.