open-watcom / open-watcom-v2

Open Watcom V2.0 - Source code repository, Wiki, Latest Binary build, Archived builds including all installers for download.

bwpp coredumps when bootstrapped with clang-18 #1256

Closed doehrm closed 7 months ago

doehrm commented 7 months ago

When bootstrapping with clang-18 bwpp coredumps:

$> clang --version
clang version 18.1.1 (https://github.com/llvm/llvm-project.git dba2a75e9c7ef81fe84774ba5eee5e67e01d801a)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /sources/own/llvm/llvm-bin/bin

$> . ../setvars_clang.sh
Open Watcom build environment (CLANG version=18)

$> CC=clang CXX=clang++ ./build.sh -v
[...]

**** BUILD rule
+++<cdsay "/sources/watcom/open-watcom-v2/bld/cpplib">+++
============= 16:12:48 /sources/watcom/open-watcom-v2/bld/cpplib ==============
+++<pmake -d build          -h>+++
== 16:12:48 /sources/watcom/open-watcom-v2/bld/cpplib/complex/generic.086/mc ==
bwpp -zq -D_BLDVER=1300 -D_CYEAR=2024 -DNDEBUG -D_ENABLE_AUTODEPEND  -D__OBSCURE_STREAM_INTERNALS  ../../cpp/abs.cpp -fo=.obj -w8-wce=P579-j-we-zl-x-xx-zam-wpx -xr -s -oax -mc-zu -zv-0-fpc -bt=generic -I"../../../../../bld/lib_misc/h" -I"../../h" -I"../../../../../bld/plusplus/h" -I"../../../../../bld/cpplib/runtime/h" -I"../../../../../bld/hdr/dos/h" -I"../../../../../bld/watcom/h" -I"../../../../../bld/comp_cfg/h" -nm=abs -fhwe
bwpp -zq -D_BLDVER=1300 -D_CYEAR=2024 -DNDEBUG -D_ENABLE_AUTODEPEND  -D__OBSCURE_STREAM_INTERNALS  ../../cpp/acos.cpp -fo=.obj -w8-wce=P579-j-we-zl-x-xx-zam-wpx -xr -s -oax -mc-zu -zv-0-fpc -bt=generic -I"../../../../../bld/lib_misc/h" -I"../../h" -I"../../../../../bld/plusplus/h" -I"../../../../../bld/cpplib/runtime/h" -I"../../../../../bld/hdr/dos/h" -I"../../../../../bld/watcom/h" -I"../../../../../bld/comp_cfg/h" -nm=acos -fhwe
Segmentation fault
Error(E42): Last command making (acos.obj) returned a bad status
Error(E02): Make execution terminated
non-zero return: 512
Build failed
+ RC=1
+ cd /sources/watcom/open-watcom-v2
+ exit 1

With gcc-13.2.1 this problem does not occur. It's not a showstopper for me, just wanted to report it since clang is an official compiler.

jmalak commented 7 months ago

I tried limiting the C standard to C99. Please try building OW with the latest source tree.

doehrm commented 7 months ago

I tried that, did a git clean -dfx and started again, it fails at the same spot:

$> git pull -v
POST git-upload-pack (155 bytes)
From https://github.com/open-watcom/open-watcom-v2
 = [up to date]            master
+++<cdsay "/sources/watcom/open-watcom-v2/bld/cpplib">+++
============= 22:00:04 /sources/watcom/open-watcom-v2/bld/cpplib ==============
+++<pmake -d build          -h>+++
== 22:00:04 /sources/watcom/open-watcom-v2/bld/cpplib/complex/generic.086/mc ==
bwpp -zq -D_BLDVER=1300 -D_CYEAR=2024 -DNDEBUG -D_ENABLE_AUTODEPEND  -D__OBSCURE_STREAM_INTERNALS  ../../cpp/abs.cpp -fo=.obj -w8-wce=P579-j-we-zl-x-xx-zam-wpx -xr -s -oax -mc-zu -zv-0-fpc -bt=generic -I"../../../../../bld/lib_misc/h" -I"../../h" -I"../../../../../bld/plusplus/h" -I"../../../../../bld/cpplib/runtime/h" -I"../../../../../bld/hdr/dos/h" -I"../../../../../bld/watcom/h" -I"../../../../../bld/comp_cfg/h" -nm=abs -fhwe
bwpp -zq -D_BLDVER=1300 -D_CYEAR=2024 -DNDEBUG -D_ENABLE_AUTODEPEND  -D__OBSCURE_STREAM_INTERNALS  ../../cpp/acos.cpp -fo=.obj -w8-wce=P579-j-we-zl-x-xx-zam-wpx -xr -s -oax -mc-zu -zv-0-fpc -bt=generic -I"../../../../../bld/lib_misc/h" -I"../../h" -I"../../../../../bld/plusplus/h" -I"../../../../../bld/cpplib/runtime/h" -I"../../../../../bld/hdr/dos/h" -I"../../../../../bld/watcom/h" -I"../../../../../bld/comp_cfg/h" -nm=acos -fhwe
Segmentation fault
Error(E42): Last command making (acos.obj) returned a bad status
Error(E02): Make execution terminated
non-zero return: 512
Build failed
+ RC=1
+ cd /sources/watcom/open-watcom-v2
+ exit 1

Is there anything else I can do to help debugging this?

jmalak commented 7 months ago

The problem is that the compiler built with clang-18 fails on some code. Finding it needs a debug build of the compiler. Could you please go to the directory /sources/watcom/open-watcom-v2/bld/cpplib/complex/generic.086/mc and run the command bwpp -zq -D_BLDVER=1300 -D_CYEAR=2024 -DNDEBUG -D_ENABLE_AUTODEPEND -D__OBSCURE_STREAM_INTERNALS ../../cpp/acos.cpp -fo=.obj -w8-wce=P579-j-we-zl-x-xx-zam-wpx -xr -s -oax -mc-zu -zv-0-fpc -bt=generic -I"../../../../../bld/lib_misc/h" -I"../../h" -I"../../../../../bld/plusplus/h" -I"../../../../../bld/cpplib/runtime/h" -I"../../../../../bld/hdr/dos/h" -I"../../../../../bld/watcom/h" -I"../../../../../bld/comp_cfg/h" -nm=acos -fhwe under a debugger, to find the place in the OW source code where the compiler fails? The compiler should have at least minimal debug info (line numbers) so that we can find the critical code.

doehrm commented 7 months ago
gdb --args bwpp "-zq -D_BLDVER=1300 -D_CYEAR=2024 -DNDEBUG -D_ENABLE_AUTODEPEND -D__OBSCURE_STREAM_INTERNALS ../../cpp/acos.cpp -fo=.obj -w8-wce=P579-j-we-zl-x-xx-zam-wpx -xr -s -oax -mc-zu -zv-0-fpc -bt=generic -I"../../../../../bld/lib_misc/h" -I"../../h" -I"../../../../../bld/plusplus/h" -I"../../../../../bld/cpplib/runtime/h" -I"../../../../../bld/hdr/dos/h" -I"../../../../../bld/watcom/h" -I"../../../../../bld/comp_cfg/h" -nm=acos -fhwe"
[...]

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from bwpp...
(No debugging symbols found in bwpp)

(gdb) r
Starting program: /sources/watcom/open-watcom-v2/build/binbuild/bwpp -zq\ -D_BLDVER=1300\ -D_CYEAR=2024\ -DNDEBUG\ -D_ENABLE_AUTODEPEND\ -D__OBSCURE_STREAM_INTERNALS\ ../../cpp/acos.cpp\ -fo=.obj\ -w8-wce=P579-j-we-zl-x-xx-zam-wpx\ -xr\ -s\ -oax\ -mc-zu\ -zv-0-fpc\ -bt=generic\ -I../../../../../bld/lib_misc/h\ -I../../h\ -I../../../../../bld/plusplus/h\ -I../../../../../bld/cpplib/runtime/h\ -I../../../../../bld/hdr/dos/h\ -I../../../../../bld/watcom/h\ -I../../../../../bld/comp_cfg/h\ -nm=acos\ -fhwe

Program received signal SIGSEGV, Segmentation fault.
0x000055555557cb99 in scanFunctionBody ()
(gdb) bt
#0  0x000055555557cb99 in scanFunctionBody ()
#1  0x000055555557c5d6 in MarkFuncsToGen ()
#2  0x0000555555581853 in CgBackEnd ()
#3  0x00005555555cfdc4 in doCCompile ()
#4  0x00005555555cf818 in front_end ()
#5  0x00005555555cf604 in compilePrimaryCmd ()
#6  0x00005555555cf556 in WppCompile ()
#7  0x00005555555c21b7 in IDERunYourSelfArgv ()
#8  0x0000555555618882 in IdeDrvExecDLLArgv ()
#9  0x0000555555618b41 in main ()

Is there an OW* flag that can be set to do a debug build?

jmalak commented 7 months ago

Thank you for checking. It looks like a problem in the C++ compiler front-end.

You can create a bootstrap compiler with full debug info manually. Note that there are two projects to rebuild: the code generator (bld/cg) and the C++ front-end (bld/plusplus). First, clean the existing bootstrap build for these projects:

cd bld/cg
builder bootclean
cd ../plusplus
builder bootclean

Then rebuild a debug version:

cd bld/cg
builder boot OWDEBUGBUILD=1
cd ../plusplus
builder boot OWDEBUGBUILD=1
doehrm commented 7 months ago

Using the debug build there is no coredump:

gdb --args bwpp "-zq -D_BLDVER=1300 -D_CYEAR=2024 -DNDEBUG -D_ENABLE_AUTODEPEND -D__OBSCURE_STREAM_INTERNALS ../../cpp/acos.cpp -fo=.obj -w8-wce=P579-j-we-zl-x-xx-zam-wpx -xr -s -oax -mc-zu -zv-0-fpc -bt=generic -I"../../../../../bld/lib_misc/h" -I"../../h" -I"../../../../../bld/plusplus/h" -I"../../../../../bld/cpplib/runtime/h" -I"../../../../../bld/hdr/dos/h" -I"../../../../../bld/watcom/h" -I"../../../../../bld/comp_cfg/h" -nm=acos -fhwe"
[...]

(gdb) r
Starting program: /sources/watcom/open-watcom-v2/build/binbuild/bwpp -zq\ -D_BLDVER=1300\ -D_CYEAR=2024\ -DNDEBUG\ -D_ENABLE_AUTODEPEND\ -D__OBSCURE_STREAM_INTERNALS\ ../../cpp/acos.cpp\ -fo=.obj\ -w8-wce=P579-j-we-zl-x-xx-zam-wpx\ -xr\ -s\ -oax\ -mc-zu\ -zv-0-fpc\ -bt=generic\ -I../../../../../bld/lib_misc/h\ -I../../h\ -I../../../../../bld/plusplus/h\ -I../../../../../bld/cpplib/runtime/h\ -I../../../../../bld/hdr/dos/h\ -I../../../../../bld/watcom/h\ -I../../../../../bld/comp_cfg/h\ -nm=acos\ -fhw

[Inferior 1 (process 2521) exited normally]
(gdb)

To verify this, I built the whole thing using OWDEBUGBUILD=1. It doesn't fail at that spot but at a different one:

bwrc -q -D_BLDVER=1300 -D_CYEAR=2024  -D__DOS__ -D_M_I86 -xb -r -x -bt=windows -zku0 ../h/wasm.rc -fo=wresui.res -I. -I"../h"  -I"../../../bld/w16api/wini86/h" -I"../../../bld/watcom/h" -D_STANDALONE_
Free: NULL pointer
%write wasm.lnk debug dwarf all op nored op symfile op map sys dos   option stack=6k lib ../../../bld/wres/dosi86/ml/wres.lib
%append wasm.lnk library ../../../bld/wres/dosi86/ml/wres.lib
bwlink op q name wasm.exe @wasm.lnk
Error! E2021: size of segment _TEXT exceeds 64k by 6932 bytes
Error! E2020: size of group AUTO exceeds 64k by 6932 bytes
Current usage: 0000000000000190 bytes; Peak usage: 000000000026192c bytes; Allocations: 00002031
  Who              Addr             Size     Call     Contents
================ ================ ======== ======== ===========================
0000000000000000 000055ddcff30ae0 00000001 000000dd 00                           .
0000000000000000 000055ddcff30910 0000018f 000000dc 2f736f7572636573 2f776174636f/sources/watco
00000002 chunks (0000000000000190 bytes) unfreed
Error(E42): Last command making (wasm.exe) returned a bad status
Error(E02): Make execution terminated
non-zero return: 512
Build failed
Current usage: 0000000000007088 bytes; Peak usage: 0000000000026ceb bytes; Allocations: 00002075
  Who              Addr             Size     Call     Contents
================ ================ ======== ======== ===========================
0000000000000000 0000556f1f58ff60 00002028 00002047 3035591f6f550000 20d3571f6f5505Y.oU.. .W.oU
0000000000000000 0000556f1f593530 00002028 00002044 9088571f6f550000 00dd571f6f55..W.oU....W.oU
0000000000000000 0000556f1f578890 00002028 00000002 0000000000000000 6076571f6f55........`vW.oU
0000000000000000 0000556f1f577840 00001010 00000001 0000000000000000 2f736f757263......../sourc
00000004 chunks (0000000000007088 bytes) unfreed
+ RC=1
+ cd /sources/watcom/open-watcom-v2
+ exit 1

I guess clang-18 (18.1.1) still has some issues... clang-17 works fine.

jmalak commented 7 months ago

I think it is a memory issue where some part of the compiler's memory is incorrectly overwritten. The memory layout is different in the debug version, so it can behave differently. I will try to install clang-18 on my test box and reproduce it. As soon as I can reproduce the issue, I will be able to debug the clang-18-built C++ compiler to identify the problem. We have a similar problem with OW compiled by Visual Studio 2022: it fails, but all previous VS versions are OK.

jmalak commented 7 months ago

Which host OS version are you using, please?

doehrm commented 7 months ago
$ neofetch
           .;ldkO0000Okdl;.                                                                                                                                     d
       .;d00xl:^''''''^:ok00d;.          -------------------
     .d00l'                'o00d.        OS: SUSE Linux 15 SP5 x86_64
   .d0Kd'  Okxol:;,.          :O0d.      Host: VMware7,1 None
  .OKKKK0kOKKKKKKKKKKOxo:,      lKO.     Kernel: 5.14.21-150500.55.49-default
 ,0KKKKKKKKKKKKKKKK0P^,,,^dx:    ;00,    Uptime: 16 days, 8 hours, 9 mins
.OKKKKKKKKKKKKKKKKk'.oOPPb.'0k.   cKO.   Packages: 4862 (rpm), 5 (snap)
:KKKKKKKKKKKKKKKKK: kKx..dd lKd   'OK:   Shell: bash 4.4.23
dKKKKKKKKKKKOx0KKKd ^0KKKO' kKKc   dKd   Theme: Adwaita [GTK2/3]
dKKKKKKKKKKKK;.;oOKx,..^..;kKKK0.  dKd   Icons: Adwaita [GTK2/3]
:KKKKKKKKKKKK0o;...^cdxxOK0O/^^'  .0K:   Terminal: /dev/pts/2
 kKKKKKKKKKKKKKKK0x;,,......,;od  lKk    CPU: Intel Xeon Gold 6134 (20) @ 3.192GHz
 '0KKKKKKKKKKKKKKKKKKKKK00KKOo^  c00'    GPU: 00:0f.0 VMware SVGA II Adapter
  'kKKKOxddxkOO00000Okxoc;''   .dKk'     Memory: 3952MiB / 96644MiB
    l0Ko.                    .c00l'
     'l0Kk:.              .;xK0l'
        'lkK0xl:;,,,,;:ldO0kl'
            '^:ldxkkkkxdl:^'
doehrm commented 7 months ago

I built clang-18 myself from GitHub:

#!/bin/bash
#
export EXINC=/sources/own/llvm/external/include
export EXLIB=/sources/own/llvm/external/lib
export PATH=/sources/own/llvm/llvm-bin:$PATH
export TEMP=/backup/tmp
export TMP=$TEMP

cd llvm-project
git pull --rebase https://github.com/llvm/llvm-project.git main
mkdir ../_build
cd ../_build
#rm -rf /sources/own/llvm/llvm-bin/*

CC=clang \
CXX=clang++ \
cmake -G Ninja ../llvm-project/llvm \
        -DLLVM_ENABLE_PROJECTS='clang;clang-tools-extra;flang;compiler-rt;libc;lld;libclc;lldb;polly;openmp' \
        -DLLVM_USE_LINKER=gold \
        -DLLVM_ENABLE_RUNTIMES=all \
        -DLLVM_STATIC_LINK_CXX_STDLIB=true \
        -DCMAKE_INSTALL_PREFIX=/sources/own/llvm/llvm-bin \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_INSTALL_UTILS=true \
        -DLLDB_ENABLE_LIBEDIT=true \
        -DLLDB_ENABLE_LZMA=true  &&

#-DLLVM_USE_LINKER=ld \

cmake --build . --parallel 18  && cmake --install .
jmalak commented 7 months ago

I am sorry, but I am not able to reproduce your problem. I installed Debian 12 with clang-18 and clang-19 from pre-built Debian packages, and both build OW without any problem.

jmalak commented 7 months ago

I found a mistake in how I switched between clang-18 and clang-19. Now I can reproduce your problem with clang-18; clang-19 is OK. I debugged it and the problem is with the compiler.

doehrm commented 7 months ago

You mean the issue is with clang? Thank you for the time devoted to the analysis.

jmalak commented 7 months ago

It is not clear yet, because the problem is indirect. It crashes on accessing a NULL pointer, but this is not a static problem; it happens in a dynamically allocated memory block. The pointer is NULL only under some very specific conditions that are unknown for now. It will require a lot of debugging and deep analysis to determine the true source of this problem. It may be time to use a memory analysis tool on the compiler at run time.

LowLevelMahn commented 7 months ago

ever tried ASAN? it's the gold standard in memory problem detection (no false positives by design, part of the gcc/clang compilers, much faster and more exact in detection compared to valgrind)

https://clang.llvm.org/docs/AddressSanitizer.html

compile-flags: -g -fno-omit-frame-pointer -fsanitize=address
link-flags: -fno-omit-frame-pointer -fsanitize=address
jmalak commented 7 months ago

Thanks. I will try these gcc/clang sanitizers first. But it is not clear now if it is some hidden bug in compiler logic or if it is real memory issue.

LowLevelMahn commented 7 months ago

But it is not clear now if it is some hidden bug in compiler logic or if it is real memory issue.

ASAN will tell :) - it's really a very good tool - far superior to everything else available - it's a game changer

always try to use the latest compiler version you can get - sanitizers are evolving

ideally it should be part of your CI build-test runs

jmalak commented 7 months ago

Sanitizers work best because the compiler generates the run-time checks directly into the generated code, based on the types and lengths defined in the source code, with minimal overhead. Other tools need to analyze the code or add extra code, which can affect run-time behaviour so much that the problem changes or disappears under these tools.

LowLevelMahn commented 7 months ago

...and ASAN tends to slow things down by a factor of 2-3 at most; valgrind or Intel Inspector easily reach a factor of 100-200

doehrm commented 7 months ago

I did some mif-editing to enable ASAN:

diff --git a/build/mif/local.mif b/build/mif/local.mif
index 12ebf6433d..324da773b0 100644
--- a/build/mif/local.mif
+++ b/build/mif/local.mif
@@ -433,10 +433,10 @@ cl_extra_libs = $(cl_extra_libs_gen)
 # bld settings
 ############################
 ! ifdef __CLANG_TOOLS__
-bld_cc  = clang -pipe -c
-bld_cxx = clang++ -pipe -c
-bld_ccl = $(noecho)clang -pipe
-bld_cl  = $(noecho)clang -pipe
+bld_cc  = clang -g -fno-omit-frame-pointer -fsanitize=address -pipe -c
+bld_cxx = clang++ -g -fno-omit-frame-pointer -fsanitize=address -pipe -c
+bld_ccl = $(noecho)clang -g -fno-omit-frame-pointer -fsanitize=address -pipe
+bld_cl  = $(noecho)clang -g -fno-omit-frame-pointer -fsanitize=address -pipe
 ! else
 bld_cc  = gcc -pipe -c
 bld_cxx = g++ -pipe -c
@@ -462,8 +462,8 @@ bld_cl_extra_libs = $(bld_cl_extra_libs_gen)
 bld_cc_sys = -fno-asm -fno-common -fsigned-char
 bld_cl_sys =

-bld_c_flags   = -g -fno-omit-frame-pointer -fsanitize=address -std=gnu99
-bld_cxx_flags = -g -fno-omit-frame-pointer -fsanitize=address -std=c++98
+bld_c_flags   = -std=gnu99
+bld_cxx_flags = -std=c++98

 bld_incs = $(bld_extra_incs) -I"$(watcom_dir)/h"

@@ -514,9 +514,9 @@ bld_ldflags  = $(bld_cl_extra_libs_gen) $(bld_extra_ldflags) $(bld_ldflags_$(pro
 # standard settings
 ############################
 !  ifdef __CLANG_TOOLS__
-cc   = $(noecho)clang -pipe -c
-cxx  = $(noecho)clang++ -pipe -c
-cl   = $(noecho)clang -pipe
+cc   = $(noecho)clang -g -fno-omit-frame-pointer -fsanitize=address -pipe -c
+cxx  = $(noecho)clang++ -g -fno-omit-frame-pointer -fsanitize=address -pipe -c
+cl   = $(noecho)clang -g -fno-omit-frame-pointer -fsanitize=address -pipe
 !  else
 cc   = $(noecho)gcc -pipe -c
 cxx  = $(noecho)g++ -pipe -c
@@ -538,8 +538,8 @@ cppflags_x64    = $(common_cppflags_x64) $(common_cppflags_$(host_os)_x64)
 cppflags_arm    = $(common_cppflags_arm) $(common_cppflags_$(host_os)_arm)
 cppflags_a64    = $(common_cppflags_a64) $(common_cppflags_$(host_os)_a64)

-c_flags   = -g -fno-omit-frame-pointer -fsanitize=address -std=gnu99
-cxx_flags = -g -fno-omit-frame-pointer -fsanitize=address -std=c++98
+c_flags   = -std=gnu99
+cxx_flags = -std=c++98

 cflags_gen  = $(bld_cc_sys) $(common_cflags) $(target_cflags_$(bld_cpu)) $(common_flags) $(common_cflags_exe) -o $@
 cflags_dll  = $(bld_cc_sys) $(common_cflags) $(target_cflags_$(bld_cpu)) $(common_flags) $(common_cflags_lib_shared) -o $@

And then I recompiled using

CC=clang CXX=clang++ CFLAGS="-g -fno-omit-frame-pointer -fsanitize=address" CXXFLAGS="-g -fno-omit-frame-pointer -fsanitize=address" ./build.sh -v

Unfortunately I don't get as far as this specific error, it stops before:

========= 13:42:52 /sources/watcom/open-watcom-v2/bld/nwlib/binbuild ==========
optencod -q -rc=MSG_USAGE_WLIB_BASE -x=_W -utf8  ../optionsw.gml cmdlprsw.gh cmdlprsw.gc usagew.gh cmdlnprs.gh linux

=================================================================
==3419==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x563ffc0b7e59 in calloc /sources/own/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:75:3
    #1 0x563ffc0f92f2 in newCode /sources/watcom/open-watcom-v2/bld/fe_misc/binbuild/../c/optencod.c:2141:12
    #2 0x563ffc0f92f2 in addCode /sources/watcom/open-watcom-v2/bld/fe_misc/binbuild/../c/optencod.c:2160:16
    #3 0x563ffc0f92f2 in addOptionCodeSeq /sources/watcom/open-watcom-v2/bld/fe_misc/binbuild/../c/optencod.c:2185:16
    #4 0x563ffc0f92f2 in genCode /sources/watcom/open-watcom-v2/bld/fe_misc/binbuild/../c/optencod.c:2276:16
    #5 0x563ffc0f92f2 in outputFN_PROCESS /sources/watcom/open-watcom-v2/bld/fe_misc/binbuild/../c/optencod.c:2522:15
    #6 0x563ffc0f92f2 in main /sources/watcom/open-watcom-v2/bld/fe_misc/binbuild/../c/optencod.c:3430:13
    #7 0x7eff4e7cb24c in __libc_start_main (/lib64/libc.so.6+0x3524c) (BuildId: 4b30629cfedbec041523c3e9abf5e3576277533c)

Indirect leak of 1376 byte(s) in 43 object(s) allocated from:
    #0 0x563ffc0b7e59 in calloc /sources/own/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:75:3
    #1 0x563ffc0f92f2 in newCode /sources/watcom/open-watcom-v2/bld/fe_misc/binbuild/../c/optencod.c:2141:12
    #2 0x563ffc0f92f2 in addCode /sources/watcom/open-watcom-v2/bld/fe_misc/binbuild/../c/optencod.c:2160:16
    #3 0x563ffc0f92f2 in addOptionCodeSeq /sources/watcom/open-watcom-v2/bld/fe_misc/binbuild/../c/optencod.c:2185:16
    #4 0x563ffc0f92f2 in genCode /sources/watcom/open-watcom-v2/bld/fe_misc/binbuild/../c/optencod.c:2276:16
    #5 0x563ffc0f92f2 in outputFN_PROCESS /sources/watcom/open-watcom-v2/bld/fe_misc/binbuild/../c/optencod.c:2522:15
    #6 0x563ffc0f92f2 in main /sources/watcom/open-watcom-v2/bld/fe_misc/binbuild/../c/optencod.c:3430:13
    #7 0x7eff4e7cb24c in __libc_start_main (/lib64/libc.so.6+0x3524c) (BuildId: 4b30629cfedbec041523c3e9abf5e3576277533c)

SUMMARY: AddressSanitizer: 1408 byte(s) leaked in 44 allocation(s).
Error(E42): Last command making (usagew.gh) returned a bad status
Error(E02): Make execution terminated
<wmake -h -f ../binmake bootstrap=1> => non-zero return: 512

According to the website it stops at the first error - does anyone know if there's a way to "make -k" and not stop?

jmalak commented 7 months ago

You can use builder -i; it ignores the return value and continues with the next project.

jmalak commented 7 months ago

Anyway, you can modify the makefile for a single project. For the C++ compiler, the main makefile is bld/plusplus/master.mif. There you can define additional options for 64-bit Linux with extra_c_flags_linux_x64 = ... for the C compiler and extra_cxx_flags_linux_x64 = ... for the C++ compiler.

LowLevelMahn commented 7 months ago

use the environment variable ASAN_OPTIONS=halt_on_error=0 to keep ASAN from stopping - to get the build going, or better use builder -i

ASAN aborts intentionally because leaks or misuse of memory stack up and make it harder to fix errors - that's the reason for the early abort

jmalak commented 7 months ago

You can use builder for a single project with a temporarily changed environment. For example, go to the project subdirectory bld/... and run builder -i ASAN_OPTIONS=halt_on_error=0

doehrm commented 7 months ago

Apologies for all the spam here, still learning and figuring things out.

I set ASAN_OPTIONS=halt_on_error=0 and rebuilt, I could build to this point - where it eventually stops:

= 15:57:25 /sources/watcom/open-watcom-v2/bld/clib/intel/library/linux.386/mf_r =
bwcc386 -zq  -D_BLDVER=1300 -D_CYEAR=2024 -DNDEBUG -D_ENABLE_AUTODEPEND    ../../../c/chipbug.c -I"../../../h" -I"../../../../../../bld/clib/h" -I"../../../../../../bld/hdr/linux/h" -I"../../../../../../bld/watcom/h" -I"../../../../../../bld/lib_misc/h" -I"../../../../../../bld/comp_cfg/h"  -wx-wce=C310-we-j-zastd=c99-zl-x-xx-zam-wpx -s -oaxt -fo=.obj -bm -mf -fpc-5r-zc  -bt=linux  -fpi87
=================================================================
==27097==ERROR: AddressSanitizer: global-buffer-overflow on address 0x562459b6ff28 at pc 0x5624599dcba4 bp 0x7ffc8ab01b00 sp 0x7ffc8ab01af8
READ of size 4 at 0x562459b6ff28 thread T0
    #0 0x5624599dcba3 in HighReg (/sources/watcom/open-watcom-v2/build/binbuild/bwcc386+0x261ba3)
    #1 0x562459aabedc in ScoreCalcList (/sources/watcom/open-watcom-v2/build/binbuild/bwcc386+0x330edc)
    #2 0x562459aa6ba5 in ScoreRoutine (/sources/watcom/open-watcom-v2/build/binbuild/bwcc386+0x32bba5)
    #3 0x562459aae68c in Spawn (/sources/watcom/open-watcom-v2/build/binbuild/bwcc386+0x33368c)
    #4 0x562459aa6a1e in Score (/sources/watcom/open-watcom-v2/build/binbuild/bwcc386+0x32ba1e)
    #5 0x562459a5c988 in PostOptimize (/sources/watcom/open-watcom-v2/build/binbuild/bwcc386+0x2e1988)
    #6 0x562459a5a924 in Generate (/sources/watcom/open-watcom-v2/build/binbuild/bwcc386+0x2df924)
    #7 0x562459a36759 in BGReturn (/sources/watcom/open-watcom-v2/build/binbuild/bwcc386+0x2bb759)
    #8 0x562459973990 in GenOptimizedCode (/sources/watcom/open-watcom-v2/build/binbuild/bwcc386+0x1f8990)
    #9 0x56245996f1b4 in DoCompile (/sources/watcom/open-watcom-v2/build/binbuild/bwcc386+0x1f41b4)
    #10 0x5624598c565b in DoCCompile (/sources/watcom/open-watcom-v2/build/binbuild/bwcc386+0x14a65b)
    #11 0x5624598c4bec in FrontEnd (/sources/watcom/open-watcom-v2/build/binbuild/bwcc386+0x149bec)
    #12 0x562459945da3 in IDERunYourSelfArgv (/sources/watcom/open-watcom-v2/build/binbuild/bwcc386+0x1cada3)
    #13 0x5624599a0f0e in IdeDrvExecDLLArgv (/sources/watcom/open-watcom-v2/build/binbuild/bwcc386+0x225f0e)
    #14 0x5624599a179a in main (/sources/watcom/open-watcom-v2/build/binbuild/bwcc386+0x22679a)
    #15 0x7f767f4e724c in __libc_start_main (/lib64/libc.so.6+0x3524c) (BuildId: 4b30629cfedbec041523c3e9abf5e3576277533c)
    #16 0x5624597e5319 in _start (/sources/watcom/open-watcom-v2/build/binbuild/bwcc386+0x6a319)

0x562459b6ff28 is located 56 bytes before global variable 'IsSets' defined in '../../../../../bld/cg/intel/386/c/386rgtbl.c' (0x562459b6ff60) of size 56
0x562459b6ff28 is located 0 bytes after global variable 'Reg64Order' defined in '../../../../../bld/cg/intel/386/c/386rgtbl.c' (0x562459b6fee0) of size 72
SUMMARY: AddressSanitizer: global-buffer-overflow (/sources/watcom/open-watcom-v2/build/binbuild/bwcc386+0x261ba3) in HighReg
Shadow bytes around the buggy address:
  0x562459b6fc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x562459b6fd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f9
  0x562459b6fd80: f9 f9 f9 f9 07 f9 f9 f9 00 00 00 00 00 00 00 f9
  0x562459b6fe00: f9 f9 f9 f9 00 00 00 00 00 00 00 f9 f9 f9 f9 f9
  0x562459b6fe80: 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 00 00 00 00
=>0x562459b6ff00: 00 00 00 00 00[f9]f9 f9 f9 f9 f9 f9 00 00 00 00
  0x562459b6ff80: 00 00 00 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 f9
  0x562459b70000: f9 f9 f9 f9 00 00 04 f9 f9 f9 f9 f9 00 00 04 f9
  0x562459b70080: f9 f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00
  0x562459b70100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x562459b70180: 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redz

Is that helpful?

LowLevelMahn commented 7 months ago

Apologies for all the spam here, still learning and figuring things out.

it's definitely a finding - ASAN only reports false positives if you found a bug in ASAN itself - which is very very very rare - it can't show false positives by design (unlike most of the other tools)

War story: long-running projects that went without such tools for most of their history tend to be full of small problems like these - overflows, double-frees, stack-use-after-return, etc. ASAN never showed me a false positive in the last ~10 years

jmalak commented 7 months ago

Thanks for this info. It looks strange because this part of memory contains constant data but asan reported it as redzone (somehow overwritten). It is such a crucial part of the compiler that, if this were true, the compiler could not work. It looks like false positive info.

LowLevelMahn commented 7 months ago

It looks like false positive info.

these are super rare - but ok it's a compiler - could be too much magic inside the code

jmalak commented 7 months ago

Anyway, I got an idea to improve the code: the reported part of memory holds the register class definition data, which are constant, so we can mark these data arrays as constant. I will try ASAN myself; there are probably some compiler options to control it. Another possible issue is that the code uses structures with flexible array members that are declared with a single element but can have any length at run time, because they are dynamically allocated. Such constructs confuse analysis tools like Coverity Scan. It is hard to say how ASAN behaves on such constructs. Anyway, what is reported here are simple static arrays with a fixed length, so that should not be the issue.

One important note: the code generator has its own optimized memory allocator that handles all code generator memory allocations on top of malloc. It uses malloc to get a big block of memory and hands out chunks of this block through a separate function. The code generator code does not call malloc directly.

LowLevelMahn commented 7 months ago

It looks strange because this part of memory contains constant data but asan reported it as redzone (somehow overwritten).

red-zones are also for read - not only write

global-buffer-overflow can happen if you, for example, read over size borders of variables/arrays - or read an int from a char variable etc.

double x[5];
int main() { 
    int rc = (int) x[5];  // Boom!
    return rc; 
}

or something like that

https://learn.microsoft.com/en-us/cpp/sanitizers/error-global-buffer-overflow?view=msvc-170

jmalak commented 7 months ago

Yes, the boundary check should work for both kinds of access. I don't believe the compiler could work if such a problem really existed in this part of the code. I will play with ASAN a little, because the results are somehow odd or I interpreted them wrongly.

doehrm commented 7 months ago

There are more of those findings. I uploaded my complete build log; if you search for the string ERROR you will find all occurrences.

Build.log

jmalak commented 7 months ago

OW code is not memory issue free. The main problem is missing memory cleanup when a program exits.

LowLevelMahn commented 7 months ago

OW code is not memory issue free.

it would be better to suppress the leak detection with ASAN_OPTIONS=detect_leaks=0 or combine it with the non-halt option: ASAN_OPTIONS=halt_on_error=0:detect_leaks=0

but it's not usual to use any of these options at all - at least after cleaning up a project - because memory problems tend to stack up

@doehrm you seem to be missing the symbolizer: llvm-symbolizer, which gives you source line numbers instead of addresses

i don't know if your source build of clang is missing the symbolizer - the clang installation from the distros normally contains it

maybe you need to set the env variable ASAN_SYMBOLIZER_PATH - but i never needed to do it

the report should look more like this:

==9901==ERROR: AddressSanitizer: heap-use-after-free on address 0x60700000dfb5 at pc 0x45917b bp 0x7fff4490c700 sp 0x7fff4490c6f8
READ of size 1 at 0x60700000dfb5 thread T0
    #0 0x45917a in main use-after-free.c:5
    #1 0x7fce9f25e76c in __libc_start_main /build/buildd/eglibc-2.15/csu/libc-start.c:226
    #2 0x459074 in _start (a.out+0x459074)
0x60700000dfb5 is located 5 bytes inside of 80-byte region [0x60700000dfb0,0x60700000e000)
freed by thread T0 here:
    #0 0x4441ee in __interceptor_free projects/compiler-rt/lib/asan/asan_malloc_linux.cc:64
    #1 0x45914a in main use-after-free.c:4
    #2 0x7fce9f25e76c in __libc_start_main /build/buildd/eglibc-2.15/csu/libc-start.c:226
previously allocated by thread T0 here:
    #0 0x44436e in __interceptor_malloc projects/compiler-rt/lib/asan/asan_malloc_linux.cc:74
    #1 0x45913f in main use-after-free.c:3
    #2 0x7fce9f25e76c in __libc_start_main /build/buildd/eglibc-2.15/csu/libc-start.c:226
SUMMARY: AddressSanitizer: heap-use-after-free use-after-free.c:5 main

http://www.gerryyang.com/linux%20performance/2021/10/15/address-sanitizer.html

btw: i'm using SUSE Tumbleweed (https://get.opensuse.org/tumbleweed/) as the distro for these types of tests because it's an "always-up-to-date" distro with the latest versions of all components, including gcc/clang and all other libraries - very frequent whole-system updates - it has the latest clang 18.1.1 and gcc 13.2.1 on board

other sanitizers: UBSan (undefined behavior detector - could be a mess with open watcom :) ), TSAN (thread sanitizer: data races, possible-deadlock detection - but i don't think that OW has that many threads running) and MSan (uninitialized memory etc. - but that needs ALL third-party libraries to be built with MSan too)

LowLevelMahn commented 7 months ago

It looks like false positive info.

these are super rare - but ok it's a compiler - could be too much magic inside the code

better said - nearly impossible (unless you've hit an ASAN bug) - it's a huge difference that valgrind "can" give false positives by design and ASAN can't - that means a false positive is an ASAN bug, and most ASAN bug reports are about missed detections - completely different from TSAN, which can give false positives - not preventable by design and in the real world ;)

jmalak commented 7 months ago

I hope it is fixed. The problem was related to migrating the code to 64-bit hosts and to the CGVALUE union initialization. In the 32-bit version of the compiler, every member was initialized to 0 if any member was initialized to 0. But in the 64-bit version the last member was 64 bits wide while the first member was 32 bits wide, so default initialization by { 0 } zeroed only the low 32 bits and the high 32 bits were random. It is fixed now by moving pvalue to be the first member, and to be sure I added explicit initialization of the pvalue member to 0 before setting the ivalue or uvalue members. It was evident C language undefined behaviour, and that is why the behaviour was so strange.

Please recheck it with clang-18 and other version which you have available.

LowLevelMahn commented 7 months ago

I hope it is fixed.

was it an ASAN finding or something you found by yourself - any real false positives from ASAN? (maybe i would try to reduce the scenario and file a bug for my beloved test tool :) )

jmalak commented 7 months ago

The problem with ASAN on the compiler code is that the compiler uses its own memory manager, not the malloc function. Therefore ASAN cannot identify memory corruption inside the compiler's memory blocks, only fatal issues (out-of-bounds access on the big malloc-ed block or damaged pointers). On the other tools' code, which uses malloc, it works perfectly. To find this issue I used classic debugging with breakpoints and watchpoints, plus static code analysis.

winspool commented 7 months ago

The problem was related to migrating the code to 64-bit hosts and to the CGVALUE union initialization. In the 32-bit version of the compiler, every member was initialized to 0 if any member was initialized to 0. But in the 64-bit version the last member was 64 bits wide while the first member was 32 bits wide, so default initialization by { 0 } zeroed only the low 32 bits and the high 32 bits were random.

Nice finding.

It is fixed now by moving pvalue to be the first member, and to be sure I added explicit initialization of the pvalue member to 0 before setting the ivalue or uvalue members.

A simple fix, but how many union initializations with the same issue are still in the code? I wish for zero, but i have no idea.

LowLevelMahn commented 7 months ago

To find this issue I used classic debugging with breakpoints and watchpoints, plus static code analysis.

uninitialized reads can be detected with MSan (MemorySanitizer) - but all third-party libraries also need to be built with MSan (or else many false positives will occur); sometimes building only libc++ with MSan is enough - see the links:

https://github.com/google/sanitizers/wiki/MemorySanitizer https://github.com/google/sanitizers/wiki/MemorySanitizerLibcxxHowTo

or use Valgrind (3.22.0 is current) - even though it is very slow, it is also able to detect uninitialized reads - but it can give false positives

valgrind --tool=memcheck --leak-check=no [program + parameter]

as usual for both tools - the newer - the better

i would not use valgrind for leak detection - ASAN is much better than valgrind (if not blocked by the compiler's own memory management) - but the other valgrind memcheck checks could help to detect other stuff

jmalak commented 7 months ago

it is fixed now