Closed gisburn closed 4 months ago
Yeah, 11 is SIGSEGV, so mamake is somehow crashing. :-(
Unfortunately, for testing Cygwin I only have access to a 32-bit Windows 7 VM with an obsolete version of Cygwin, but it's working fine on that... so I have no real way of tracing the cause here.
A gdb stack trace might help, or perhaps you could do a 'git bisect' between the v1.0.8 and v1.0.9 tags to identify the offending commit?
Ping @JohnoKing – if you have the time, are you able to reproduce this?
I will also note that I've built ksh with AddressSanitizer (ASan) enabled many times (as shipped with Apple clang version 14.0.0 (clang-1400.0.29.202)), which also makes mamake run with it enabled... and it has not found any issues.
I will also note that I've built ksh with AddressSanitizer (ASan) enabled many times (as shipped with Apple clang version 14.0.0 (clang-1400.0.29.202)), which also makes mamake run with it enabled... and it has not found any issues.
ASAN does not find reads-from-uninitalised-memory, which can result in undefined behaviour/random crashes. I filed issue #759 (use valgrind for BUILD (not ksh93)) to finally get this mess hunted down...
BTW: 0xC0000142 is |STATUS_DLL_INIT_FAILED|, so after |fork()| the DLL init failed.
BTW: 0xC0000142 is |STATUS_DLL_INIT_FAILED|, so after |fork()| the DLL init failed.
Seems to be related to the (DLL) memory layout.
Explicitly setting the stack size seems to work around this issue (note the surrounding bash instance is required to avoid another hang when probing cc):
bash -c 'ulimit -s 2048 ; CCFLAGS="-g" SHELL=/bin/bash bash ./bin/package make SHELL=/bin/bash CCFLAGS="-g"' 2>&1 | tee buildlog.log
BTW: 0xC0000142 is |STATUS_DLL_INIT_FAILED|, so after |fork()| the DLL init failed.
Seems to be related to the (DLL) memory layout. Explicitly setting the stack size seems to work around this issue (note the surrounding bash instance is required to avoid another hang when probing cc):
bash -c 'ulimit -s 2048 ; CCFLAGS="-g" SHELL=/bin/bash bash ./bin/package make SHELL=/bin/bash CCFLAGS="-g"' 2>&1 | tee buildlog.log
Update this didn't work, it now fails after finishing libast:
+ cp mamake /home/roland_mainz/work/windows_ksh93/ksh-1.0.9/arch/cygwin.i386/bin/mamake
# ... making src/lib/libsum ...
0 [main] mamake 40713 dofork: child -1 - forked process 1232 died unexpectedly, retry 0, exit code 0xC0000142, errno 11
mamake: fork() failed: Resource temporarily unavailable
mamake: *** exit code 1 making all
[here it HANGS!! until <CTRL-C>]
I tried to vary the stack size up to $ ulimit -s 8192 #
... but not success yet... ;-((
Attached is the log of a build on Debian 11 arm64, after applying the MAMAKE_DEBUG_PREFIX patch from https://github.com/ksh93/ksh/issues/759#issuecomment-2203984213, and building with:
MAMAKE_DEBUG_PREFIX='valgrind -s --tool=memcheck --leak-check=full' CCFLAGS='-O0 -g' bin/package make
As you can see, the build succeeds and valgrind reports zero problems in mamake. The memcheck
tool is supposed to report use of initialised memory, but finds none. Can you think of other valgrind options for deeper testing?
I bypassed the |fork()| error with a debug version of the Cygwin 3.5.3 DLL (for now) and now end-up with this issue:
# src/lib/libast/Mamfile: 4980-4983: make ${INSTALLROOT}/lib/file/magic
+ cp -f /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/misc/magic.tab /cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/lib/file/magic
# src/lib/libast/Mamfile: 4985-5000: make ${INSTALLROOT}/lib/package/gen/.asthdr_tstamp
+ CCt=' '
+ for src in /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/comp/fmtmsg.h /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/comp/libgen.h
++ basename /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/comp/fmtmsg.h .h
+ hdr=fmtmsg
+ grep -q 'define[ ][ ]*_[hl][di][rb]_fmtmsg' ast_lib.h
+ dst=/cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/include/ast/fmtmsg.h
+ cmp -s /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/comp/fmtmsg.h /cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/include/ast/fmtmsg.h
+ cp -f /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/comp/fmtmsg.h /cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/include/ast/fmtmsg.h
+ for src in /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/comp/fmtmsg.h /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/comp/libgen.h
++ basename /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/comp/libgen.h .h
+ hdr=libgen
+ grep -q 'define[ ][ ]*_[hl][di][rb]_libgen' ast_lib.h
+ continue
+ touch /cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/lib/package/gen/.asthdr_tstamp
# src/lib/libast/Mamfile: 5008-5014: make mamake.o
+ compile /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/cmd/INIT/mamake.c -D_PACKAGE_ast
+ s=/cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/cmd/INIT/mamake.c
+ shift
+ cc -D_BLD_ast -Os -g -fno-strict-aliasing -I. -I/cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast -Icomp -I/cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/comp -D_PACKAGE_ast -Iinclude -I/cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/include -Istd -I/cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/std -c /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/cmd/INIT/mamake.c
# src/lib/libast/Mamfile: 5007-5024: make mamake
+ cc -g -fno-strict-aliasing -o mamake mamake.o libast.a -liconv
+ set +f
+ exec rm -rf /cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/bin/mamake /cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/bin/mamake.exe
rm: cannot remove '/cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/bin/mamake': Permission denied
rm: cannot remove '/cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/bin/mamake.exe': Permission denied
+ cp mamake /cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/bin/mamake
cp: cannot create regular file '/cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/bin/mamake.exe': Device or resource busy
mamake [lib/libast]: *** exit code 1 making mamake
mamake: *** exit code 1 making lib/libast
mamake: *** exit code 1 making all
package: make failed at Wed Jul 3 20:35:33 CEST 2024 in /cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386
The problem here is that you cannot delete/move a file of a binary which is being executed.
I bypassed the |fork()| error with a debug version of the Cygwin 3.5.3 DLL (for now) and now end-up with this issue:
# src/lib/libast/Mamfile: 4980-4983: make ${INSTALLROOT}/lib/file/magic + cp -f /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/misc/magic.tab /cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/lib/file/magic # src/lib/libast/Mamfile: 4985-5000: make ${INSTALLROOT}/lib/package/gen/.asthdr_tstamp + CCt=' ' + for src in /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/comp/fmtmsg.h /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/comp/libgen.h ++ basename /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/comp/fmtmsg.h .h + hdr=fmtmsg + grep -q 'define[ ][ ]*_[hl][di][rb]_fmtmsg' ast_lib.h + dst=/cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/include/ast/fmtmsg.h + cmp -s /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/comp/fmtmsg.h /cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/include/ast/fmtmsg.h + cp -f /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/comp/fmtmsg.h /cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/include/ast/fmtmsg.h + for src in /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/comp/fmtmsg.h /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/comp/libgen.h ++ basename /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/comp/libgen.h .h + hdr=libgen + grep -q 'define[ ][ ]*_[hl][di][rb]_libgen' ast_lib.h + continue + touch /cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/lib/package/gen/.asthdr_tstamp # src/lib/libast/Mamfile: 5008-5014: make mamake.o + compile /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/cmd/INIT/mamake.c -D_PACKAGE_ast + s=/cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/cmd/INIT/mamake.c + shift + cc -D_BLD_ast -Os -g -fno-strict-aliasing -I. -I/cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast -Icomp -I/cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/comp -D_PACKAGE_ast -Iinclude -I/cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/include -Istd -I/cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/lib/libast/std -c /cygdrive/l/builds/ksh93_109/ksh-1.0.9/src/cmd/INIT/mamake.c # src/lib/libast/Mamfile: 5007-5024: make mamake + cc -g -fno-strict-aliasing -o mamake mamake.o libast.a -liconv + set +f + exec rm -rf /cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/bin/mamake /cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/bin/mamake.exe rm: cannot remove '/cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/bin/mamake': Permission denied rm: cannot remove '/cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/bin/mamake.exe': Permission denied + cp mamake /cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/bin/mamake cp: cannot create regular file '/cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386/bin/mamake.exe': Device or resource busy mamake [lib/libast]: *** exit code 1 making mamake mamake: *** exit code 1 making lib/libast mamake: *** exit code 1 making all package: make failed at Wed Jul 3 20:35:33 CEST 2024 in /cygdrive/l/builds/ksh93_109/ksh-1.0.9/arch/cygwin.i386
The problem here is that you cannot delete/move a file of a binary which is being executed.
Interestingly I can just re-start $ ./bin/package make #, and the build then resumes until I have a finished build. That's good enough for testing, but not enough for a Cygwin package.
One possible fix might worklike this: First mamake.exe is called mamake_bootstap.exe, and a softlink links mamake.exe to mamake_bootstrap.exe, and the 2nd build of mamake.exe first removes the softlink, and then creates the new mamake.exe.
I bypassed the |fork()| error with a debug version of the Cygwin 3.5.3 DLL (for now)
Very strange. Why would that make mamake work?
By the way, mamake's use of fork(2) is here. Do you see any problems with that code? I rewrote that function to use fork(2) instead of the deprecated system(3) in eeab3e53268a9b233d22c3dee1a2fecea7ca4502, because system(3) was not working properly on Android.
The problem here is that you cannot delete/move a file of a binary which is being executed.
Good grief… This is working fine on Cygwin 3.3.5 on Windows 7. Why would they break this? It's fundamental to Unix that unlinking and deleting are not actually the same thing.
Anyway, for now you can just delete lines 5021-5043 in src/lib/libast/Mamfile, so you can test the rest of the build process. Linking mamake against libast is non-essential, it's just for completeness' sake (and it is a way of failing early if libast somehow breaks). It will work fine without.
One possible fix might worklike this: First mamake.exe is called mamake_bootstap.exe, and a softlink links mamake.exe to mamake_bootstrap.exe, and the 2nd build of mamake.exe first removes the softlink, and then creates the new mamake.exe.
This should do that:
diff --git a/bin/package b/bin/package
index 8d969c8b2..f9ba6ab66 100755
--- a/bin/package
+++ b/bin/package
@@ -2515,7 +2515,8 @@ checkaout() # cmd ...
done
fi
rm -f "$INSTALLROOT/dyn/bin/$i" "$INSTALLROOT/src/lib/libast/$i" &
- $exec $CC -O $CCFLAGS $LDFLAGS -o $INSTALLROOT/bin/$i $INITROOT/$i.c || return
+ $exec $CC -O $CCFLAGS $LDFLAGS -o $INSTALLROOT/bin/${i}_bootstrap $INITROOT/$i.c || return
+ $exec ln -s ${i}_bootstrap $INSTALLROOT/bin/$i
test -f $i.o && $exec rm -f $i.o
hash -r
;;
I bypassed the |fork()| error with a debug version of the Cygwin 3.5.3 DLL (for now)
Very strange. Why would that make mamake work?
I don't know (yet). I didn't had time for an analysis yet, I'm swamped with day-to-day work+ms-nfs41-client.
By the way, mamake's use of fork(2) is here. Do you see any problems with that code? I rewrote that function to use fork(2) instead of the deprecated system(3) in eeab3e5, because system(3) was not working properly on Android.
The use of |fork()| looks OK, but I would've used |posix_spawn()| (which is no help for Cygwin, as it uses |vfork()| to implement |posix_spawn()|, instead of the Windows |_spawn()| family of functions (see https://learn.microsoft.com/en-us/cpp/c-runtime-library/spawn-wspawn-functions?view=msvc-170).
My suggestion would be to put the |system()|-based codepath back within |#ifdef CYGWIN| for now, and maybe for UWIN later, too.
The problem here is that you cannot delete/move a file of a binary which is being executed.
Good grief… This is working fine on Cygwin 3.3.5 on Windows 7. Why would they break this? It's fundamental to Unix that unlinking and deleting are not actually the same thing.
Oh, it even gets worse, because Win10 supports the POSIX semantics, but only if the filesystem supports it and matching registry entries are set for NTFS, e.g. Google for |FILE_DISPOSITION_FLAG_POSIX_SEMANTICS|.
Anyway, for now you can just delete lines 5021-5043 in src/lib/libast/Mamfile, so you can test the rest of the build process. Linking mamake against libast is non-essential, it's just for completeness' sake (and it is a way of failing early if libast somehow breaks). It will work fine without.
AFAIK Glenn Fowler once said they did that to test libast at build time, so it has a useful purpose.
I bypassed the |fork()| error with a debug version of the Cygwin 3.5.3 DLL (for now)
Very strange. Why would that make mamake work?
I don't know (yet). I didn't had time for an analysis yet, I'm swamped with day-to-day work+ms-nfs41-client.
No worries, we all understand that life and $DAYJOB happen.
I'm glad to have someone from the old AST crowd take an interest in this revival project, and I do hope you stick around, as and when you have the time.
By the way, mamake's use of fork(2) is here. Do you see any problems with that code? I rewrote that function to use fork(2) instead of the deprecated system(3) in eeab3e5, because system(3) was not working properly on Android.
The use of |fork()| looks OK, but I would've used |posix_spawn()| (which is no help for Cygwin, as it uses |vfork()| to implement |posix_spawn()|, instead of the Windows |_spawn()| family of functions (see https://learn.microsoft.com/en-us/cpp/c-runtime-library/spawn-wspawn-functions?view=msvc-170).
posix_spawn(2) is too new for us to rely on. I'm testing ksh on multiple older systems that don't have it.
ksh does use posix_spawn where available, based on a feature test.
My suggestion would be to put the |system()|-based codepath back within |#ifdef CYGWIN| for now, and maybe for UWIN later, too.
I'd prefer to have it figured out why fork(2) isn't working and get the cause fixed. If fork is broken, then ksh is going to have problems too. Even if it can use posix_spawn to invoke programs, it must still fork for things like background jobs or ulimit
in a subshell.
As for UWIN, since we could not find any way to build it, or even to get a compiler running on the one findable binary distribution, we have never been able to test it. So I've removed the code to support it to avoid further bit rot. See d9a85caf2248a2e3221a60cc91ad03bf567b5f68. It's a shame it's dead, as it seemed a great deal nicer than cygwin.
Anyway, for now you can just delete lines 5021-5043 in src/lib/libast/Mamfile, so you can test the rest of the build process. Linking mamake against libast is non-essential, it's just for completeness' sake (and it is a way of failing early if libast somehow breaks). It will work fine without.
AFAIK Glenn Fowler once said they did that to test libast at build time, so it has a useful purpose.
Well, the libast code (for --man, etc.) was there in mamake, but it was never built, so I added the code to build it myself.
By the way, I've reworked mamake quite a bit (instead of restoring nmake, which proved unworkable). See README-mamake.md for info on the features added, such as automatic variables. I think you'll find the Mamfiles quite maintainable now.
+ $exec $CC -O $CCFLAGS $LDFLAGS -o $INSTALLROOT/bin/${i}_bootstrap $INITROOT/$i.c || return + $exec ln -s ${i}_bootstrap $INSTALLROOT/bin/$i
The patch fixes the problem. Please commit it, and close this bug as "FIXED".
Wait, it fixes the fork problem too?
Wait, it fixes the fork problem too?
Well, the DLL is no longer ETXTBUSY, so yes.
But that makes no sense: in your original report, the build was failing right at the start, it didn't even manage to make src/cmd/INIT. So replacing mamake cannot have been the cause of the failure, as mamake isn't getting replaced until after building libast. Something else must have been going wrong.
But that makes no sense: in your original report, the build was failing right at the start, it didn't even manage to make src/cmd/INIT. So replacing mamake cannot have been the cause of the failure, as mamake isn't getting replaced until after building libast. Something else must have been going wrong.
I don't know what went wrong, best guess by the Cygwin people is that this kind of error happens if you write into a DLL or EXE which is being used.
In any case, the patch fixed it.
I am now able to reproduce the build failure on Cygwin at the point where it relinks mamake against libast after building libast. While your original report (which has the failure at a much earlier stage) still doesn't make sense to me, the patch does fix the issue I am able to reproduce, so that will have to do. Thanks for the report and the testing!
ksh93 1.0.9 build crashes on Cygwin 3.5.3 like this: ---- snip ----
---- snip ----
This was working with ksh93 1.0.8... no clue yet what causes mamake/mamprove to fail like that