trungnt2910 closed this 1 year ago.
How long are the tests on macOS supposed to take? I'm running the same command for macOS as for Linux and Cygwin and it seems to be hanging.
The macOS runners on GitHub Actions are currently x86_64, with no Apple Silicon yet.
I just tested on XNU x86 and it looks like the issue was an infinite tail call loop in pthread_jit_write_protect_np_workaround. I'm pushing a fix presently.
error:test/libc/calls/ftruncate_test.c:59: ftruncate_doesntHaveWritePermission_einval() on fv-az365-755 pid 5304 tid 5304
ASSERT_EQ(EINVAL, EBADF/9/Bad file number)
need 22 (or 0x16 or '▬') =
got 9 (or 0x9 or '○')
EBADF/9/Bad file number
third_party/cosmo/3/ftruncate_test.com @ fv-az365-755
1 / 12 tests failed
Another error pops up on Cygwin. Reading the source code of both Cosmopolitan and Cygwin, I don't see how EBADF could be possible, so could something be going on with Blink?
POSIX.1-2017 says that ftruncate() may raise EBADF or EINVAL if the file descriptor wasn't opened for writing. Cygwin chose EBADF but Linux chose EINVAL. I've just submitted f3bb0faf3857363badb39870fe6e9abd6918cd02, which ensures the Linux behavior always happens. Please merge and give it another try.
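A minimal sketch of the idea, assuming the normalization happens in a libc-level wrapper. The name linux_ftruncate is hypothetical and this is not the code from f3bb0faf3857363badb39870fe6e9abd6918cd02. It keeps EBADF for descriptors that are genuinely invalid and reports EINVAL (the Linux choice) when the descriptor is open but read-only.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: behaves like ftruncate(), but when the host
   reports EBADF for a descriptor that is open yet not writable (as
   Cygwin does), it reports EINVAL instead, matching Linux. POSIX.1-2017
   permits either errno for this case. */
static int linux_ftruncate(int fd, off_t length) {
  int rc = ftruncate(fd, length);
  if (rc == -1 && errno == EBADF) {
    int flags = fcntl(fd, F_GETFL);
    if (flags != -1 && (flags & O_ACCMODE) == O_RDONLY) {
      errno = EINVAL; /* fd is valid, it just wasn't opened for writing */
    } else {
      errno = EBADF;  /* fd really is invalid: keep the original error */
    }
  }
  return rc;
}
```

On a descriptor opened with O_RDONLY this should fail with EINVAL whether the host itself reports EINVAL or EBADF, while a closed descriptor still yields EBADF.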
I force-pushed and re-ran; things still seem to be hanging on macOS.
- We've now thoroughly fixed https://github.com/jart/blink/issues/26
[test] o/third_party/qemu/qemu-x86_64 -cpu core2duo o//test/asm/add.elf
malloc(): corrupted top size
I2023-01-17T07:04:16.297786:blink/blink.c:67:4566 terminating due to signal SIGABRT
o//test/asm/add.com: line 7: 4566 Aborted (core dumped) o//blink/blink o/third_party/qemu/qemu-x86_64 -cpu core2duo o//test/asm/add.elf
make: *** [test/asm/asm.mk:20: o//test/asm/add.com.ok] Error 134
GitHub Actions says otherwise.
I was referring to make check being made to thoroughly work on Cygwin. I can see you're running make o//test/asm o//test/func too. Those tests are part of make check2 because they run Linux GCC and Linux Qemu binaries. I'm honestly kind of shocked that emulating GCC with Blink on Cygwin is already working. However, Blink emulating Qemu on non-Linux right now is a bridge too far.
Please update this change to only run make check on non-Linux, until Qemu on non-Linux can be thoroughly debugged.
As for MacOS, from the looks of it, I suspect the issue here is that GitHub isn't provisioning a VM. I seem to recall the last time I looked into this that free MacOS VMs are basically unobtainable. Would it be possible to add FreeBSD and OpenBSD to the matrix? They're in the support vector and they're very similar to XNU.
I seem to recall the last time I looked into this that free MacOS VMs are basically unobtainable.
It is provisioning a VM; it just runs forever without exiting.
Please update this change to only run make check on non-Linux, until Qemu on non-Linux can be thoroughly debugged.
I've removed the o//test/asm tests on non-Linux.
Would it be possible to add FreeBSD and OpenBSD to the matrix?
Yes it is, using a few third-party actions: https://github.com/vmactions
FreeBSD and OpenBSD are not supported by GitHub Actions; however, it seems to be possible to run a virtual machine on the MacOS host and then run FreeBSD code there. Would you like me to add this?
Great! The macOS build is now successful!
There seems to be a chance for socket_test.elf to hang forever. I can confirm that on my local Cygwin instance, but I don't know whether the same thing kept the previous macOS build from succeeding.
FreeBSD and OpenBSD are not supported by GitHub Actions; however, it seems to be possible to run a virtual machine on the MacOS host and then run FreeBSD code there. Would you like me to add this?
Sure. Sounds great.
There seems to be a chance for socket_test.elf to hang forever
I haven't encountered that one yet. If you want, you can edit third_party/cosmo/cosmo.mk and put it next to DARWIN_PROBLEMATIC_TESTS until I have a chance to take a closer look.
It seems like all Cygwin runs and some macOS runs are hanging on socket_test.elf.
You might want to terminate these machines. They don't cost anything for open-source projects like this, but terminating them avoids wasting free community resources.
I've added a time limit for Cygwin and macOS that prevents the make check step from running for more than 10 minutes.
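The thread doesn't show how the 10-minute limit is enforced; a step-level timeout-minutes setting in the GitHub Actions workflow is one common option. The C sketch below shows the same watchdog idea at the process level, so that a hung test such as socket_test.elf gets killed instead of blocking the job forever. run_with_deadline is a hypothetical helper, and returning 124 on timeout borrows the convention of GNU coreutils' timeout(1).

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] (searched in PATH) with arguments argv
   and kills it with SIGKILL if it is still running after limit_seconds.
   Returns the child's exit status, 128 + signal number if a signal killed
   it, 124 on timeout (as GNU timeout(1) does), or -1 on error. */
static int run_with_deadline(char *const argv[], unsigned limit_seconds) {
  const long tick_ms = 50;
  struct timespec tick = {0, tick_ms * 1000000L};
  unsigned long waited_ms = 0;
  int ws;
  pid_t pid = fork();
  if (pid == -1) return -1;
  if (pid == 0) {
    execvp(argv[0], argv);
    _exit(127); /* exec failed; same code a shell would use */
  }
  for (;;) {
    pid_t r = waitpid(pid, &ws, WNOHANG);
    if (r == -1) return -1;
    if (r == pid) return WIFEXITED(ws) ? WEXITSTATUS(ws) : 128 + WTERMSIG(ws);
    if (waited_ms >= limit_seconds * 1000UL) {
      kill(pid, SIGKILL);
      waitpid(pid, &ws, 0); /* reap the child so it doesn't linger as a zombie */
      return 124;
    }
    nanosleep(&tick, NULL); /* elapsed time is approximate (sleep granularity) */
    waited_ms += (unsigned long)tick_ms;
  }
}
```

For example, run_with_deadline((char *[]){"make", "check", NULL}, 600) would give make check a 10-minute budget. Polling with WNOHANG keeps the sketch free of signal handlers, at the cost of up to one tick of extra latency.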
I don't understand why o//test/func (especially lock_test and socket_test) hangs on Cygwin in GitHub Actions.
BSD VMs are currently also hanging (I guess for the same reason), so I'm also removing this set from the automated tests.
These tests are also known to hang on macOS, so I'm adding a measure to limit the test time to 10 minutes.
Also, note that enabling o//test/func on the BSDs requires GNU tar. If you want to re-enable it in a future commit, please try to edit the tar usage in the Makefile to be compatible with BSD tar, or replace it with gtar on BSD systems.
Good news! First tick mark for OpenBSD!
Now to the bad news:
gcc is included for o//test/func. This is problematic: for example, detecting regressions related to #31 requires a real ELF binary.
o//blink/blink third_party/cosmo/2/cosh_test.com
o//blink/blink third_party/cosmo/2/complex_test.com
o//blink/blink third_party/cosmo/2/countbits_test.com
o//blink/blink third_party/cosmo/2/cv_wait_example_test.com
cv_wait_example_test.com example_cv_wait passed 3.07s
o//blink/blink third_party/cosmo/2/daemon_test.com
o//blink/blink third_party/cosmo/2/dtoa_test.com
o//blink/blink third_party/cosmo/2/expm1_test.com
o//blink/blink third_party/cosmo/2/fgetln_test.com
o//blink/blink third_party/cosmo/2/getcontext_test.com
o//blink/blink third_party/cosmo/2/getenv_test.com
o//blink/blink third_party/cosmo/3/ftruncate_test.com
o//blink/blink third_party/cosmo/2/socket_test.com
o//blink/blink third_party/cosmo/4/unix_test.com
o//blink/blink third_party/cosmo/2/tmpfile_test.com
o//blink/blink third_party/cosmo/2/select_test.com
passed 60.34s
cv_test.com test_cv_producer_consumer1 passed 8.52s
cv_test.com test_cv_producer_consumer2 passed 951.77ms
cv_test.com test_cv_producer_consumer3 passed 57.75ms
cv_test.com test_cv_producer_consumer4 passed 54.87ms
cv_test.com test_cv_producer_consumer5 passed 141.54ms
cv_test.com test_cv_producer_consumer6 passed 49.60ms
cv_test.com test_cv_deadline passed 5.01s
cv_test.com test_cv_cancel passed 5.02s
cv_test.com test_cv_debug passed 2.08s
cv_test.com test_cv_transfer passed 647.51ms
Also, thanks for your help these last few days, and sorry for all the workflow spam!
The end of the test output is misleading. The error actually happens earlier in the log, with clone_test.com, and then gets drowned out by the output of long-running tests, which completed. I saw that same error happen on Cygwin, and I'm not entirely sure why, but I think I have a fix. I'll push it shortly.
OK, I think 3c4784f might solve this.
Seems like we have some piping issues:
error:test/libc/calls/pipe_test.c:57: pipe_emfile() on fv-az479-748 pid 4331 tid 4331
ASSERT_EQ(-1, pipe(f))
need -1 (or 0xffffffffffffffff) =
got 0 (or 0 or
EUNKNOWN/0/No error information
third_party/cosmo/4/pipe_test.com @ fv-az479-748
1 / 28 tests failed
error:test/libc/calls/pipe_test.c:62: pipe_emfile() on fv-az479-748 pid 4330 tid 4330
EXPECT_EQ(0, WEXITSTATUS(ws))
need 0 (or 0 or =
got 1 (or 0x1 or '☺')
EUNKNOWN/0/No error information
third_party/cosmo/4/pipe_test.com @ fv-az479-748
On which platform?
Cygwin.
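For reference, a minimal sketch of the behavior pipe_emfile appears to check, based only on the assertions in the log above (a forked child expecting pipe() to return -1, and the parent expecting exit status 0). This is not the Cosmopolitan test itself; check_pipe_emfile is a hypothetical name, and lowering the soft limit to zero is an assumption chosen so that no descriptor can be allocated at all.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that lowers its RLIMIT_NOFILE soft
   limit to 0 and then calls pipe(). Returns the child's exit status:
   0 if pipe() failed with EMFILE, 1 if pipe() unexpectedly succeeded
   (the failure seen on Cygwin above), 2 for any other outcome, or -1 if
   fork() or waitpid() failed. */
static int check_pipe_emfile(void) {
  int ws;
  pid_t pid = fork();
  if (pid == -1) return -1;
  if (pid == 0) {
    struct rlimit rl;
    int fds[2];
    if (getrlimit(RLIMIT_NOFILE, &rl) == -1) _exit(2);
    rl.rlim_cur = 0; /* soft limit only; the hard limit is left alone */
    if (setrlimit(RLIMIT_NOFILE, &rl) == -1) _exit(2);
    if (pipe(fds) == 0) _exit(1); /* no descriptor should be allocatable */
    _exit(errno == EMFILE ? EXIT_SUCCESS : 2);
  }
  if (waitpid(pid, &ws, 0) == -1) return -1;
  return WIFEXITED(ws) ? WEXITSTATUS(ws) : -1;
}
```

Doing the setrlimit() in a child keeps the parent's own descriptor limit untouched, which mirrors the fork-then-check-WEXITSTATUS shape visible in the log.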
As of 1278e30, should I enable o//test/func again?
Sure, give it another try if you like.
More errors on Cygwin:
error:test/libc/sock/unix_test.c:140: unix_serverGoesDown_deletedSockFile() on fv-az448-648 pid 1363 tid 1363
ASSERT_EQ(-1, write(4, "hello", 5))
need -1 (or 0xffffffffffffffff) =
got 5 (or 0x5 or '♣')
EUNKNOWN/0/No error information
third_party/cosmo/4/unix_test.com @ fv-az448-648
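A minimal sketch of the scenario, assuming unix_serverGoesDown_deletedSockFile uses connected SOCK_DGRAM sockets, as its name and the write(4, "hello", 5) assertion suggest. write_after_server_goes_down is a hypothetical helper and not the Cosmopolitan test. On Linux, writing to a connected datagram socket whose peer has been closed should fail (with ECONNREFUSED, as far as I can tell), which is the -1 the test expects; the log shows Cygwin accepting the write instead.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: binds a datagram "server" at path, connects a
   client to it, closes the server and deletes its socket file, then
   writes again from the client. Returns the result of that second
   write() and stores errno in *err, or returns -2 if setup failed. */
static ssize_t write_after_server_goes_down(const char *path, int *err) {
  struct sockaddr_un addr;
  ssize_t rc = -2;
  int server = -1, client = -1;

  memset(&addr, 0, sizeof(addr));
  addr.sun_family = AF_UNIX;
  strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
  unlink(path); /* remove any stale socket file; ENOENT is fine */

  server = socket(AF_UNIX, SOCK_DGRAM, 0);
  client = socket(AF_UNIX, SOCK_DGRAM, 0);
  if (server == -1 || client == -1) goto out;
  if (bind(server, (struct sockaddr *)&addr, sizeof(addr)) == -1) goto out;
  if (connect(client, (struct sockaddr *)&addr, sizeof(addr)) == -1) goto out;
  if (write(client, "hello", 5) != 5) goto out; /* server alive: must succeed */

  close(server); /* the server goes down... */
  server = -1;
  unlink(path); /* ...and its socket file is deleted */

  rc = write(client, "hello", 5); /* expected: -1 on Linux */
  *err = errno;

out:
  if (server != -1) close(server);
  if (client != -1) close(client);
  unlink(path);
  return rc;
}
```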
Also, for make check, is there any way to replace tar with gtar? GNU tar seems to be required:
tar -C o/third_party/gcc/x86_64 -xJf third_party/gcc/x86_64-linux-musl__x86_64-linux-musl__g++-7.2.0.tar.xz
tar: unknown option -- J
usage: tar {crtux}[014578befHhjLmNOoPpqsvwXZz]
[blocking-factor | archive | replstr] [-C directory] [-I file]
[file ...]
tar {-crtux} [-014578eHhjLmNOoPpqvwXZz] [-b blocking-factor]
[-C directory] [-f archive] [-I file] [-s replstr] [file ...]
gmake: *** [third_party/gcc/gcc.mk:103: o/third_party/gcc/x86_64/bin/x86_64-linux-musl-gcc] Error 1
I ran the full Cygwin test suite locally and can confirm that my recent change fixes things.
I've been unhappy with xz for a while, and I intend to rebuild those toolchains as .tar.gz at some point. Can this problem be solved by simply disabling o//test/func on OpenBSD? If so, please do that. The main thing that needs to be tested on OpenBSD is that the JIT works properly in a W^X environment, and that's covered by make check.
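To illustrate what "the JIT works properly in a W^X environment" means, here is a minimal sketch of the write-then-execute pattern that a W^X policy such as OpenBSD's allows: a page is never writable and executable at the same time. It is not Blink's JIT code; wx_jit_return42 is a hypothetical function, and the machine code bytes (mov eax, 42; ret on x86-64, mov w0, #42; ret on AArch64) are only a stand-in for generated code. On Apple Silicon macOS, JIT memory is typically managed with MAP_JIT and pthread_jit_write_protect_np instead of this mprotect() flip.

```c
#define _DEFAULT_SOURCE /* glibc: expose MAP_ANONYMOUS */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#if defined(__x86_64__)
/* mov eax, 42 ; ret */
static const unsigned char kCode[] = {0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3};
#define HAVE_DEMO_CODE 1
#elif defined(__aarch64__)
/* mov w0, #42 ; ret */
static const uint32_t kCode[] = {0x52800540, 0xd65f03c0};
#define HAVE_DEMO_CODE 1
#endif

/* Hypothetical example: "JIT-compiles" a function returning 42 without
   ever mapping a page writable and executable at the same time, then
   calls it. Returns 42 on success, or -1 if the architecture is not
   covered here or a mapping step fails. */
static int wx_jit_return42(void) {
#ifndef HAVE_DEMO_CODE
  return -1;
#else
  size_t size = (size_t)sysconf(_SC_PAGESIZE);
  int (*fn)(void);
  int result;
  /* Step 1: map the page writable but not executable, and emit code. */
  void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) return -1;
  memcpy(mem, kCode, sizeof(kCode));
#if defined(__aarch64__)
  /* AArch64 needs an explicit instruction-cache flush after writing code. */
  __builtin___clear_cache((char *)mem, (char *)mem + sizeof(kCode));
#endif
  /* Step 2: flip the page to executable, dropping write permission. */
  if (mprotect(mem, size, PROT_READ | PROT_EXEC) == -1) {
    munmap(mem, size);
    return -1;
  }
  fn = (int (*)(void))mem;
  result = fn();
  munmap(mem, size);
  return result;
#endif
}
```

A JIT that instead asked for PROT_READ | PROT_WRITE | PROT_EXEC in one mapping would be the kind of code a W^X environment rejects, which is why exercising the JIT there is a useful check.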
26 minutes for Cygwin... looks like a hang to me.
Do you need me to restart the job?
Yes, I do, or at least cancel it so that the logs can become visible.
Now, with the logs visible, I can see the same socket_test failure:
[test] o//blink/blink o//test/func/socket_test.elf
[test] o//blink/blink o//test/func/socket_test.elf
[test] o//blink/blink -jm o//test/func/socket_test.elf
[test] o//blink/blink -m o//test/func/socket_test.elf
[test] o//blink/blink -j o//test/func/socket_test.elf
Error: The operation was canceled.
Should I just disable o//test/func altogether or wait for a fix?
Looks like socket_test again. I've never seen this flake in any of my testing environments locally. I recommend just turning off o//test/func on Cygwin for now so we can make progress. Emulating GCC without JIT on an underpowered VM is probably overkill right now.
It'd also be nice if you added a status badge for the workflows to the README, so that people can actually see that "We regularly test that Blink is able run x86-64-linux binaries 🚀".
Good suggestion. Done.
These are the two other environments supported by GitHub Actions.
Cygwin tests are currently failing. This should be a Blink problem rather than a problem with the Actions setup.