jart / blink

tiniest x86-64-linux emulator
ISC License

Add GitHub Actions workflows for Cygwin (Windows) and macOS #34

Closed trungnt2910 closed 1 year ago

trungnt2910 commented 1 year ago

These are the two other environments supported by GitHub Actions.

Cygwin tests are currently failing. This should be blink's problem and not the Action's problem.

trungnt2910 commented 1 year ago

How long are the tests on macOS supposed to take? I'm running the same command for macOS as for Linux and Cygwin, and it seems to be hanging. The macOS runners on GitHub Actions are currently x86_64; there is no Apple Silicon yet.

jart commented 1 year ago

I just tested on XNU x86 and it looks like the issue was an infinite tail call loop in pthread_jit_write_protect_np_workaround. I'm pushing a fix presently.

trungnt2910 commented 1 year ago

error:test/libc/calls/ftruncate_test.c:59: ftruncate_doesntHaveWritePermission_einval() on fv-az365-755 pid 5304 tid 5304
    ASSERT_EQ(EINVAL, EBADF/9/Bad file number)
        need 22 (or 0x16 or '▬') =
         got 9  (or 0x9 or '○')
    EBADF/9/Bad file number
    third_party/cosmo/3/ftruncate_test.com @ fv-az365-755
1 / 12 tests failed

Another error pops up on Cygwin. And reading the source code for both Cosmopolitan and Cygwin, I don't see how EBADF could be possible, so it must be something going on with blink?

jart commented 1 year ago

POSIX.1-2017 says that ftruncate() may raise EBADF or EINVAL if the file descriptor wasn't opened for writing. Cygwin chose EBADF but Linux chose EINVAL. I've just submitted f3bb0faf3857363badb39870fe6e9abd6918cd02 which ensures the Linux behavior always happens. Please merge and give it another try.
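
The errno difference described above can be illustrated with a small wrapper. This is only a sketch under stated assumptions, not the code from the commit mentioned above: `ftruncate_like_linux` is a hypothetical name, and the approach (asking `fcntl(F_GETFL)` whether the descriptor is valid but read-only) is one possible way to tell the two EBADF cases apart.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not Blink's actual code): report EINVAL, as Linux
   does, when the host reports EBADF for a descriptor that is valid but
   was not opened for writing. */
int ftruncate_like_linux(int fd, off_t length) {
  if (ftruncate(fd, length) == 0) return 0;
  if (errno == EBADF) {
    /* EBADF is ambiguous here: the descriptor may be invalid, or it may
       be valid but read-only. F_GETFL only succeeds in the second case. */
    int flags = fcntl(fd, F_GETFL);
    if (flags != -1 && (flags & O_ACCMODE) == O_RDONLY) {
      errno = EINVAL; /* valid descriptor without write access */
    } else {
      errno = EBADF;  /* genuinely bad descriptor; restore the errno */
    }
  }
  return -1;
}
```

On a host that already reports EINVAL the wrapper changes nothing, so the same call site behaves identically on Linux and on hosts that chose EBADF.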

trungnt2910 commented 1 year ago

I force-pushed and re-ran; things still seem to be hanging on macOS.

trungnt2910 commented 1 year ago

[test] o/third_party/qemu/qemu-x86_64 -cpu core2duo o//test/asm/add.elf
malloc(): corrupted top size
I2023-01-17T07:04:16.297786:blink/blink.c:67:4566 terminating due to signal SIGABRT
o//test/asm/add.com: line 7:  4566 Aborted                 (core dumped) o//blink/blink o/third_party/qemu/qemu-x86_64 -cpu core2duo o//test/asm/add.elf
make: *** [test/asm/asm.mk:20: o//test/asm/add.com.ok] Error 134

GitHub Actions says otherwise.

jart commented 1 year ago

I was referring to make check being made to thoroughly work on Cygwin. I can see you're running make o//test/asm o//test/func too. Those tests are part of make check2 because they run Linux GCC and Linux Qemu binaries. I'm honestly kind of shocked that emulating GCC with Blink on Cygwin is already working. However Blink emulating Qemu on non-Linux right now is a bridge too far.

Please update this change to only run make check on non-Linux, until Qemu on non-Linux can be thoroughly debugged.

jart commented 1 year ago

As for MacOS, from the looks of it, I suspect the issue here is that GitHub isn't provisioning a VM. I seem to recall the last time I looked into this that free MacOS VMs are basically unobtainable. Would it be possible to add FreeBSD and OpenBSD to the matrix? Since they're in the support vector and they're very similar to XNU.

trungnt2910 commented 1 year ago

> I seem to recall the last time I looked into this that free MacOS VMs are basically unobtainable.

It is provisioning a VM, it just runs forever without quitting.

> Please update this change to only run make check on non-Linux, until Qemu on non-Linux can be thoroughly debugged.

I've removed the o//test/asm on non-Linux.

trungnt2910 commented 1 year ago

> Would it be possible to add FreeBSD and OpenBSD to the matrix?

Yes it is, using a few third-party actions: https://github.com/vmactions

FreeBSD and OpenBSD are not supported by GitHub Actions, however it seems to be possible to run a virtual machine on the MacOS host, and then run FreeBSD code there. Would you like me to add this?

trungnt2910 commented 1 year ago

Great! The macOS build is now successful!

trungnt2910 commented 1 year ago

There seems to be a chance for socket_test.elf to hang forever. I can reproduce that on my local Cygwin instance, but I don't know whether the same thing kept the previous macOS build from succeeding.

jart commented 1 year ago

> FreeBSD and OpenBSD are not supported by GitHub Actions, however it seems to be possible to run a virtual machine on the MacOS host, and then run FreeBSD code there. Would you like me to add this?

Sure. Sounds great.

> There seems to be a chance for socket_test.elf to hang forever

Haven't encountered that one yet. If you want, you can edit third_party/cosmo/cosmo.mk and put it next to DARWIN_PROBLEMATIC_TESTS until I have a chance to take a closer look.

trungnt2910 commented 1 year ago

Seems like all Cygwin and some macOS runs are hanging on the socket_test.elf.

You might want to terminate these machines (they don't cost anything for open-source projects like this, but it should be done to prevent wasting free community resources...).

I've added a time limit for Cygwin and macOS preventing the make check step from running for more than 10 minutes.

trungnt2910 commented 1 year ago

I don't understand why o//test/func (especially lock_test and socket_test) just hangs on Cygwin on GitHub Actions. The BSD VMs are currently also hanging (I guess for the same reason), so I'm also removing this set from the automated tests.

These tests are also known to hang on macOS, so I'm adding a measure to limit the test time to 10 minutes.

trungnt2910 commented 1 year ago

Also, note that enabling o//test/func on the BSDs requires GNU tar. If you want to re-enable it in a future commit, please edit the tar usage in the Makefile to be compatible with BSD tar, or replace it with gtar on BSD systems.

trungnt2910 commented 1 year ago

Good news! First tick mark for OpenBSD!

Now to the bad news:

Also, thanks for your help these last few days, and sorry for all the workflow spam!

jart commented 1 year ago

The end of the test output is misleading. The error actually happens earlier in the log, with clone_test.com, and then gets drowned out by the output of long running tests, which completed. I saw that same error happen on Cygwin and I'm not entirely sure why, but I think I have a fix. I'll push it shortly.

jart commented 1 year ago

OK I think 3c4784f might solve this.

trungnt2910 commented 1 year ago

Seems like we have some piping issues:

error:test/libc/calls/pipe_test.c:57: pipe_emfile() on fv-az479-748 pid 4331 tid 4331
    ASSERT_EQ(-1, pipe(f))
        need -1 (or 0xffffffffffffffff) =
         got 0  (or 0 or 
    EUNKNOWN/0/No error information
    third_party/cosmo/4/pipe_test.com @ fv-az479-748
1 / 28 tests failed
error:test/libc/calls/pipe_test.c:62: pipe_emfile() on fv-az479-748 pid 4330 tid 4330
    EXPECT_EQ(0, WEXITSTATUS(ws))
        need 0  (or 0 or  =
         got 1  (or 0x1 or '☺')
    EUNKNOWN/0/No error information
    third_party/cosmo/4/pipe_test.com @ fv-az479-748
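
For context, judging only by the test name and the `ASSERT_EQ(-1, pipe(f))` line in the log above, the test appears to expect `pipe()` to fail once the process has no descriptors left. The sketch below is an assumption about that condition, not the Cosmopolitan test itself; `pipe_errno_at_fd_limit` is a hypothetical name. It lowers the soft `RLIMIT_NOFILE` to zero, calls `pipe()`, and reports the resulting errno.

```c
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno produced by pipe() when no new
   descriptors may be allocated, 0 if pipe() unexpectedly succeeded, or
   -1 if the limit could not be read or changed. */
int pipe_errno_at_fd_limit(void) {
  struct rlimit saved, tight;
  int fds[2];
  int result;
  if (getrlimit(RLIMIT_NOFILE, &saved) != 0) return -1;
  tight = saved;
  tight.rlim_cur = 0; /* soft limit of zero: every new descriptor is refused */
  if (setrlimit(RLIMIT_NOFILE, &tight) != 0) return -1;
  if (pipe(fds) == -1) {
    result = errno; /* EMFILE is the expected outcome */
  } else {
    result = 0; /* this is the behavior the log above reports */
    close(fds[0]);
    close(fds[1]);
  }
  setrlimit(RLIMIT_NOFILE, &saved); /* restore the original soft limit */
  return result;
}
```

A return value of 0 from this helper would correspond to the "got 0" line in the failure above.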

jart commented 1 year ago

On which platform?

trungnt2910 commented 1 year ago

Cygwin.

trungnt2910 commented 1 year ago

As of 1278e30 should I enable o//test/func again?

jart commented 1 year ago

Sure give it another try if you like.

trungnt2910 commented 1 year ago

More errors on Cygwin:

error:test/libc/sock/unix_test.c:140: unix_serverGoesDown_deletedSockFile() on fv-az448-648 pid 1363 tid 1363
    ASSERT_EQ(-1, write(4, "hello", 5))
        need -1 (or 0xffffffffffffffff) =
         got 5  (or 0x5 or '♣')
    EUNKNOWN/0/No error information
    third_party/cosmo/4/unix_test.com @ fv-az448-648

Also, for make check, is there any way to replace the tar with gtar, as GNU tar seems to be required:

  tar -C o/third_party/gcc/x86_64 -xJf third_party/gcc/x86_64-linux-musl__x86_64-linux-musl__g++-7.2.0.tar.xz
  tar: unknown option -- J
  usage: tar {crtux}[014578befHhjLmNOoPpqsvwXZz]
             [blocking-factor | archive | replstr] [-C directory] [-I file]
             [file ...]
         tar {-crtux} [-014578eHhjLmNOoPpqvwXZz] [-b blocking-factor]
             [-C directory] [-f archive] [-I file] [-s replstr] [file ...]
  gmake: *** [third_party/gcc/gcc.mk:103: o/third_party/gcc/x86_64/bin/x86_64-linux-musl-gcc] Error 1

jart commented 1 year ago

I ran the full Cygwin test suite locally and can confirm that my recent change fixes things.

I've been unhappy with xz for a while and I intend to rebuild those toolchains as .tar.gz at some point. Can this problem be solved by simply disabling o//test/func on OpenBSD? If so, please do that. The main thing that needs to be tested about OpenBSD is that the JIT works properly in a W^X environment, and that's covered by make check.
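
To make the W^X constraint concrete: under W^X a page may be writable or executable, but not both at once, so a JIT has to write its code first and flip the protection afterwards. The following is a generic sketch of that pattern, not Blink's JIT. `wx_flip_demo` is a hypothetical name, the machine code bytes are x86-64 only, and the 4096-byte mapping length is an arbitrary choice.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict glibc modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical demo: writes a tiny function into a read/write mapping,
   then switches the mapping to read/execute before calling it. The
   mapping is never writable and executable at the same time.
   Returns 0 on success and -1 on failure. */
int wx_flip_demo(int *out) {
  /* x86-64 machine code for: mov $42, %eax; ret */
  static const unsigned char code[] = {0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3};
  size_t size = 4096;
  void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return -1;
  memcpy(p, code, sizeof(code)); /* write phase: page is RW, not X */
  if (mprotect(p, size, PROT_READ | PROT_EXEC) != 0) { /* flip to RX */
    munmap(p, size);
    return -1;
  }
#if defined(__x86_64__)
  *out = ((int (*)(void))p)(); /* execute phase: page is RX, not W */
#else
  (void)out; /* the bytes above are x86-64 only, so skip execution */
#endif
  munmap(p, size);
  return 0;
}
```

On Apple Silicon the equivalent flow uses `MAP_JIT` mappings together with `pthread_jit_write_protect_np`, which is the function the earlier macOS fix in this thread refers to.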

trungnt2910 commented 1 year ago

26 minutes for Cygwin... looks like a hang to me.

jart commented 1 year ago

Do you need me to restart the job?

trungnt2910 commented 1 year ago

Yes, I do, or at least cancel it so that the logs can become visible.

trungnt2910 commented 1 year ago

Now with the logs visible, I can see the same socket_test failure:

[test] o//blink/blink o//test/func/socket_test.elf
[test] o//blink/blink o//test/func/socket_test.elf
[test] o//blink/blink -jm o//test/func/socket_test.elf
[test] o//blink/blink -m o//test/func/socket_test.elf
[test] o//blink/blink -j o//test/func/socket_test.elf
Error: The operation was canceled.

Should I just disable o//test/func altogether or wait for a fix?

jart commented 1 year ago

Looks like socket_test again. I've never seen this flake in any of my testing environments locally. I recommend just turning off o//test/func for now on Cygwin so we can make progress. Emulating GCC without JIT on an underpowered VM is probably overkill right now.

trungnt2910 commented 1 year ago

It'd also be nice if you added a status badge for the workflows to the README, so that people can actually see that "We regularly test that Blink is able run x86-64-linux binaries" 🚀

jart commented 1 year ago

Good suggestion. Done.