hexops / mach

zig game engine & graphics toolkit
https://machengine.org

build: Linux x86_64 support #4

Closed: slimsag closed this issue 3 years ago

wilsonk commented 3 years ago

So this small diff gets things building on Linux x86_64. Almost all of the tests pass (31/32). The 'setIcon' test doesn't pass, but I think that only happens on machines that use xfce4: the same error happens with other graphics libraries that try to set the window icon, and on a couple of different distros, as long as they use xfce4.

diff --git a/glfw/build.zig b/glfw/build.zig
index a0ca24b..7c495e2 100644
--- a/glfw/build.zig
+++ b/glfw/build.zig
@@ -123,7 +123,15 @@ pub fn link(b: *Builder, step: *std.build.LibExeObjStep, options: Options) void
                 var abs_path = std.fs.path.join(&arena.allocator, &.{ thisDir(), path }) catch unreachable;
                 general_sources.append(abs_path) catch unreachable;
             }
-            lib.addCSourceFiles(general_sources.items, &.{});
+
+            switch (options.linux_window_manager) {
+                .X11 => {
+                    lib.addCSourceFiles(general_sources.items, &.{"-D_GLFW_X11"});
+                },
+                .Wayland => {
+                    lib.addCSourceFiles(general_sources.items, &.{"-D_GLFW_WAYLAND"});
+                },
+            }

             switch (options.linux_window_manager) {
                 .X11 => {
@@ -189,6 +197,15 @@ fn linkGLFW(b: *Builder, step: *std.build.LibExeObjStep, options: Options) void
         else => {
             // Assume Linux-like
             // TODO(slimsag): create sdk-linux
+            switch (options.linux_window_manager) {
+                .X11 => {
+                    step.linkSystemLibrary("X11");
+                },
+                .Wayland => {
+                    step.linkSystemLibrary("wayland-client");
+                },
+            }
+            step.linkSystemLibrary("c");
         },
     }
 }
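For context on the diff: in GLFW 3.3, the `_GLFW_X11` and `_GLFW_WAYLAND` macros select which platform backend gets compiled. The same define has to reach every GLFW C source file, which is why the diff passes it as a flag to `addCSourceFiles` rather than to a single file. Below is a minimal C sketch of that compile-time selection. `backend_name()` and `built_for()` are hypothetical helpers for illustration only, not part of GLFW.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reports which windowing backend this translation
 * unit was compiled for, using the same macro names GLFW 3.3 checks.
 * In this sketch, defining both is rejected and defining neither yields "none". */
static const char *backend_name(void)
{
#if defined(_GLFW_X11) && defined(_GLFW_WAYLAND)
#error "pick exactly one Linux backend"
#elif defined(_GLFW_X11)
    return "X11";
#elif defined(_GLFW_WAYLAND)
    return "Wayland";
#else
    return "none";
#endif
}

/* Hypothetical helper: true if this file was built for the given backend. */
static int built_for(const char *backend)
{
    assert(backend != NULL);
    return strcmp(backend_name(), backend) == 0;
}
```

Compiled with `-D_GLFW_X11`, `built_for("X11")` returns 1. Compiled with no define, the sketch reports "none", whereas GLFW itself refuses to build.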
slimsag commented 3 years ago

Ahh, very interesting! Could you upload the failure you get for the setIcon test? I'll update the error handling there to reflect that it can fail on some platforms.

The circumstances under which GLFW emits errors make me think nobody is actually handling them correctly today :) Most applications just bail out. Surprisingly often, the right thing to do is just log the error and move on.
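The "log the error and move on" approach maps onto GLFW's error callback. The callback type is `void (*)(int error_code, const char *description)`, and `glfwSetErrorCallback` may be called before `glfwInit`. The sketch below is only the callback plus a counter, so it compiles without linking GLFW. `log_glfw_error` and `glfw_error_count` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Number of GLFW errors seen so far, so callers can notice that something
 * failed (e.g. a rejected window icon) without aborting. */
static int glfw_error_count = 0;

/* Same shape as GLFW's GLFWerrorfun. With GLFW linked, register it via
 * glfwSetErrorCallback(log_glfw_error) before calling glfwInit(). */
static void log_glfw_error(int error_code, const char *description)
{
    glfw_error_count++;
    fprintf(stderr, "glfw error 0x%08X: %s\n", (unsigned)error_code,
            description ? description : "(no description)");
}
```

Whether a given failure is safe to ignore still has to be decided per call site. This only keeps the program from treating every GLFW error as fatal.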

wilsonk commented 3 years ago

Here is the test failure:

zig test src/Window.zig --test-filter "setIcon" -I ../glfw/upstream/glfw/include/ -lc -I /usr/include/ ../zig-cache/o/b66e5380b11046d90385b9d498cbc1ff/libengine.a -lX11
Test [1/1] test "setIcon"... Illegal instruction at address 0x28277d
glfw/upstream/glfw/src/x11_window.c:0:0: 0x28277d in _glfwPlatformSetWindowIcon (/home/wilsonk/Downloads/mach/glfw/upstream/glfw/src/x11_window.c)
glfw/upstream/glfw/src/window.c:511:5: 0x27b4d4 in glfwSetWindowIcon (/home/wilsonk/Downloads/mach/glfw/upstream/glfw/src/window.c)
/home/wilsonk/Downloads/mach/glfw/src/Window.zig:321:28: 0x22a32d in test "setIcon" (test)
        c.glfwSetWindowIcon(self.handle, @intCast(c_int, im.len), &tmp[0]);
                           ^
/home/wilsonk/Downloads/zig/build/lib/zig/std/special/test_runner.zig:76:28: 0x22b80c in std.special.main (test)
        } else test_fn.func();
                           ^
/home/wilsonk/Downloads/zig/build/lib/zig/std/start.zig:508:37: 0x257a17 in std.start.callMain (test)
            const result = root.main() catch |err| {
                                    ^
/home/wilsonk/Downloads/zig/build/lib/zig/std/start.zig:450:12: 0x22d777 in std.start.callMainWithArgs (test)
    return @call(.{ .modifier = .always_inline }, callMain, .{});
           ^
/home/wilsonk/Downloads/zig/build/lib/zig/std/start.zig:420:12: 0x22d522 in std.start.main (test)
    return @call(.{ .modifier = .always_inline }, callMainWithArgs, .{ @intCast(usize, c_argc), c_argv, envp });
           ^
???:?:?: 0x7faef2b91b24 in ??? (???)
error: the following test command crashed:
src/zig-cache/o/680e9ab9fe6e98081746beee14e585df/test /home/wilsonk/Downloads/zig/build/bin/zig

If you comment out line Window.zig:643, then the error doesn't happen (again, this may be only on my setup, or on xfce4 setups). It appears to be a segfault because the image passed to _glfwPlatformSetWindowIcon is malformed or something. The failure is in the for loop that copies the image into 'target'...so it seems that some of the bit shifting/twiddling there is the cause.
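One plausible explanation, not confirmed in this thread, for why the crash shows up as an illegal instruction: Zig's Debug builds compile C code with UBSan checks that trap, and a trap surfaces as SIGILL. Suppose that packing loop shifts an `unsigned char` channel directly, as in `pixels[j * 4 + 3] << 24`. The value is first promoted to a signed `int`, so any alpha of 128 or more shifted into bit 31 is undefined behavior. The sketch below does the same RGBA-to-ARGB packing for the `_NET_WM_ICON` layout (width, height, then one ARGB value per pixel, each in an `unsigned long` as Xlib uses for 32-bit properties) entirely in unsigned arithmetic. `pack_net_wm_icon` is a hypothetical name, not GLFW's code.

```c
#include <assert.h>
#include <stddef.h>

/* Packs one RGBA8 image (row-major, 4 bytes per pixel, as in GLFWimage)
 * into the _NET_WM_ICON layout: width, height, then ARGB pixels.
 * `out` must have room for 2 + width * height elements.
 * Returns the number of elements written. */
static size_t pack_net_wm_icon(int width, int height,
                               const unsigned char *pixels,
                               unsigned long *out)
{
    assert(width > 0 && height > 0 && pixels != NULL && out != NULL);
    size_t count = (size_t)width * (size_t)height;
    size_t n = 0;
    out[n++] = (unsigned long)width;
    out[n++] = (unsigned long)height;
    for (size_t j = 0; j < count; j++) {
        /* Widen each channel to unsigned long before shifting, so an alpha
         * of 0x80..0xFF never lands in the sign bit of a promoted int. */
        unsigned long r = pixels[j * 4 + 0];
        unsigned long g = pixels[j * 4 + 1];
        unsigned long b = pixels[j * 4 + 2];
        unsigned long a = pixels[j * 4 + 3];
        out[n++] = (a << 24) | (r << 16) | (g << 8) | b;
    }
    return n;
}
```

A fully opaque pixel (alpha 0xFF) is exactly the case that would trip a signed-shift check, so it is a good first input to try when debugging.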

slimsag commented 3 years ago

Ahh, so it's actually an illegal instruction failure. Sounds like the image we're passing to GLFW is corrupt in some way; will need to debug.

Also, from the Go bindings for GLFW I found there are a handful of libs we may need to link: https://github.com/go-gl/glfw/blob/master/v3.3/glfw/build.go#L29-L39

I think some of these may be outdated, though. Will need to verify each, and maybe find a way to get GitHub Actions testing Wayland.

wilsonk commented 3 years ago

I haven't tried Wayland before, but I can set up a VM on my server and take a quick look. Isaac Freund's River Wayland compositor, written in Zig, may also hold some clues (https://github.com/ifreund/river)...although it is server-side rather than a client.

slimsag commented 3 years ago

Added preliminary Linux support in 7d5cd4bbcb8cca221407606068e6a398f9e11881

Next up is to package all the deps/headers into an SDK we can git clone from the Zig build script so that this works from any target (and so folks don't need to install all the X11 dependencies etc.):

zig build test -Dtarget=x86_64-linux

From a WSL Ubuntu system, it looks like we'll need the following deps:

libx11-dev libxcursor-dev libxrandr-dev libxinerama-dev libxi-dev mesa-common-dev libvulkan-dev

wilsonk commented 3 years ago

Hmm, weird. Running the 35 glfw tests on the M1 takes a couple of seconds, same on Windows, but on my Linux box (with a much faster CPU than the Windows box) it takes 50 seconds!? I don't understand that. It's literally using 1-2% of the twelve cores, and I have a decent graphics card in it (not as good as the M1 or Windows box, but decent). Not sure what is going on.

GLFW v3.3.4 reports the version string "3.3.4 X11 GLX EGL OSMesa", so I think everything is available/working.

I had other versions of glfw installed on Windows/M1, so maybe the Linux box is using a debug version? That is the only thing I can think of. Nope: I installed a standalone version of glfw in /usr/lib just in case...no change (and I understand that it shouldn't be picked up anyway...just spitballing).

I tried rebuilding everything with release-fast, to no avail. I also built the Zig-based ZT game library, which uses glfw/OpenGL, and it runs at thousands of frames per second without vsync (60 Hz with it, of course), so I guess it isn't the OpenGL/glfw stuff. I do not understand what this could be. It's almost like the 'zig test' runner is pausing between tests.

slimsag commented 3 years ago

The tests currently are opening and destroying a window each time, so that's 35 windows created/destroyed. That's ~1.4s per window, which doesn't seem totally outlandish. Still, it is a bit surprising there'd be such a large difference in window opening times on X11 compared to Mac/Windows.

I suspect refactoring the tests to reuse a shared window - or maybe run in parallel - would help a lot.
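The per-window estimate is plain division: 50 s / 35 windows ≈ 1.43 s each. Below is a minimal C sketch of the "reuse a shared window" idea. The first test that asks for a window creates it, and later tests reuse it, so the creation cost is paid once rather than 35 times. `create_window` and `struct fake_window` are hypothetical stand-ins for `glfwCreateWindow`/`GLFWwindow`, so the sketch compiles without GLFW.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a GLFWwindow; real code would hold a GLFWwindow *. */
struct fake_window { int id; };

static int windows_created = 0;

/* Hypothetical stand-in for glfwCreateWindow(): the expensive call to run once. */
static struct fake_window *create_window(void)
{
    struct fake_window *w = malloc(sizeof *w);
    if (w != NULL)
        w->id = ++windows_created;
    return w;
}

/* Lazily created window shared by every test in the process. */
static struct fake_window *shared_window = NULL;

static struct fake_window *get_test_window(void)
{
    if (shared_window == NULL)
        shared_window = create_window();
    return shared_window;
}

/* Run once after all tests (real code: glfwDestroyWindow, then glfwTerminate). */
static void destroy_test_window(void)
{
    free(shared_window);
    shared_window = NULL;
}
```

The trade-off is shared state: each test must restore any window property it changes (size, title, icon) or later tests may see it. Running tests in parallel is also constrained, because GLFW requires window creation to happen on the main thread.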

slimsag commented 3 years ago

I've just created a Linux SDK with all the required X11 libraries, etc. https://github.com/hexops/sdk-linux-x86_64

slimsag commented 3 years ago

@wilsonk could you see if cd glfw && zig build test -Dtarget=x86_64-linux works for you from your Linux machine?

I think this is mostly done, but the CI is failing with:

Test [1/1] test "glfw_basic"... Segmentation fault at address 0x0
???:?:?: 0x7fdbace39f04 in ??? (???)
???:?:?: 0x6fab351492a403ff in ??? (???)
error: the following test command crashed:
/home/runner/work/mach/mach/zig-cache/o/bfb9af76a0c3cf02dbbe10031fdf6f14/test /usr/local/bin/zig
The following command exited with error code 1:

I think this is due to the X11 setup on GitHub Actions, but I'm not 100% sure yet. It'd be great to have confirmation of whether this runs on a Linux box at the moment.

wilsonk commented 3 years ago

@slimsag Sorry, I missed this message. I tried it moments ago and get the same error here (newest version of Zig from minutes ago, Linux kernel 5.10 with an Arch-based distro, in case I didn't mention it before). I am also running an R9 380 card (so the radeon driver), just in case that matters. Here is a backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7ce29e5 in fwrite () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff7ce29e5 in fwrite () from /usr/lib/libc.so.6
#1  0x00007ffff75e9a84 in ?? () from /usr/lib/libX11.so.6
#2  0x00007ffff75e9f9f in _XSend () from /usr/lib/libX11.so.6
#3  0x00007ffff75e0365 in XQueryExtension () from /usr/lib/libX11.so.6
#4  0x00007ffff75d3714 in XInitExtension () from /usr/lib/libX11.so.6
#5  0x00007ffff75a1eef in XextAddDisplay () from /usr/lib/libXext.so.6
#6  0x00007ffff76eb221 in XF86VidModeQueryExtension () from /usr/lib/libXxf86vm.so.1
#7  0x0000000000332761 in initExtensions () at upstream/glfw/src/x11_init.c:584
#8  0x00000000003324ae in _glfwPlatformInit () at upstream/glfw/src/x11_init.c:1076
#9  0x000000000031de3c in glfwInit () at upstream/glfw/src/init.c:232
#10 0x00000000002c4ec4 in init () at main.zig:47
#11 basicTest () at main.zig:125
#12 0x00000000002b8b91 in test "basic" () at main.zig:155
#13 0x00000000002c578b in std.special.main () at /home/wilsonk/Downloads/zig/build/lib/zig/std/special/test_runner.zig:76
#14 0x00000000002f1cdd in std.start.callMain () at /home/wilsonk/Downloads/zig/build/lib/zig/std/start.zig:498
#15 0x00000000002c76f8 in std.start.initEventLoopAndCallMain () at /home/wilsonk/Downloads/zig/build/lib/zig/std/start.zig:450
#16 std.start.callMainWithArgs (argc=2, argv=0x7fffffffdef8, envp=...) at /home/wilsonk/Downloads/zig/build/lib/zig/std/start.zig:405
#17 0x00000000002c74a3 in std.start.main (c_argc=2, c_argv=0x7fffffffdef8, c_envp=0x7fffffffdf10)
    at /home/wilsonk/Downloads/zig/build/lib/zig/std/start.zig:420

And part of an ltrace:

.
.
.
dlsym(0x1005ee0, "XF86VidModeQueryExtension")                         = 0x7f97aa2d0210
dlsym(0x1005ee0, "XF86VidModeGetGammaRamp")                           = 0x7f97aa2d2460
dlsym(0x1005ee0, "XF86VidModeSetGammaRamp")                           = 0x7f97aa2d2350
dlsym(0x1005ee0, "XF86VidModeGetGammaRampSize")                       = 0x7f97aa2d25d0
malloc(32)                                                            = 0xfef360
strlen("XFree86-VidModeExtension")                                    = 24
pthread_mutex_lock(0xfee6f0, 0x7f97aa2d3000, 0x7f97aa2d3000, 0)       = 0
strlen("XFree86-VidModeExtension")                                    = 24
malloc(24)                                                            = 0xfef9a0
--- SIGSEGV (Segmentation fault) ---
memcpy(0x7ffc75e620d0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 152) = 0x7ffc75e620d0
sigaction(SIGSEGV, { 0, <>, 0, 0 }, nil)                              = 0
sigaction(SIGILL, { 0, <>, 0, 0 }, nil)                               = 0
sigaction(SIGBUS, { 0, <>, 0, 0 }, nil)                               = 0
memcpy(0x7ffc75e62100, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0x7ffc75e62100
write(2, "Segmentation fault at address 0x"..., 32Segmentation fault at address 0x)                   = 32
memcpy(0x7ffc75e620d8, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0x7ffc75e620d8
memcpy(0x7ffc75e62040, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0x7ffc75e62040
memcpy(0x7ffc75e61fb0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0x7ffc75e61fb0
memcpy(0x7ffc75e61f20, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0x7ffc75e61f20
memset(0x7ffc75e61e47, '\252', 65)                                    = 0x7ffc75e61e47
write(2, "0", 10)                                                      = 1
write(2, "\n", 1
)                                                     = 1
memset(0x7ffc75e61e68, '\252', 64)                                    = 0x7ffc75e61e68
getenv("ZIG_DEBUG_COLOR")                                             = nil
memset(0x7ffc75e61e68, '\252', 64)                                    = 0x7ffc75e61e68
2)                                                             = 1
getenv("TERM")                                                        = "xterm-256color"
memset(0x7ffc75e61d10, '\252', 8)                                     = 0x7ffc75e61d10
memset(0x7ffc75e61d18, '\252', 16)                                    = 0x7ffc75e61d18
dl_iterate_phdr(0x2d9ad0, 0x7ffc75e61b00, 0, 0)                       = 32
memset(0x7ffc75e61900, '\252', 8)                                     = 0x7ffc75e61900
mmap64(0, 4096, 3, 34)                                                = 0x7f97aabab000
memset(0x7f97aabab018, '\252', 216)                                   = 0x7f97aabab018
memset(0x7ffc75e5ea78, '\252', 4096)                                  = 0x7ffc75e5ea78
memcpy(0x7ffc75e60b00, "/usr/lib/libc.so.6\0\252\252\252\252\252\252\252\252\252\252\252\252\252"..., 4096) = 0x7ffc75e60b00
memcpy(0x7ffc75e5fb00, "/usr/lib/libc.so.6\0\252\252\252\252\252\252\252\252\252\252\252\252\252"..., 4096) = 0x7ffc75e5fb00
openat64(0xffffff9c, 0x7ffc75e5fb00, 0x80100, 0)                      = 4
memcpy(0x7ffc75e60ab8, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 144) = 0x7ffc75e60ab8
__fxstat64(1, 4, 0x7ffc75e60ab8)                                      = 0
memcpy(0x7ffc75e60d60, "\035\0\0\0\0\0\0\0\262&\0\0\0\0\0\0\001\0\0\0\0\0\0\0\355\201\0\0\0\0\0\0"..., 144) = 0x7ffc75e60d60
memcpy(0x7ffc75e60c70, "\035\0\0\0\0\0\0\0\262&\0\0\0\0\0\0\001\0\0\0\0\0\0\0\355\201\0\0\0\0\0\0"..., 144) = 0x7ffc75e60c70
mmap64(0, 0x20d018, 1, 1)                                             = 0x7f97a9f34000
close(4)                                                              = 0
write(2, "\033[1m", 4)                                                = 4
write(2, "???:?:?", 7???:?:?)                                                = 7
write(2, "\033[0m", 4)                                                = 4
write(2, ": ", 2: )                                                     = 2
write(2, "\033[2m", 4)                                                = 4
memcpy(0x7ffc75e61bf8, "\345y\214\252\227\177\0\0\a\253!\0\0\0\0\0\003\0\0\0\0\0\0\0\a\253!\0\0\0\0\0"..., 40) = 0x7ffc75e61bf8
memcpy(0x7ffc75e61b90, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0x7ffc75e61b90
write(2, "0x", 20x)                                                     = 2
memcpy(0x7ffc75e61b68, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0x7ffc75e61b68
memcpy(0x7ffc75e61a30, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0x7ffc75e61a30
memcpy(0x7ffc75e619a0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0x7ffc75e619a0
memcpy(0x7ffc75e61910, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0x7ffc75e61910
memset(0x7ffc75e61837, '\252', 65)                                    = 0x7ffc75e61837
write(2, "7f97aa8c79e5", 127f97aa8c79e5)                                          = 12
write(2, " in ", 4 in )                                                   = 4
memcpy(0x7ffc75e61b28, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0x7ffc75e61b28
memcpy(0x7ffc75e61a30, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0x7ffc75e61a30
write(2, "???", 3???)                                                    = 3
write(2, " (", 2 ()                                                     = 2
memcpy(0x7ffc75e61ae8, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0x7ffc75e61ae8
memcpy(0x7ffc75e61a30, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0x7ffc75e61a30
write(2, "???", 3???)                                                    = 3
write(2, ")", 1))                                                      = 1
write(2, "\033[0m", 4)                                                = 4
write(2, "\n", 1
)                                                     = 1
memset(0x7ffc75e61cc0, '\252', 8)                                     = 0x7ffc75e61cc0
abort( <no return ...>
--- SIGABRT (Aborted) ---
+++ killed by SIGABRT +++

And here is the relevant code section in x11_init.c:

// Look for and initialize supported X11 extensions
//
static GLFWbool initExtensions(void)
{
    // libXxf86vm is optional: it is loaded at runtime and only used if present
    _glfw.x11.vidmode.handle = _glfw_dlopen("libXxf86vm.so.1");
    if (_glfw.x11.vidmode.handle)
    {
        _glfw.x11.vidmode.QueryExtension = (PFN_XF86VidModeQueryExtension)
            _glfw_dlsym(_glfw.x11.vidmode.handle, "XF86VidModeQueryExtension");
        _glfw.x11.vidmode.GetGammaRamp = (PFN_XF86VidModeGetGammaRamp)
            _glfw_dlsym(_glfw.x11.vidmode.handle, "XF86VidModeGetGammaRamp");
        _glfw.x11.vidmode.SetGammaRamp = (PFN_XF86VidModeSetGammaRamp)
            _glfw_dlsym(_glfw.x11.vidmode.handle, "XF86VidModeSetGammaRamp");
        _glfw.x11.vidmode.GetGammaRampSize = (PFN_XF86VidModeGetGammaRampSize)
            _glfw_dlsym(_glfw.x11.vidmode.handle, "XF86VidModeGetGammaRampSize");

        _glfw.x11.vidmode.available =
            XF86VidModeQueryExtension(_glfw.x11.display,
                                      &_glfw.x11.vidmode.eventBase,
                                      &_glfw.x11.vidmode.errorBase);
    }
.
.
.
wilsonk commented 3 years ago

The same segfault also happens without the -Dtarget option...just to be clear :)

wilsonk commented 3 years ago

Ok, so I downgraded to the 5.8 kernel under an Ubuntu variant of Linux, backed mach down to the 'initial Linux support' commit (7d5cd4bbcb8cca221407606068e6a398f9e11881), and the tests run in 1.19 seconds, like on Mac/Windows. So there is definitely something wrong with this system when running a new Arch-based install! What a pain.

P.S. The segfault with the newest Linux support code still happens in a similar way on this older Linux install.

Test [1/1] test "glfw_basic"... 
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7540f04 in ?? () from /lib/x86_64-linux-gnu/libxcb.so.1
(gdb) bt
#0  0x00007ffff7540f04 in ?? () from /lib/x86_64-linux-gnu/libxcb.so.1
#1  0x00007ffff7541634 in ?? () from /lib/x86_64-linux-gnu/libxcb.so.1
#2  0x00007ffff7541796 in xcb_wait_for_reply64 () from /lib/x86_64-linux-gnu/libxcb.so.1
#3  0x00007ffff75b1ee8 in _XReply () from /lib/x86_64-linux-gnu/libX11.so.6
#4  0x00007ffff75a7c3c in XQueryExtension () from /lib/x86_64-linux-gnu/libX11.so.6
#5  0x00007ffff759b227 in XInitExtension () from /lib/x86_64-linux-gnu/libX11.so.6
#6  0x00007ffff756c3f1 in XextAddDisplay () from /lib/x86_64-linux-gnu/libXext.so.6
#7  0x00007ffff76d2435 in XF86VidModeQueryExtension () from /usr/lib/x86_64-linux-gnu/libXxf86vm.so.1
#8  0x000000000031f3c1 in initExtensions () at glfw/upstream/glfw/src/x11_init.c:584
#9  0x000000000031f10e in _glfwPlatformInit () at glfw/upstream/glfw/src/x11_init.c:1076
#10 0x000000000030adfc in glfwInit () at glfw/upstream/glfw/src/init.c:232
#11 0x00000000002ba574 in .glfw.init () at /home/wilsonk/mach/glfw/src/main.zig:47
#12 .glfw.basicTest () at /home/wilsonk/mach/glfw/src/main.zig:125
#13 0x00000000002b82f1 in test "glfw_basic" () at main.zig:7
#14 0x00000000002bae2d in std.special.main () at /home/wilsonk/Downloads/zig/build/lib/zig/std/special/test_runner.zig:76
#15 0x00000000002e5808 in std.start.callMain () at /home/wilsonk/Downloads/zig/build/lib/zig/std/start.zig:508
#16 0x00000000002bcd38 in std.start.initEventLoopAndCallMain () at /home/wilsonk/Downloads/zig/build/lib/zig/std/start.zig:450
#17 std.start.callMainWithArgs (argc=2, argv=0x7fffffffe608, envp=...) at /home/wilsonk/Downloads/zig/build/lib/zig/std/start.zig:405
#18 0x00000000002bcae3 in std.start.main (c_argc=2, c_argv=0x7fffffffe608, c_envp=0x7fffffffe620)
    at /home/wilsonk/Downloads/zig/build/lib/zig/std/start.zig:420
(gdb) 

Not exactly the same, but similar. A query-extension problem? I can't see what it is offhand, and my search-engine-fu isn't strong enough. Perhaps an amdgpu driver issue? Not sure what the CI would be running.

slimsag commented 3 years ago

> Ok, so I downgraded to the 5.8 kernel under a Ubuntu variant of Linux, backed mach down to the 'initial linux support' (7d5cd4b) commit and the tests run in 1.19 seconds like on Mac/Windows. So there is definitely something wrong with this system when running a new Arch based install! What a pain.

That's very surprising to me. I doubt it has anything to do with the kernel version, but it does make me wonder what would be different between Arch and Ubuntu to make it behave that way. I wonder whether a single test (zig build test from the root of the repository, instead of in the glfw package dir) would run fairly quickly or not.

wilsonk commented 3 years ago

Yeah, sorry, I wasn't suggesting that the kernel had anything to do with this being slow...I just didn't want to look up which version of Ubuntu that distro was based on :) I still don't understand this problem, though. Opening/closing the single test window from the root of the repo takes 1.5 seconds on average on the M1 but 3 seconds on the Arch Linux distro?

I pared down the example to just run c.glfwInit(), and it is still twice as slow as on the M1...so I guess that is the problem. I built glfw from source on both machines and ran some random examples...they ran the same on each machine: no delay on startup, and very fast framerates, input handling, etc. So it literally seems like something about calling into glfw from Zig is the problem?

I ran strace on the test binary in the root directory, and there are literally tens of thousands of calls to getpid(), plus tens of thousands of recvmsg calls failing with 'resource temporarily unavailable'. I found this thread (https://github.com/kovidgoyal/kitty/issues/2754) and tried to apply the same patch to gl_context.c for the getpid() problem, but there was no real speedup.

Ok, so it doesn't seem like the getpid() calls are the real problem... the glfw examples also make about 33,000 getpid() calls, and mach's 'test' makes about 44,000. The difference is the 'resource temporarily unavailable' part: there are around 100,000 of them on the Arch system! Ok...so async/polling is running on the Arch system but not on Ubuntu, thus producing all the EAGAINs?

I don't know what the heck the difference is...frustrating. I even went so far as to create a new Manjaro VM on this machine and run the basic test executable in the VM, and it runs in 1.1 seconds, just like the M1!?! UGH!! And the Manjaro system also has the 33k getpid() calls and the 100k 'resource temporarily unavailable' errors, so I guess it is something else?? UGH AGAIN!!!

Anyway, the main problem is the segfault when trying to run mach on Linux...my small issue is probably machine-specific, so not a big deal overall. Sorry for the spam. Any ideas on the segfault above?

Maybe someone will stumble in here and have a suggestion :)
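Some context on the 'resource temporarily unavailable' entries in the strace output: `recv`/`recvmsg` on a non-blocking socket fail with `EAGAIN` whenever no data is queued. A client that reads its connection eagerly therefore logs one such failure per empty read. Whether that is what libxcb was doing on the Garuda/Arch machine was not established here. The C sketch below reproduces the effect with a local `socketpair` and shows the usual alternative: wait in `poll()` until the socket is readable, then read. `set_nonblocking` and `recv_when_ready` are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Puts a descriptor into non-blocking mode, as display connections often are. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Sleeps in poll() until fd is readable or timeout_ms elapses, then reads.
 * Returns bytes read, 0 on timeout or EOF, -1 on error. Unlike calling
 * recv() in a loop, this produces no stream of EAGAIN failures while idle. */
static ssize_t recv_when_ready(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int ready;
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready == -1 && errno == EINTR);
    if (ready <= 0)
        return ready; /* 0 = timed out, -1 = poll error */
    return recv(fd, buf, len, 0);
}
```

A bare `recv` on the empty socket fails immediately with `EAGAIN`. That single failure is what appears, multiplied many times, in a trace of a busy-polling client.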

wilsonk commented 3 years ago

I literally just copied the static 'test' binary from the root directory of mach (where it runs in about 2.9 seconds) over to the Manjaro VM (which is running ON THE SLOW MACHINE, with no GPU passthrough!), and it runs in 1.1 seconds (the same as I mentioned above, which is slightly faster than the M1).

The problem isn't with Zig or how glfw is built or anything like that...it is literally this distro called Garuda (the xfce version, not Wayland or anything). I love the distro, but something is messed up with it for just this one case. Otherwise it runs like greased lightning (I can barely stress this thing unless I am compiling LLVM or something). I checked ulimit, nproc, and a few things like that, but it all seems fine. Weird.

Anyway, it definitely isn't anything wrong with mach! Just so you know @slimsag ;)

wilsonk commented 3 years ago

Ok, phew, I got it. There was a thread about Windows being slow to init glfw because of a USB keyboard driver that needed an update...that wasn't it on Linux, of course, but it led me to look into driver problems, and I stumbled across a thread that said glfwInitJoysticksLinux() at x11_init.c:1097 was a problem. So I commented out those couple of lines inside the #ifdef and everything runs fine now.

Not really a solution yet, but at least I can keep a small diff around and apply it when updating mach, so that I can test at a reasonable speed. I will also try actually plugging in a joystick to see if that fixes things, or messing with the Garuda USB/joystick settings to see if I can fix things that way. (I will update here if this fixes things.)
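As I understand GLFW 3.3, its Linux joystick initialization (the function referred to above) scans `/dev/input` for `event*` nodes and opens each one to query its capabilities, so one slow device can stall `glfwInit`. The C sketch below is a diagnostic approximation, not GLFW's code. It times a non-blocking `open()` of each `event*` node in a directory, which might help pin down which device is responsible. It does not issue the ioctls GLFW also performs. `is_event_node` and `time_event_node_opens` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* True for names like "event0" or "event12": "event" followed by one or more digits. */
static int is_event_node(const char *name)
{
    assert(name != NULL);
    if (strncmp(name, "event", 5) != 0 || name[5] == '\0')
        return 0;
    for (const char *p = name + 5; *p; p++)
        if (!isdigit((unsigned char)*p))
            return 0;
    return 1;
}

static double elapsed_ms(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1e3 + (b->tv_nsec - a->tv_nsec) / 1e6;
}

/* Opens every event* node under `dir` non-blocking and prints how long each
 * open() took. Returns the number of nodes examined, or -1 if `dir` cannot
 * be opened. */
static int time_event_node_opens(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;
    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(d)) != NULL) {
        if (!is_event_node(entry->d_name))
            continue;
        char path[512];
        snprintf(path, sizeof path, "%s/%s", dir, entry->d_name);
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);
        int fd = open(path, O_RDONLY | O_NONBLOCK);
        clock_gettime(CLOCK_MONOTONIC, &end);
        printf("%s: open %s in %.2f ms\n", path, fd == -1 ? "failed" : "ok",
               elapsed_ms(&start, &end));
        if (fd != -1)
            close(fd);
        count++;
    }
    closedir(d);
    return count;
}
```

On the affected machine, calling `time_event_node_opens("/dev/input")` and looking for an outlier would be a cheap first check. Opening some nodes may require membership in the `input` group, which shows up here as "failed" rather than as a delay.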

I will keep looking for a solution to the original segfault in this issue also! :)

slimsag commented 3 years ago

Very interesting @wilsonk ! Thanks for the detailed write-ups on that. If we can track down the source of that slowness in glfwInitJoysticksLinux, I suspect that the upstream GLFW folks @elmindreda would appreciate the fix and/or bug report.

FWIW, I just got Ubuntu 20.04 installed on my other MacBook (which was a huge pain), and I've just:

  1. Figured out the source of the segfault - we needed to dynamically link libX11 and libxcb: https://github.com/hexops/sdk-linux-x86_64/commit/36bd27cf4d62becd7848ebc05e434688495617a0
  2. Pushed a commit to disable the Window.setIcon test for now: 68d4bb8

If you rm -rf ~/.local/share/mach/sdk-linux-x86_64 (or cd into that dir and git pull), all tests should pass now. On my Ubuntu 20.04 MacBook here, they pass in 3s.

slimsag commented 3 years ago

Looks like those fixes also made the Linux tests pass on CI. 🎉

Next up is to fix that setIcon test.

wilsonk commented 3 years ago

Looks like things are all fixed here also. I think this problem with GLFW has been looked at in this issue: https://github.com/glfw/glfw/issues/1471

As I stated above, I believe the setIcon test only fails under certain window managers. In my case that's xfce4 (it fails on my Ubuntu-based distro, Linux Lite, and on the Garuda/Arch distro).

Might want to move the setIcon discussion over to a new issue? Sorry I kind of polluted this issue with my other problems...

slimsag commented 3 years ago

Haha, no worries! Sounds good, I've created https://github.com/hexops/mach/issues/20

I think it's probably all window managers - maybe the length of the image we're passing into setIcon is wrong.

Regardless, Linux support seems solid now.
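If the image length is the suspect, a cheap check before calling `glfwSetWindowIcon` is to confirm the pixel buffer holds exactly width × height × 4 bytes. GLFW documents icon pixels as 32-bit RGBA, 8 bits per channel, starting at the top-left corner. Below is a minimal C sketch. `icon_pixels_len_ok` is a hypothetical helper, and it assumes the caller knows the buffer's byte length, since `GLFWimage` itself carries only width, height and a pixel pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a buffer of `len` bytes is exactly the width * height * 4
 * bytes an RGBA8 icon of that size needs, 0 otherwise. Rejects non-positive
 * dimensions and sizes whose byte count would overflow size_t. */
static int icon_pixels_len_ok(int width, int height, size_t len)
{
    if (width <= 0 || height <= 0)
        return 0;
    size_t w = (size_t)width, h = (size_t)height;
    if (w > SIZE_MAX / h || w * h > SIZE_MAX / 4)
        return 0;
    return w * h * 4 == len;
}
```

One mistake this catches is passing a pixel count where a byte count is expected: for a 16×16 icon, 256 is rejected and 1024 is accepted.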