michaelforney / swc

a library for making a simple Wayland compositor
MIT License

Swc example wm freezes on the newest commits #21

Closed: johan-bjareholt closed this issue 9 years ago

johan-bjareholt commented 9 years ago

After updating swc, starting the example wm freezes my computer. I don't know whether the whole program freezes, but my screen freezes, I cannot change tty, and nothing appears to respond. After trying out different commits and rebooting my computer a few times, this is how they behave, respectively, on my machine. The "partially broke" commit is a little strange: the cursor works and exiting with mod+q works, but nothing other than the cursor is drawn, so I can still see the terminal text at the same time as the cursor, which is a pretty cool artifact. However, I cannot open the terminal (or any other application, I would guess, since I cannot get dmenu-wl to run either) like on the working commit. The broke commit fully freezes, as I said earlier.

broke: 28f2da3b561eb03384d5bdb3b3361dd2b47f4194
partially broke: 1bd1820e59f4a9af588f9917e923346bb7d06e6a
working: a33ff2c82819a30afed91d74feb9a7fde3ed9860

I could only try this on my laptop with Intel graphics, since the only other computer I have available has NVIDIA graphics, so Wayland isn't available there.

I have a pull request ready for #20 that compiles, but I cannot try it out because of this.

johan-bjareholt commented 9 years ago

Any updates on this? It's the same case with velox.

johan-bjareholt commented 9 years ago

I have been running the old a33ff2c commit for a while now and it works great. I tried tackling this issue again today, though.

First I tried running swc with gdb like this: swc-launch -- gdb -batch -ex run ./wm 2> gdb-log

Running on /dev/tty1
[swc:libswc/drm.c:160] DEBUG: /dev/dri/card0 is the primary GPU
# find_driver: Trying DRM driver `intel'
glamor: EGL version 1.4 (DRI2):

That wasn't very interesting

However, when I ran it under valgrind like this: swc-launch -- valgrind ./wm 2> valgrind-log

Running on /dev/tty2
==1412== Memcheck, a memory error detector
==1412== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==1412== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==1412== Command: ./wm
==1412== 
[swc:libswc/drm.c:160] DEBUG: /dev/dri/card0 is the primary GPU
# find_driver: Trying DRM driver `intel'
==1412== Syscall param sendmsg(msg.msg_iov[0]) points to uninitialised byte(s)
==1412==    at 0x5B7F330: __sendmsg_nocancel (in /usr/lib/libc-2.21.so)
==1412==    by 0x5063780: ??? (in /usr/lib/libwayland-server.so.0.1.0)
==1412==    by 0x506192E: wl_display_flush_clients (in /usr/lib/libwayland-server.so.0.1.0)
==1412==    by 0x5061987: wl_display_run (in /usr/lib/libwayland-server.so.0.1.0)
==1412==    by 0x4019BC: main (in /home/johan/Dropbox/Programming/Linux/swc/example/wm)
==1412==  Address 0xa7d757f is 4,127 bytes inside a block of size 16,424 alloc'd
==1412==    at 0x4C29F90: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1412==    by 0x50638B1: ??? (in /usr/lib/libwayland-server.so.0.1.0)
==1412==    by 0x5061D12: wl_client_create (in /usr/lib/libwayland-server.so.0.1.0)
==1412==    by 0x4E4BEE8: ??? (in /usr/lib/libswc.so.0.0)
==1412==    by 0x4E48946: swc_initialize (in /usr/lib/libswc.so.0.0)
==1412==    by 0x40197F: main (in /home/johan/Dropbox/Programming/Linux/swc/example/wm)
==1412== 
glamor: EGL version 1.4 (DRI2):

The uninitialised bytes look interesting; at least it points to something.

michaelforney commented 9 years ago

If it is 28f2da3 that broke things for you, that means swc-launch is not receiving the USR2 signal, which indicates that the VT has been switched to, and it therefore refuses to open any devices on behalf of the compositor.

Try adding some debugging statements in handle_usr2() in launch/launch.c to see if you can narrow things down. Is the swc-launch you are using up-to-date? Can you post some system details? Are you using systemd and systemd-logind? Maybe those are somehow messing with VT acquisition?
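For readers unfamiliar with the mechanism: on Linux a process can ask the kernel to signal it on VT switches by putting the VT into VT_PROCESS mode with the VT_SETMODE ioctl. The following is only a minimal sketch of that registration, not the code from launch/launch.c. It assumes SIGUSR1 as the release signal (only USR2 for acquire is confirmed in this thread), and the helper names make_process_mode and enable_vt_signals are hypothetical.

```c
#include <linux/vt.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>

/* Build a vt_mode that requests signals on VT switches.
 * acqsig = SIGUSR2 matches the thread; relsig = SIGUSR1 is an assumption. */
static struct vt_mode make_process_mode(void)
{
    struct vt_mode mode;

    memset(&mode, 0, sizeof mode);
    mode.mode = VT_PROCESS;   /* the process, not the kernel, handles switches */
    mode.relsig = SIGUSR1;    /* sent when we are asked to release the VT */
    mode.acqsig = SIGUSR2;    /* sent once the VT has been switched to us */
    return mode;
}

/* Apply the mode to an already opened VT (hypothetical helper).
 * Returns 0 on success, -1 on error with errno set by ioctl. */
static int enable_vt_signals(int tty_fd)
{
    struct vt_mode mode = make_process_mode();

    return ioctl(tty_fd, VT_SETMODE, &mode);
}
```

If the acquire signal never arrives after this registration, the handler that acknowledges the switch never runs, which would be consistent with the symptoms described above.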

michaelforney commented 9 years ago

Also, to aid your debugging, I would make sure you have access to another system so you can ssh in and kill the ./wm process so you don't have to reboot all the time.

stapelberg commented 9 years ago

I’m having similar symptoms. git revision a33ff2c works for me when not using rendering nodes.

However, the git commit right after that one, i.e. 192d691e06c01ad9f99a42e6e04c0ea959a7b7ed breaks things for me. I’m seeing the contents of tty1 (where I start swc-launch ./velox), but they are frozen, i.e. the mouse cursor does not blink anymore. There is no reaction to any input.

Here’s the output of swc-launch -- strace -f -tt -o /tmp/strace.log -s2048 ./velox in both cases:

http://t.zekjur.net/st.working.bz2 (commit a33ff2c) http://t.zekjur.net/st.broken.bz2 (commit 192d691e06c01ad9f99a42e6e04c0ea959a7b7ed)

I’m testing this on a ThinkPad X200 (i.e. intel graphics card) with Debian testing.

These are the package versions of all shared libraries that velox depends on:

for lib in $(ldd ./velox | cut -d '>' -f 2 | sed 's/^\s*//g' | cut -d ' '  -f 1); do dpkg -S $lib 2>&- | cut -d ':' -f 1; done > /tmp/libs
for lib in $(sort /tmp/libs | uniq); do dpkg -l "${lib}:amd64" | tail -1; done
ii  libc6:amd64    2.19-15      amd64        GNU C Library: Shared libraries
ii  libdrm2:amd64  2.4.58-2     amd64        Userspace interface to kernel DRM services -- runtime
ii  libdrm-intel1:amd64 2.4.58-2     amd64        Userspace interface to intel-specific kernel DRM services -- runtime
ii  libdrm-nouveau2:amd64 2.4.58-2     amd64        Userspace interface to nouveau-specific kernel DRM services -- runtime
ii  libevdev2      1.3+dfsg-1   amd64        wrapper library for evdev devices
ii  libexpat1:amd64 2.1.0-6+b3   amd64        XML parsing C library - runtime library
ii  libffi6:amd64  3.1-2+b2     amd64        Foreign Function Interface library runtime
ii  libfontconfig1:amd64 2.11.0-6.3   amd64        generic font configuration library - runtime
ii  libfreetype6:amd64 2.5.2-3      amd64        FreeType 2 font engine, shared library files
ii  libinput5:amd64 0.6.0+dfsg-2 amd64        input device management and event handling library - shared library
ii  libmtdev1:amd64 1.1.5-1      amd64        Multitouch Protocol Translation Library - shared library
ii  libpciaccess0:amd64 0.13.2-3+b1  amd64        Generic PCI access library for X
ii  libpixman-1-0:amd64 0.32.6-3     amd64        pixel-manipulation library for X and cairo
ii  libpng12-0:amd64 1.2.50-2+b2  amd64        PNG library - runtime
ii  libudev1:amd64 215-12       amd64        libudev shared library
ii  libwayland-client0:amd64 1.6.0-2      amd64        wayland compositor infrastructure - client library
ii  libwayland-server0:amd64 1.6.0-2      amd64        wayland compositor infrastructure - server library
ii  libxau6:amd64  1:1.0.8-1    amd64        X11 authorisation library
ii  libxcb1:amd64  1.10-3+b1    amd64        X C Binding
ii  libxcb-composite0:amd64 1.10-3+b1    amd64        X C Binding, composite extension
ii  libxcb-icccm4:amd64 0.4.1-1      amd64        utility libraries for X C Binding -- icccm
ii  libxcb-render0:amd64 1.10-3+b1    amd64        X C Binding, render extension
ii  libxcb-shape0:amd64 1.10-3+b1    amd64        X C Binding, shape extension
ii  libxcb-xfixes0:amd64 1.10-3+b1    amd64        X C Binding, xfixes extension
ii  libxdmcp6:amd64 1:1.1.1-1+b1 amd64        X11 Display Manager Control Protocol library
ii  libxkbcommon0:amd64 0.4.3-2      amd64        library interface to the XKB compiler - shared library
ii  zlib1g:amd64   1:1.2.8.dfsg-2+b1 amd64        compression library - runtime

Let me know if you need any more information.

michaelforney commented 9 years ago

Can you check whether or not the handle_usr2 function in launch/launch.c is being called? That should be called when SIGUSR2 is sent to the launcher by the kernel when the new VT gets switched to.

stapelberg commented 9 years ago

I’ve modified launch/launch.c like this:

--- i/launch/launch.c
+++ w/launch/launch.c
@@ -184,6 +184,8 @@ static void handle_usr2(int signal)
 {
     struct swc_launch_event event = { .type = SWC_LAUNCH_EVENT_ACTIVATE };

+fprintf(stderr, "handle_usr2\n");
+
     ioctl(launcher.tty_fd, VT_RELDISP, VT_ACKACQ);
     start_devices();
     send(launcher.socket, &event, sizeof event, 0);

With git revision a33ff2c, I don’t see that message in stderr. I’ve added another message to make sure I’m not doing something wrong in installing swc or printing messages, and I do see that other message. So, no, handle_usr2() is not being called.

I then did the same for git revision 192d691, and handle_usr2() isn’t called in that revision either.

Let me know if you need more information.

johan-bjareholt commented 9 years ago

Simply calling handle_usr2(SIGUSR2) directly, instead of having it invoked through sigaction, works great. We should probably fix the sigaction path instead of relying on this dirty trick, though, so we'll have to continue investigating.

michaelforney commented 9 years ago

Thanks, that really helps narrow it down.

I'm still not sure about the exact cause, but the VT mode that registers the acquire/release signals is set here, and the new VT is activated here.

The only thing I can think of is that maybe the new VT is the same as the old one, and the VT_ACTIVATE call doesn't trigger the acquire signal.

Could you try adding print statements for the values of vt and original_vt_state.vt in setup_tty? You should be able to debug with swc-launch -- sleep 2 without having to go to great lengths to recover your display.
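The check being asked for can be sketched roughly as follows, assuming the hypothesis above is right. This is an illustration only: it is not the content of any swc branch, and query_active_vt, must_activate_manually and report_vt_state are hypothetical helpers. VT_GETSTATE and struct vt_stat are the standard Linux interfaces for reading the active VT.

```c
#include <linux/vt.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/ioctl.h>

/* Ask the kernel which VT is currently in the foreground.
 * Returns the VT number, or -1 if the ioctl fails. */
static int query_active_vt(int tty_fd)
{
    struct vt_stat state;

    if (ioctl(tty_fd, VT_GETSTATE, &state) != 0)
        return -1;
    return state.v_active;
}

/* Hypothesis from this thread: activating the VT that is already active
 * produces no acquire signal, so the launcher would have to activate itself. */
static bool must_activate_manually(int active_vt, int target_vt)
{
    return active_vt >= 0 && active_vt == target_vt;
}

/* Print both values, as requested above, and return the decision. */
static bool report_vt_state(int active_vt, int target_vt)
{
    bool manual = must_activate_manually(active_vt, target_vt);

    fprintf(stderr, "vt=%d original_vt_state.vt=%d manual_activate=%d\n",
            target_vt, active_vt, manual);
    return manual;
}
```

If the two numbers printed are equal on the affected machines, that would support the theory and explain why calling the handler directly works around the freeze.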

michaelforney commented 9 years ago

Can you check if the launch_activate_fix branch (e7582a3cd68c3512be0f4fef93e637daed68193e) fixes your issue?

stapelberg commented 9 years ago

@michaelforney That branch indeed does fix the issue for me. Thanks!

johan-bjareholt commented 9 years ago

@michaelforney Works for me too! Would be nice if you could merge and close.