swaywm / sway

i3-compatible Wayland compositor
https://swaywm.org
MIT License

Crash on Razer Blade 15 mid-2019 when external monitor is connected #4288

Open 2m opened 5 years ago

2m commented 5 years ago

When I connect an external monitor to my laptop, sway does not start. It crashes immediately after printing out environment variables. I am starting sway with the following env variable:

WLR_DRM_DEVICES=/dev/dri/card0:/dev/dri/card1

My laptop has two GPU cards:

─╼ l /sys/class/drm
Permissions  Size User Group Date Modified Name
lrwxrwxrwx      0 root root  27 Jun 13:37  card0-eDP-1 -> ../../devices/pci0000:00/0000:00:02.0/drm/card0/card0-eDP-1
lrwxrwxrwx      0 root root  27 Jun 13:37  card1 -> ../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1
lrwxrwxrwx      0 root root  27 Jun 13:37  card1-HDMI-A-1 -> ../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-HDMI-A-1
lrwxrwxrwx      0 root root  27 Jun 13:37  card1-DP-2 -> ../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-2
lrwxrwxrwx      0 root root  27 Jun 13:37  renderD129 -> ../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/renderD129
lrwxrwxrwx      0 root root  27 Jun 13:37  card1-DP-3 -> ../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-3
lrwxrwxrwx      0 root root  27 Jun 13:37  card0 -> ../../devices/pci0000:00/0000:00:02.0/drm/card0
.r--r--r--  4.0Ki root root  27 Jun 13:37  version
lrwxrwxrwx      0 root root  27 Jun 13:37  ttm -> ../../devices/virtual/drm/ttm
lrwxrwxrwx      0 root root  27 Jun 13:37  card1-DP-1 -> ../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-1
lrwxrwxrwx      0 root root  27 Jun 13:37  renderD128 -> ../../devices/pci0000:00/0000:00:02.0/drm/renderD128
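
For anyone reproducing this, a minimal sketch of how to see which kernel driver sits behind each card node. It assumes the usual sysfs layout, where `cardN/device/driver` is a symlink to the bound driver. The function name `list_drm_drivers` is made up for illustration. On the machine described here it would presumably report i915 for card0 and nouveau for card1, going by the rest of this thread.

```shell
#!/bin/sh
# Hypothetical helper: list each DRM card node with the kernel driver bound to it.
# The sysfs root is a parameter so the function can be pointed at a test tree.
list_drm_drivers() {
    root="${1:-/sys/class/drm}"
    for card in "$root"/card[0-9]*; do
        name=$(basename "$card")
        # Skip connector entries such as card1-HDMI-A-1, and an unmatched glob.
        case "$name" in
            *-*|'card[0-9]*') continue ;;
        esac
        # device/driver is a symlink whose last path component is the driver name.
        if [ -L "$card/device/driver" ]; then
            driver=$(basename "$(readlink "$card/device/driver")")
        else
            driver="unknown"
        fi
        printf '%s %s\n' "$name" "$driver"
    done
}

# Usage: list_drm_drivers
```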

If I swap card0 and card1 in WLR_DRM_DEVICES, sway starts but runs quite slowly. Mouse movement is a bit choppy on the laptop screen, and when the pointer moves to the external screen sway becomes very unresponsive, at roughly one frame every couple of seconds.
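
As far as the wlroots documentation describes it, the first device in WLR_DRM_DEVICES is treated as the primary GPU, so the order matters. Here is a small sketch for building the value with a chosen card first. The helper name `drm_devices_with_primary` is hypothetical and the card names are the ones from the listing above.

```shell
#!/bin/sh
# Hypothetical helper: build a WLR_DRM_DEVICES value with the chosen card first.
# Arguments: the primary card name, followed by all card names to include.
drm_devices_with_primary() {
    primary="$1"
    shift
    value="/dev/dri/$primary"
    for card in "$@"; do
        # The primary card is already at the front, so do not add it twice.
        if [ "$card" != "$primary" ]; then
            value="$value:/dev/dri/$card"
        fi
    done
    printf '%s\n' "$value"
}

# Example (nouveau card first, as in the slow-but-running case):
#   WLR_DRM_DEVICES=$(drm_devices_with_primary card1 card0 card1) sway -d
```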

I am running sway from master, but I see the same behaviour with the latest released version.

─╼ swaymsg -t get_version
sway version 1.1-rc1-49-g5becce80 (Jun 27 2019, branch 'master')

Yesterday, with some awesome help in the #sway IRC channel, we tried to get a useful coredump, but it seems to lack interesting information. I attach it here nevertheless:

           PID: 1343 (sway)
           UID: 1000 (martynas)
           GID: 1000 (martynas)
        Signal: 7 (BUS)
     Timestamp: Thu 2019-06-27 00:52:50 EEST (12h ago)
  Command Line: build/sway/sway -d --verbose
    Executable: /home/martynas/projects/sway/build/sway/sway
 Control Group: /user.slice/user-1000.slice/session-1.scope
          Unit: session-1.scope
         Slice: user-1000.slice
       Session: 1
     Owner UID: 1000 (martynas)
       Boot ID: 0d6d43abca8141a686a2b955d28112a7
    Machine ID: 956e9ca8be864e479f94f1c2b384cd7d
      Hostname: marea
       Storage: /var/lib/systemd/coredump/core.sway.1000.0d6d43abca8141a686a2b955d28112a7.1343.1561585970000000.lz4
       Message: Process 1343 (sway) of user 1000 dumped core.

                Stack trace of thread 1359:
                #0  0x00007fda946d1387 n/a (n/a)
                #1  0x00007fda95d0ffc9 n/a (nouveau_dri.so)
                #2  0x00007fda95d16368 n/a (nouveau_dri.so)
                #3  0x00007fda95d0f66b n/a (nouveau_dri.so)
                #4  0x00007fda95d0fd2d n/a (nouveau_dri.so)
                #5  0x00007fda95d0fb38 n/a (nouveau_dri.so)
                #6  0x00007fda99bd457f start_thread (libpthread.so.0)
                #7  0x00007fda9a1cd0e3 __clone (libc.so.6)

                Stack trace of thread 1343:
                #0  0x00007fda99bda415 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
                #1  0x00007fda95d10104 n/a (nouveau_dri.so)
                #2  0x00007fda95d1a303 n/a (nouveau_dri.so)
                #3  0x00007fda95d1ae73 n/a (nouveau_dri.so)
                #4  0x00007fda961479a0 n/a (nouveau_dri.so)
                #5  0x00007fda95c6dd5e n/a (nouveau_dri.so)
                #6  0x00007fda97019f40 n/a (libEGL_mesa.so.0)
                #7  0x00007fda9700c18e n/a (libEGL_mesa.so.0)
                #8  0x00007fda9a93f18b n/a (/home/martynas/projects/sway/build/subprojects/wlroots/libwlroots.so.3.4.1)
                #9  0x00007fda9a92dd46 n/a (/home/martynas/projects/sway/build/subprojects/wlroots/libwlroots.so.3.4.1)
                #10 0x00007fda9a92e374 n/a (/home/martynas/projects/sway/build/subprojects/wlroots/libwlroots.so.3.4.1)
                #11 0x00007fda9a92a823 n/a (/home/martynas/projects/sway/build/subprojects/wlroots/libwlroots.so.3.4.1)
                #12 0x00007fda9a97055a n/a (/home/martynas/projects/sway/build/subprojects/wlroots/libwlroots.so.3.4.1)
                #13 0x00007fda9a970688 n/a (/home/martynas/projects/sway/build/subprojects/wlroots/libwlroots.so.3.4.1)
                #14 0x00007fda9a95ca60 n/a (/home/martynas/projects/sway/build/subprojects/wlroots/libwlroots.so.3.4.1)
                #15 0x00007fda9a97b7b5 n/a (/home/martynas/projects/sway/build/subprojects/wlroots/libwlroots.so.3.4.1)
                #16 0x00005594dc7038fe n/a (/home/martynas/projects/sway/build/sway/sway)
                #17 0x00005594dc6fbb49 n/a (/home/martynas/projects/sway/build/sway/sway)
                #18 0x00005594dc6fc385 n/a (/home/martynas/projects/sway/build/sway/sway)
                #19 0x00005594dc6f996e n/a (/home/martynas/projects/sway/build/sway/sway)
                #20 0x00007fda9a97ef94 n/a (/home/martynas/projects/sway/build/subprojects/wlroots/libwlroots.so.3.4.1)
                #21 0x00007fda9a934f23 n/a (/home/martynas/projects/sway/build/subprojects/wlroots/libwlroots.so.3.4.1)
                #22 0x00007fda9a97ef94 n/a (/home/martynas/projects/sway/build/subprojects/wlroots/libwlroots.so.3.4.1)
                #23 0x00007fda9a931628 n/a (/home/martynas/projects/sway/build/subprojects/wlroots/libwlroots.so.3.4.1)
                #24 0x00007fda9a931cdc n/a (/home/martynas/projects/sway/build/subprojects/wlroots/libwlroots.so.3.4.1)
                #25 0x00007fda9a930b20 n/a (/home/martynas/projects/sway/build/subprojects/wlroots/libwlroots.so.3.4.1)
                #26 0x00007fda9a930d2c n/a (/home/martynas/projects/sway/build/subprojects/wlroots/libwlroots.so.3.4.1)
                #27 0x00007fda9a926145 n/a (/home/martynas/projects/sway/build/subprojects/wlroots/libwlroots.so.3.4.1)
                #28 0x00007fda9a934b4d n/a (/home/martynas/projects/sway/build/subprojects/wlroots/libwlroots.so.3.4.1)
                #29 0x00007fda9a926145 n/a (/home/martynas/projects/sway/build/subprojects/wlroots/libwlroots.so.3.4.1)
                #30 0x00005594dc6ebcd9 n/a (/home/martynas/projects/sway/build/sway/sway)
                #31 0x00005594dc6eb351 n/a (/home/martynas/projects/sway/build/sway/sway)
                #32 0x00007fda9a0f7ee3 __libc_start_main (libc.so.6)
                #33 0x00005594dc6dcfce n/a (/home/martynas/projects/sway/build/sway/sway)

GNU gdb (GDB) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
    .

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/martynas/projects/sway/build/sway/sway...
[New LWP 1359]
[New LWP 1343]
[New LWP 1348]
[New LWP 1350]
[New LWP 1349]
[New LWP 1352]
[New LWP 1353]
[New LWP 1351]
[New LWP 1354]
[New LWP 1355]
[New LWP 1357]
[New LWP 1356]
[New LWP 1358]
[New LWP 1345]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Core was generated by `build/sway/sway -d --verbose'.
Program terminated with signal SIGBUS, Bus error.
#0  0x00007fda946d1387 in ?? ()
[Current thread is 1 (Thread 0x7fda797fa700 (LWP 1359))]
(gdb) bt full
#0  0x00007fda946d1387 in  ()
#1  0xffffffffffffffff in  ()
#2  0xffffffff00000000 in  ()
#3  0xffffffffffffffff in  ()
#4  0xffffffffffffffff in  ()
#5  0x0000000000000000 in  ()
(gdb) quit

2m commented 5 years ago

Did a bit more testing. Hopefully this will be useful.

I have noticed the following on both 1.1.1 and the latest master. I was running sway with WLR_DRM_DEVICES=/dev/dri/card1:/dev/dri/card0 (nouveau first, i915 second).

The slowness depends on the resolution of the external monitor. When running the external monitor at 4k@60Hz, sway is almost unusable. The mouse moves very choppily and it takes a while for any new window to pop up (like the quit confirmation window).

However, when the external monitor is configured to 1920x1080@60Hz, it gets better. Mouse movement is almost fine, except that it stutters briefly about once a second. Also, when switching from one monitor to the other, it takes around 5-6 seconds for the mouse to become responsive.
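
For comparing the two modes without editing the config, a rough sketch that switches an output's mode on a running sway through `swaymsg`. The output name HDMI-A-1 is an assumption taken from the connector names in the sysfs listing (`swaymsg -t get_outputs` shows the real names). The helper name and the `SWAYMSG` override are inventions of this example, not sway features.

```shell
#!/bin/sh
# Hypothetical helper: ask a running sway to switch an output to a given mode.
# SWAYMSG can be overridden (for a dry run, set it to "echo"); this is only a
# convenience of this sketch.
set_output_mode() {
    output="$1"
    width="$2"
    height="$3"
    rate="$4"
    "${SWAYMSG:-swaymsg}" output "$output" mode "${width}x${height}@${rate}Hz"
}

# Example: set_output_mode HDMI-A-1 1920 1080 60
```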

Nikki1993 commented 5 years ago

While not the same laptop (Lenovo X1E), its external outputs are similarly wired through the dGPU (1050ti Max Q), and I've noticed the same crash and cursor choppiness when connected to an external monitor at 4k@60Hz.

Most likely the slow cursor is due to the dGPU being stuck at its lowest frequencies, as nouveau has bad support for even previous-gen GPUs when run in hybrid mode, at least if the nouveau page is to be believed. That would explain why performance gets better when lowering the resolution.

2m commented 5 years ago

Tried with the latest nouveau drivers from 5.2.0, still the same crash without a meaningful coredump.

2m commented 5 years ago

Same results with WLR_DRM_NO_ATOMIC=1 as well.

Also noticed that even when sway successfully starts with WLR_DRM_DEVICES=/dev/dri/card0 (i915 card) and then the HDMI cable is inserted, sway crashes.

hedgepigdaniel commented 5 years ago

> Most likely the slow cursor is due to the dGPU being stuck at its lowest frequencies, as nouveau has bad support for even previous-gen GPUs when run in hybrid mode, at least if the nouveau page is to be believed. That would explain why performance gets better when lowering the resolution.

I would have thought that merely pushing frames to an output would not require high clock rates - perhaps there is something else taking up time?

You can attach gdb to a running process (e.g. sway) by starting gdb and then running attach <PID>. Attaching stops the process, so resume it with continue; if you do this from another TTY, you can then interrupt sway during one of these slow periods with Ctrl+C and get a sample of what it's doing with bt or bt full (assuming debug symbols are present for all linked binaries, e.g. sway, wlroots, nouveau, mesa). Look up instructions for your distro on how to compile packages from source with debug symbols.
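
The sampling idea above can also be done non-interactively. This is a rough sketch, not a tested recipe for this bug: it attaches gdb in batch mode a few times and prints all thread backtraces each time. The helper name, the sample count and the `GDB` override are assumptions of this example. Attaching may need root or a relaxed ptrace policy, depending on the distro.

```shell
#!/bin/sh
# Hypothetical helper: take a few backtrace samples from a running process.
# Arguments: PID, number of samples (default 3), seconds between samples (default 1).
sample_backtraces() {
    pid="$1"
    count="${2:-3}"
    interval="${3:-1}"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i ==="
        # -batch makes gdb exit after the -ex commands, which detaches it again.
        "${GDB:-gdb}" -p "$pid" -batch -ex "thread apply all bt"
        i=$((i + 1))
        if [ "$i" -le "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example, run from another TTY or over SSH while sway is sluggish:
#   sample_backtraces "$(pidof sway)" 5 1 > samples.txt 2>&1
```

Frames that show up in most samples are a reasonable hint at where the time goes, provided debug symbols are installed.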

I'd love to help, but I don't have any of these machines (I'm interested in buying one though). As much as I'd love to avoid Nvidia, it's almost impossible to find a 45W CPU in a laptop without an Nvidia GPU.

Nikki1993 commented 5 years ago

@hedgepigdaniel I am a total noob when it comes to any low-level stuff, so the assumption that frequency affects the performance of the output signal is a conclusion I came to after seeing the result. That being said, thx for the tip about 'gdb'. I will try to spend the coming weekend debugging, if I manage to get sway to start properly with the dGPU in hybrid mode, as it crashes atm.

hedgepigdaniel commented 5 years ago

Sure. It will probably be easier to install debug symbols and get a stack trace for the crash you're experiencing first. That way you can read the trace after the fact and won't have to switch TTYs as you would with gdb. Depending on your distro, you can probably use coredumpctl (systemd) to get it.
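
A minimal sketch of the coredumpctl route, assuming systemd-coredump is what collects the dumps (as it was for the trace earlier in this thread). The helper name and the `COREDUMPCTL` override are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: show the newest recorded core dump for a program.
show_latest_core() {
    prog="${1:-sway}"
    # -1 limits the output to the most recent matching dump.
    "${COREDUMPCTL:-coredumpctl}" -1 info "$prog"
}

# Example: show_latest_core sway
#
# To open the same dump in gdb and get a full trace:
#   coredumpctl debug sway      (older systemd releases name this verb "gdb")
#   (gdb) thread apply all bt full
```

With debug symbols installed for sway, wlroots and mesa, the n/a frames in the trace should resolve to function names.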