swaywm / sway

i3-compatible Wayland compositor
https://swaywm.org
MIT License

Vulkan renderer high cpu usage #7965

Open Swowowowo opened 4 months ago

Swowowowo commented 4 months ago

Sway Version: sway version 1.9-rc.1

Debug Log: swayvulkan.log sway.log

Configuration File: same behavior with the default config and with my own config.

Description: CPU usage is higher with the Vulkan renderer. Hardware: Ryzen 5 3600, 32 GB RAM, RX 6700 XT. According to htop, CPU usage is ~15% with the default GLES renderer and ~50% with the Vulkan renderer (both measured without screen capture). The mouse cursor becomes laggy and unresponsive on Vulkan, especially when recording.

Recorded with OBS:
GLES: https://github.com/swaywm/sway/assets/111225899/6aa1772f-639d-46c9-8a8f-6225dc8efde3
Vulkan: https://github.com/swaywm/sway/assets/111225899/19190786-ae57-4a99-a450-7fb3789c43fb

bl4ckb0ne commented 4 months ago
00:00:23.248 [DEBUG] [wlr] [types/output/render.c:187] Enabling direct scan-out on output 'HDMI-A-1' (locks: 0)
00:00:23.248 [DEBUG] [wlr] [types/output/cursor.c:44] Enabling hardware cursors on output 'HDMI-A-1' (locks: 0)
00:00:23.255 [DEBUG] [wlr] [types/output/render.c:187] Disabling direct scan-out on output 'HDMI-A-1' (locks: 1)
00:00:23.255 [DEBUG] [wlr] [types/output/cursor.c:44] Disabling hardware cursors on output 'HDMI-A-1' (locks: 1)
00:00:23.297 [DEBUG] [wlr] [types/output/render.c:187] Enabling direct scan-out on output 'HDMI-A-1' (locks: 0)
00:00:23.297 [DEBUG] [wlr] [types/output/cursor.c:44] Enabling hardware cursors on output 'HDMI-A-1' (locks: 0)
00:00:23.298 [DEBUG] [wlr] [types/output/render.c:187] Disabling direct scan-out on output 'HDMI-A-1' (locks: 1)
00:00:23.298 [DEBUG] [wlr] [types/output/cursor.c:44] Disabling hardware cursors on output 'HDMI-A-1' (locks: 1)
00:00:23.884 [DEBUG] [wlr] [types/output/render.c:187] Enabling direct scan-out on output 'HDMI-A-1' (locks: 0)
00:00:23.884 [DEBUG] [wlr] [types/output/cursor.c:44] Enabling hardware cursors on output 'HDMI-A-1' (locks: 0)
00:00:23.885 [DEBUG] [wlr] [types/output/render.c:187] Disabling direct scan-out on output 'HDMI-A-1' (locks: 1)
00:00:23.885 [DEBUG] [wlr] [types/output/cursor.c:44] Disabling hardware cursors on output 'HDMI-A-1' (locks: 1)

The cursor seems to be stuck in a loop. Can you try setting WLR_NO_HARDWARE_CURSORS=1 when running with the Vulkan renderer?
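
For anyone who wants to check their own debug log for the same pattern, here is a minimal sketch that counts how often direct scan-out and hardware cursors are toggled per output. It assumes the log lines have exactly the shape quoted above; the names `LINE_RE`, `count_toggles`, and `SAMPLE` are illustrative and not part of sway or wlroots.

```python
import re
from collections import Counter

# Matches wlroots debug lines of the shape quoted above, e.g.
# "... Disabling hardware cursors on output 'HDMI-A-1' (locks: 1)".
# The exact message format is an assumption taken from this excerpt.
LINE_RE = re.compile(
    r"(?P<action>Enabling|Disabling) "
    r"(?P<feature>direct scan-out|hardware cursors) "
    r"on output '(?P<output>[^']+)' \(locks: (?P<locks>\d+)\)"
)


def count_toggles(log_text):
    """Count enable/disable events per (output, feature, action)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            key = (match["output"], match["feature"], match["action"])
            counts[key] += 1
    return counts


# First four lines of the excerpt above, used as sample input.
SAMPLE = """\
00:00:23.248 [DEBUG] [wlr] [types/output/render.c:187] Enabling direct scan-out on output 'HDMI-A-1' (locks: 0)
00:00:23.248 [DEBUG] [wlr] [types/output/cursor.c:44] Enabling hardware cursors on output 'HDMI-A-1' (locks: 0)
00:00:23.255 [DEBUG] [wlr] [types/output/render.c:187] Disabling direct scan-out on output 'HDMI-A-1' (locks: 1)
00:00:23.255 [DEBUG] [wlr] [types/output/cursor.c:44] Disabling hardware cursors on output 'HDMI-A-1' (locks: 1)
"""

if __name__ == "__main__":
    for (output, feature, action), n in sorted(count_toggles(SAMPLE).items()):
        print(f"{output}: {action} {feature} x{n}")
```

To run it on a real log, read the file and pass its contents to `count_toggles`. A high count over a short time span would match the rapid toggling seen in the excerpt.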

Swowowowo commented 4 months ago

swayvulkan_nohwcursor.log sway_nohwcursor.log

Same CPU usage, and screen capture still has low FPS with Vulkan. Cursor movement is better than in the video, but still not as responsive as with the default renderer; it feels jittery and slow on Vulkan.

GLES: https://github.com/swaywm/sway/assets/111225899/4b15bc30-1292-40c3-b4f6-ef744f4602b6
Vulkan: https://github.com/swaywm/sway/assets/111225899/47d3e146-9cb3-40fd-8f1d-cc27caad2411

emersion commented 4 months ago

It's expected that the cursor is switched to software when recording.

@Swowowowo, can you build Sway with debug symbols (from git, see wiki instructions) and then try to use perf to see where the CPU time is spent?

ThGrSoRu commented 3 months ago

Joining in because I observe the same behavior using the Vulkan renderer. I ran perf and 33.76% of overhead was on the find_next_iomem_res symbol with kernel.vmlinux as shared object.

bl4ckb0ne commented 3 months ago

@ThGrSoRu can you get a complete stacktrace?

ThGrSoRu commented 3 months ago

I got this.

bl4ckb0ne commented 3 months ago

That looks like a kernel issue. What version are you running? (uname -a)

ThGrSoRu commented 3 months ago

I'm running a custom-compiled kernel: 6.9.0-rc1-g928a87efa423

bl4ckb0ne commented 3 months ago

Can you reproduce with the latest LTS?

ThGrSoRu commented 3 months ago

perf record -F 1000 --call-graph dwarf on Kernel 6.6.22: 6.6.22_lts.txt

emersion commented 3 months ago

The page faults may indicate a non-optimal buffer placement or access pattern in wlroots.

emersion commented 1 week ago

This probably helps: https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/4721

ThGrSoRu commented 1 week ago

That seems to have decreased the maximum overhead: from 52.38% on 6.6.22 and 32.66% on 6.9.0-rc1-g928a87efa423 to 13.53% on 6.10.0-rc5-g55027e689933-dirty and 15.67% on 6.6.35-2.1-lts.

Stacktraces: 6.10.0-rc5-g55027e689933-dirty.txt 6.6.35-2.1-lts.txt

The custom kernel is mainline + the CachyOS 6.10 realtime patch.
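
To put the figures above in perspective, here is a rough sketch of the arithmetic. Pairing the measurements by kernel series (LTS with LTS, custom with custom) is an assumption. The kernel versions also differ between the before and after runs, so the drop cannot be attributed to the wlroots change alone. The names `reduction` and `MEASUREMENTS` are illustrative.

```python
def reduction(before_pct, after_pct):
    """Return (percentage-point drop, relative drop) between two overhead figures."""
    points = before_pct - after_pct
    relative = points / before_pct
    return points, relative


# Maximum perf overhead figures quoted in the comment above, as
# (before, after) percentages. The pairing by kernel series is an assumption.
MEASUREMENTS = {
    "LTS (6.6.22 -> 6.6.35-2.1-lts)": (52.38, 15.67),
    "custom (6.9.0-rc1 -> 6.10.0-rc5)": (32.66, 13.53),
}

if __name__ == "__main__":
    for label, (before, after) in MEASUREMENTS.items():
        points, relative = reduction(before, after)
        print(f"{label}: -{points:.2f} points ({relative:.1%} lower)")
```

Under that pairing, the top overhead drops by roughly 70% on the LTS series and by a bit under 60% on the custom kernels.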