libretro / RetroArch

Cross-platform, sophisticated frontend for the libretro API. Licensed GPLv3.
http://www.libretro.com

vulkan/radv: memory leak and black screen with shaders enabled #10939

Closed AaronBPaden closed 4 years ago

AaronBPaden commented 4 years ago

Description

With shaders enabled, the Vulkan backend uses up all system memory, eventually triggering the OOM killer. Games do not load at all and instead present a black screen.

Steps to reproduce the bug

Set up shaders with the Vulkan backend and load a game. This behavior was tested with mesen running ntsc-royale and sameboy running dmg-4x shaders.

Bisect Results

First commit with this issue was 72d1a313aee0a30bd16b37a2dcf6434f2bf3b8f5

Version/Commit

ed71d91c77

Environment information

AaronBPaden commented 4 years ago

Sorry, meant to upload a log but I forgot.

[INFO] RetroArch 1.8.9 (Git ed71d91c77)
[INFO] === Build =======================================
[INFO] CPU Model Name: AMD Ryzen 7 2700X Eight-Core Processor         
[INFO] Capabilities:  MMX MMXEXT SSE SSE2 SSE3 SSSE3 SSE4 SSE4.2 AES AVX AVX2
[INFO] Built: Jun 28 2020
[INFO] Version: 1.8.9
[INFO] Git: ed71d91c77
[INFO] =================================================
[INFO] [Environ]: SET_PIXEL_FORMAT: RGB565.
[INFO] [Overrides]: Redirecting save file to "/home/aaron/.config/retroarch/saves/.srm".
[INFO] [Overrides]: Redirecting save state to "/home/aaron/.config/retroarch/states/.state".
[INFO] Version of libretro API: 1
[INFO] Compiled against API: 1
[INFO] [Audio]: Set audio input rate to: 48000.00 Hz.
[INFO] [Video]: Video @ fullscreen
[ERROR] [Wayland]: Failed to connect to Wayland server.
[INFO] [Vulkan]: Vulkan dynamic library loaded.
WARNING: Experimental compiler backend enabled. Here be dragons! Incorrect rendering, GPU hangs and/or resets are likely
[INFO] [Vulkan]: Found vulkan context: x
[INFO] [Vulkan]: Detecting screen resolution 1920x1080.
[INFO] [GLX]: Window manager is GNOME Shell.
[INFO] [XINERAMA]: Xinerama version: 1.1.
[INFO] [XINERAMA]: Xinerama screens: 1.
[INFO] [GLX]: Using Xinerama on screen #0.
[INFO] [GLX]: X = 0, Y = 0, W = 1920, H = 1080.
[INFO] [GLX]: Requesting compositor bypass.
[INFO] [GLX]: Using windowed fullscreen.
[INFO] [Vulkan]: Found GPU at index 0: AMD RADV/ACO POLARIS10 (LLVM 10.0.0)
[INFO] [Vulkan]: Using GPU index 0.
[INFO] [Vulkan]: Using fences for WSI acquire.
[INFO] [Vulkan]: Using GPU: AMD RADV/ACO POLARIS10 (LLVM 10.0.0)
[INFO] [Vulkan]: Queue family 0 supports 1 sub-queues.
[INFO] [Vulkan]: Swapchain supports present mode: 0.
[INFO] [Vulkan]: Swapchain supports present mode: 1.
[INFO] [Vulkan]: Swapchain supports present mode: 2.
[INFO] [Vulkan]: Creating swapchain with present mode: 2
[INFO] [Vulkan]: Using swapchain size 1920 x 1080.
[INFO] [Vulkan]: Got 3 swapchain images.
[INFO] [Vulkan]: Using resolution 1920x1080
[INFO] [Vulkan]: Using RGB565 format.
[INFO] [Vulkan]: GPU supports linear images as textures, but not DEVICE_LOCAL. Falling back to copy path.
[INFO] [Vulkan]: GPU supports linear images as textures, but not DEVICE_LOCAL. Falling back to copy path.
[INFO] [Vulkan]: GPU supports linear images as textures, but not DEVICE_LOCAL. Falling back to copy path.
[INFO] [Vulkan]: Loading stock shader.
[INFO] [slang]: Building pass #0 (N/A)
[INFO] [Vulkan filter chain]: Not using frame history.
[INFO] [Vulkan filter chain]: Not using framebuffer feedback.
[INFO] [Joypad]: Found joypad driver: "udev".
[INFO] [Font]: Using font rendering backend: freetype.
[INFO] [X11]: Suspending screensaver (X11, xdg-screensaver).
[INFO] [Video]: Found display server: x11
[INFO] [Shaders]: Found shader "/home/aaron/.config/retroarch/shaders/ntsc-streaming.slangp"
[INFO] [Shaders]: Found shader "/home/aaron/.config/retroarch/shaders/retroarch.slangp"
[INFO] [PulseAudio]: Requested 24576 bytes buffer, got 18432.
[INFO] [Display]: Found display driver: "vulkan".
[INFO] [Font]: Using font rendering backend: freetype.
[INFO] [Font]: Using font rendering backend: freetype.
[INFO] [Font]: Using font rendering backend: freetype.
[INFO] [Display]: Found display driver: "vulkan".
[INFO] [Font]: Using font rendering backend: freetype.
[INFO] [Font]: Using font rendering backend: freetype.
[INFO] [Font]: Using font rendering backend: freetype.
[INFO] [Font]: Using font rendering backend: freetype.
[INFO] [Font]: Using font rendering backend: freetype.
[INFO] [Font]: Using font rendering backend: freetype.
Protocol error: bad 3 (Window); Sequence Number 11
 Opcode (20, 0) = GetProperty
 Bad resource 0 (0x0)
 at -e line 16.
[INFO] [LED]: LED driver = 'null' 0x5625038010a0
[INFO] [MIDI]: Input disabled.
[INFO] [MIDI]: Output disabled.
[INFO] [MIDI]: Initialized "alsa" driver.
[WARN] Input device ID 5 is unknown to this libretro implementation. Using RETRO_DEVICE_JOYPAD.
[INFO] [SRAM]: SRAM will not be saved.
[INFO] [Playlist]: Loading history file: [/home/aaron/.config/retroarch/content_history.lpl].
[INFO] [Playlist]: Loading history file: [/home/aaron/.config/retroarch/content_music_history.lpl].
[INFO] [Playlist]: Loading history file: [/home/aaron/.config/retroarch/content_video_history.lpl].
[INFO] [Playlist]: Loading history file: [/home/aaron/.config/retroarch/content_image_history.lpl].
[INFO] [Playlist]: Loading favorites file: [/home/aaron/.config/retroarch/content_favorites.lpl].
[INFO] [Vulkan]: VSync => on
[INFO] [Vulkan]: GPU supports linear images as textures, but not DEVICE_LOCAL. Falling back to copy path.
[INFO] [Vulkan]: GPU supports linear images as textures, but not DEVICE_LOCAL. Falling back to copy path.
[INFO] [Vulkan]: GPU supports linear images as textures, but not DEVICE_LOCAL. Falling back to copy path.
[INFO] [slang]: Building pass #0 (N/A)
[INFO] [Vulkan filter chain]: Not using frame history.
[INFO] [Vulkan filter chain]: Not using framebuffer feedback.
[INFO] [Vulkan]: VSync => on
[INFO] [PulseAudio]: Pausing.
[INFO] [Vulkan]: Do not need to re-create swapchain.
[INFO] [Vulkan]: GPU supports linear images as textures, but not DEVICE_LOCAL. Falling back to copy path.
[INFO] [Vulkan]: GPU supports linear images as textures, but not DEVICE_LOCAL. Falling back to copy path.
[INFO] [Vulkan]: GPU supports linear images as textures, but not DEVICE_LOCAL. Falling back to copy path.
[INFO] [slang]: Building pass #0 (N/A)
[INFO] [Vulkan filter chain]: Not using frame history.
[INFO] [Vulkan filter chain]: Not using framebuffer feedback.
[INFO] [CORE]: Using content: /data/aaron/roms/nes/Journey to Silius (USA).7z.
[INFO] [CORE]: Arg #0: retroarch
[INFO] [CORE]: Arg #1: /data/aaron/roms/nes/Journey to Silius (USA).7z
[INFO] [CORE]: Arg #2: -s
[INFO] [CORE]: Arg #3: /home/aaron/.config/retroarch/saves
[INFO] [CORE]: Arg #4: -S
[INFO] [CORE]: Arg #5: /home/aaron/.config/retroarch/states
[INFO] [CORE]: Arg #6: -c
[INFO] [CORE]: Arg #7: /home/aaron/.config/retroarch/retroarch.cfg
[INFO] [CORE]: Arg #8: -L
[INFO] [CORE]: Arg #9: /home/aaron/.config/retroarch/cores/mesen_libretro.so
[INFO] Content ran for a total of: 00 hours, 00 minutes, 00 seconds.
[INFO] [CORE]: Unloading core..
[INFO] [CORE]: Unloading core symbols..
[INFO] [XINERAMA]: Xinerama version: 1.1.
[INFO] [XINERAMA]: Xinerama screens: 1.
[INFO] [XINERAMA]: Saved monitor #0.
[INFO] [Video]: Does not have enough samples for monitor refresh rate estimation. Requires to run for at least 4096 frames.
[INFO] RetroArch 1.8.9 (Git ed71d91c77)
[INFO] [Overrides]: Redirecting save file to "/home/aaron/.config/retroarch/saves/Journey to Silius (USA).srm".
[INFO] [Overrides]: Redirecting save state to "/home/aaron/.config/retroarch/states/Journey to Silius (USA).state".
[INFO] === Build =======================================
[INFO] CPU Model Name: AMD Ryzen 7 2700X Eight-Core Processor         
[INFO] Capabilities:  MMX MMXEXT SSE SSE2 SSE3 SSSE3 SSE4 SSE4.2 AES AVX AVX2
[INFO] Built: Jun 28 2020
[INFO] Version: 1.8.9
[INFO] Git: ed71d91c77
[INFO] =================================================
[INFO] [CORE]: Loading dynamic libretro core from: "/home/aaron/.config/retroarch/cores/mesen_libretro.so"
[INFO] [Overrides]: Core-specific overrides found at /home/aaron/.config/retroarch/config/Mesen/Mesen.cfg.
[INFO] [Overrides]: No content-dir-specific overrides found at /home/aaron/.config/retroarch/config/Mesen/nes.cfg.
[INFO] [Overrides]: No game-specific overrides found at /home/aaron/.config/retroarch/config/Mesen/Journey to Silius (USA).cfg.
[INFO] Config: appending config "/home/aaron/.config/retroarch/config/Mesen/Mesen.cfg"
WARNING: Experimental compiler backend enabled. Here be dragons! Incorrect rendering, GPU hangs and/or resets are likely
Protocol error: bad 3 (Window); Sequence Number 11
 Opcode (20, 0) = GetProperty
 Bad resource 0 (0x0)
 at -e line 16.
AaronBPaden commented 4 years ago

Still an issue in dc01bf8d467f7bb4a9d08455d2723ae5a9ac707f, though now it seems to consistently crash the driver, so I can no longer easily recover by alt-tabbing into a terminal and killing RetroArch. I also updated to Mesa 20.1.3. I've discovered some additional info:

It doesn't occur with every slang shader. bilinear, nearest and ntsc seemed to work fine. It definitely crashes with crt-royale and console-border/dmg. Possibly the common factor is loading external images as resources?

Also, this is definitely causing issues in the driver. Later, I'll file a bug with Mesa and link to it here. I get this in the system logs:

Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: IH ring buffer overflow (0x00085AA0, 0x00003800, 0x00005AB0)
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: GPU fault detected: 147 0x026ac402 for process retroarch pid 31626 thread retroarch pid 31626
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00118901
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x03004002
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: VM fault (0x02, vmid 1, pasid 32769) at page 1149185, write from 'TC1' (0x54433100) (4)
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: GPU fault detected: 147 0x08c2c402 for process retroarch pid 31626 thread retroarch pid 31626
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0011BFA5
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x03008002
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: VM fault (0x02, vmid 1, pasid 32769) at page 1163173, write from 'TC0' (0x54433000) (8)
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: GPU fault detected: 147 0x0d120402 for process retroarch pid 31626 thread retroarch pid 31626
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0011BFD5
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x03004002
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: VM fault (0x02, vmid 1, pasid 32769) at page 1163221, write from 'TC1' (0x54433100) (4)
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: GPU fault detected: 147 0x06f28402 for process retroarch pid 31626 thread retroarch pid 31626
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0011BF66
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x03084002
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: VM fault (0x02, vmid 1, pasid 32769) at page 1163110, write from 'TC7' (0x54433700) (132)
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: GPU fault detected: 147 0x06628402 for process retroarch pid 31626 thread retroarch pid 31626
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0011BFCE
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x030C4002
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: VM fault (0x02, vmid 1, pasid 32769) at page 1163214, write from 'TC3' (0x54433300) (196)
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: GPU fault detected: 147 0x04128402 for process retroarch pid 31626 thread retroarch pid 31626
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0011C004
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x03044002
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: VM fault (0x02, vmid 1, pasid 32769) at page 1163268, write from 'TC5' (0x54433500) (68)
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: GPU fault detected: 147 0x06328402 for process retroarch pid 31626 thread retroarch pid 31626
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0011C051
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x03008002
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: VM fault (0x02, vmid 1, pasid 32769) at page 1163345, write from 'TC0' (0x54433000) (8)
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: GPU fault detected: 147 0x06ca8402 for process retroarch pid 31626 thread retroarch pid 31626
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0011BF8D
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x03088002
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: VM fault (0x02, vmid 1, pasid 32769) at page 1163149, write from 'TC6' (0x54433600) (136)
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: GPU fault detected: 147 0x0ee20802 for process retroarch pid 31626 thread retroarch pid 31626
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0011BFCD
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x03088002
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: VM fault (0x02, vmid 1, pasid 32769) at page 1163213, write from 'TC6' (0x54433600) (136)
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: GPU fault detected: 147 0x05d20402 for process retroarch pid 31626 thread retroarch pid 31626
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0011BF61
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x03084002
Jul 09 15:37:56 localhost kernel: amdgpu 0000:27:00.0: VM fault (0x02, vmid 1, pasid 32769) at page 1163105, write from 'TC7' (0x54433700) (132)
Jul 09 15:38:06 localhost kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
Jul 09 15:38:06 localhost kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
Jul 09 15:38:41 localhost kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:45:plane-5] flip_done timed out
Jul 09 15:38:41 localhost kernel: ------------[ cut here ]------------
Jul 09 15:38:41 localhost kernel: WARNING: CPU: 8 PID: 1416 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:6760 amdgpu_dm_atomic_commit_tail+0x218e/0x2310 [amdgpu]
Jul 09 15:38:41 localhost kernel: Modules linked in: snd_hrtimer ccm algif_aead cbc des_generic libdes ecb arc4 libarc4 algif_skcipher cmac md4 algif_hash af_alg xt_owner iptable_filter uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common snd_usb_audio videodev snd_usbmidi_lib xpad snd_rawmidi ff_memless mc joydev mousedev input_leds amdgpu r8188eu(C) edac_mce_amd kvm_amd essiv lib80211 authenc snd_hda_codec_realtek hid_generic kvm snd_hda_codec_generic dm_crypt cfg80211 wmi_bmof ppdev irqbypass ledtrig_audio gpu_sched i2c_algo_bit snd_hda_codec_hdmi ttm usbhid snd_hda_intel snd_intel_dspcfg hid rfkill drm_kms_helper dm_mod crct10dif_pclmul snd_hda_codec crc32_pclmul r8169 cec ghash_clmulni_intel aesni_intel rc_core snd_hda_core ccp syscopyarea crypto_simd realtek sysfillrect sp5100_tco snd_hwdep cryptd sysimgblt glue_helper libphy pcspkr k10temp i2c_piix4 snd_pcm fb_sys_fops rng_core wmi parport_pc parport evdev pinctrl_amd gpio_amdpt mac_hid acpi_cpufreq vboxnetflt(OE) vboxnetadp(OE)
Jul 09 15:38:41 localhost kernel:  vboxdrv(OE) pkcs8_key_parser snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_timer snd soundcore cuse fuse sg drm vhba(OE) crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 crc32c_intel xhci_pci xhci_hcd
Jul 09 15:38:41 localhost kernel: CPU: 8 PID: 1416 Comm: Xorg Tainted: G         C OE     5.7.7-arch1-1 #1
Jul 09 15:38:41 localhost kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS (MS-7B79), BIOS A.G0 11/11/2019
Jul 09 15:38:41 localhost kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x218e/0x2310 [amdgpu]
Jul 09 15:38:41 localhost kernel: Code: ff ff 41 8b 4c 24 60 48 c7 c2 60 26 08 c1 bf 02 00 00 00 48 c7 c6 58 81 0f c1 e8 0d 5e 39 ff 49 8b 4f 08 e9 3c e0 ff ff 0f 0b <0f> 0b e9 c1 ef ff ff 0f 0b e9 da ef ff ff 48 8b 85 f8 fc ff ff 48
Jul 09 15:38:41 localhost kernel: RSP: 0018:ffff999dc0ddb8a8 EFLAGS: 00010002
Jul 09 15:38:41 localhost kernel: RAX: 0000000000000286 RBX: 000000000063458b RCX: 0000000000000000
Jul 09 15:38:41 localhost kernel: RDX: 0000000000000002 RSI: 0000000000000206 RDI: 0000000000000000
Jul 09 15:38:41 localhost kernel: RBP: ffff999dc0ddbc10 R08: 0000000000000005 R09: ffff999dc0ddb814
Jul 09 15:38:41 localhost kernel: R10: ffff999dc0ddb818 R11: ffff96345b0f6b80 R12: 0000000000000286
Jul 09 15:38:41 localhost kernel: R13: ffff9634788b8800 R14: ffff9633164e6800 R15: ffff96345b0f6b80
Jul 09 15:38:41 localhost kernel: FS:  00007fe5a5fcc280(0000) GS:ffff96347ea00000(0000) knlGS:0000000000000000
Jul 09 15:38:41 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 09 15:38:41 localhost kernel: CR2: 000055b27a226988 CR3: 00000007d5bce000 CR4: 00000000003406e0
Jul 09 15:38:41 localhost kernel: Call Trace:
Jul 09 15:38:41 localhost kernel:  commit_tail+0x94/0x130 [drm_kms_helper]
Jul 09 15:38:41 localhost kernel:  drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
Jul 09 15:38:41 localhost kernel:  drm_atomic_helper_set_config+0x70/0xb0 [drm_kms_helper]
Jul 09 15:38:41 localhost kernel:  drm_mode_setcrtc+0x220/0x720 [drm]
Jul 09 15:38:41 localhost kernel:  ? drm_mode_getcrtc+0x180/0x180 [drm]
Jul 09 15:38:41 localhost kernel:  drm_ioctl_kernel+0xb2/0x100 [drm]
Jul 09 15:38:41 localhost kernel:  drm_ioctl+0x208/0x360 [drm]
Jul 09 15:38:41 localhost kernel:  ? drm_mode_getcrtc+0x180/0x180 [drm]
Jul 09 15:38:41 localhost kernel:  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Jul 09 15:38:41 localhost kernel:  ksys_ioctl+0x82/0xc0
Jul 09 15:38:41 localhost kernel:  __x64_sys_ioctl+0x16/0x20
Jul 09 15:38:41 localhost kernel:  do_syscall_64+0x49/0x90
Jul 09 15:38:41 localhost kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 09 15:38:41 localhost kernel: RIP: 0033:0x7fe5a6c118eb
Jul 09 15:38:41 localhost kernel: Code: 0f 1e fa 48 8b 05 a5 95 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 95 0c 00 f7 d8 64 89 01 48
Jul 09 15:38:41 localhost kernel: RSP: 002b:00007fffa2702dd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jul 09 15:38:41 localhost kernel: RAX: ffffffffffffffda RBX: 00007fffa2702e10 RCX: 00007fe5a6c118eb
Jul 09 15:38:41 localhost kernel: RDX: 00007fffa2702e10 RSI: 00000000c06864a2 RDI: 000000000000000b
Jul 09 15:38:41 localhost kernel: RBP: 00000000c06864a2 R08: 0000000000000000 R09: 000055b27a52a1c0
Jul 09 15:38:41 localhost kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Jul 09 15:38:41 localhost kernel: R13: 000000000000000b R14: 000055b27962f350 R15: 0000000000000000
Jul 09 15:38:41 localhost kernel: ---[ end trace acbc99e467626680 ]---
Jul 09 15:38:41 localhost kernel: ------------[ cut here ]------------
Jul 09 15:38:41 localhost kernel: WARNING: CPU: 8 PID: 1416 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:6362 amdgpu_dm_atomic_commit_tail+0x2195/0x2310 [amdgpu]
Jul 09 15:38:41 localhost kernel: Modules linked in: snd_hrtimer ccm algif_aead cbc des_generic libdes ecb arc4 libarc4 algif_skcipher cmac md4 algif_hash af_alg xt_owner iptable_filter uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common snd_usb_audio videodev snd_usbmidi_lib xpad snd_rawmidi ff_memless mc joydev mousedev input_leds amdgpu r8188eu(C) edac_mce_amd kvm_amd essiv lib80211 authenc snd_hda_codec_realtek hid_generic kvm snd_hda_codec_generic dm_crypt cfg80211 wmi_bmof ppdev irqbypass ledtrig_audio gpu_sched i2c_algo_bit snd_hda_codec_hdmi ttm usbhid snd_hda_intel snd_intel_dspcfg hid rfkill drm_kms_helper dm_mod crct10dif_pclmul snd_hda_codec crc32_pclmul r8169 cec ghash_clmulni_intel aesni_intel rc_core snd_hda_core ccp syscopyarea crypto_simd realtek sysfillrect sp5100_tco snd_hwdep cryptd sysimgblt glue_helper libphy pcspkr k10temp i2c_piix4 snd_pcm fb_sys_fops rng_core wmi parport_pc parport evdev pinctrl_amd gpio_amdpt mac_hid acpi_cpufreq vboxnetflt(OE) vboxnetadp(OE)
Jul 09 15:38:41 localhost kernel:  vboxdrv(OE) pkcs8_key_parser snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_timer snd soundcore cuse fuse sg drm vhba(OE) crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 crc32c_intel xhci_pci xhci_hcd
Jul 09 15:38:41 localhost kernel: CPU: 8 PID: 1416 Comm: Xorg Tainted: G        WC OE     5.7.7-arch1-1 #1
Jul 09 15:38:41 localhost kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS (MS-7B79), BIOS A.G0 11/11/2019
Jul 09 15:38:41 localhost kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2195/0x2310 [amdgpu]
Jul 09 15:38:41 localhost kernel: Code: 48 c7 c2 60 26 08 c1 bf 02 00 00 00 48 c7 c6 58 81 0f c1 e8 0d 5e 39 ff 49 8b 4f 08 e9 3c e0 ff ff 0f 0b 0f 0b e9 c1 ef ff ff <0f> 0b e9 da ef ff ff 48 8b 85 f8 fc ff ff 48 8d 8d 64 fd ff ff 48
Jul 09 15:38:41 localhost kernel: RSP: 0018:ffff999dc0ddb8a8 EFLAGS: 00010086
Jul 09 15:38:41 localhost kernel: RAX: 0000000000000001 RBX: 000000000063458b RCX: 0000000000000000
Jul 09 15:38:41 localhost kernel: RDX: 0000000000000002 RSI: 0000000000000206 RDI: 0000000000000000
Jul 09 15:38:41 localhost kernel: RBP: ffff999dc0ddbc10 R08: 0000000000000005 R09: ffff999dc0ddb814
Jul 09 15:38:41 localhost kernel: R10: ffff999dc0ddb818 R11: ffff96345b0f6b80 R12: 0000000000000286
Jul 09 15:38:41 localhost kernel: R13: ffff9634788b8800 R14: ffff9633164e6800 R15: ffff96345b0f6b80
Jul 09 15:38:41 localhost kernel: FS:  00007fe5a5fcc280(0000) GS:ffff96347ea00000(0000) knlGS:0000000000000000
Jul 09 15:38:41 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 09 15:38:41 localhost kernel: CR2: 000055b27a226988 CR3: 00000007d5bce000 CR4: 00000000003406e0
Jul 09 15:38:41 localhost kernel: Call Trace:
Jul 09 15:38:41 localhost kernel:  commit_tail+0x94/0x130 [drm_kms_helper]
Jul 09 15:38:41 localhost kernel:  drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
Jul 09 15:38:41 localhost kernel:  drm_atomic_helper_set_config+0x70/0xb0 [drm_kms_helper]
Jul 09 15:38:41 localhost kernel:  drm_mode_setcrtc+0x220/0x720 [drm]
Jul 09 15:38:41 localhost kernel:  ? drm_mode_getcrtc+0x180/0x180 [drm]
Jul 09 15:38:41 localhost kernel:  drm_ioctl_kernel+0xb2/0x100 [drm]
Jul 09 15:38:41 localhost kernel:  drm_ioctl+0x208/0x360 [drm]
Jul 09 15:38:41 localhost kernel:  ? drm_mode_getcrtc+0x180/0x180 [drm]
Jul 09 15:38:41 localhost kernel:  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Jul 09 15:38:41 localhost kernel:  ksys_ioctl+0x82/0xc0
Jul 09 15:38:41 localhost kernel:  __x64_sys_ioctl+0x16/0x20
Jul 09 15:38:41 localhost kernel:  do_syscall_64+0x49/0x90
Jul 09 15:38:41 localhost kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 09 15:38:41 localhost kernel: RIP: 0033:0x7fe5a6c118eb
Jul 09 15:38:41 localhost kernel: Code: 0f 1e fa 48 8b 05 a5 95 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 95 0c 00 f7 d8 64 89 01 48
Jul 09 15:38:41 localhost kernel: RSP: 002b:00007fffa2702dd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jul 09 15:38:41 localhost kernel: RAX: ffffffffffffffda RBX: 00007fffa2702e10 RCX: 00007fe5a6c118eb
Jul 09 15:38:41 localhost kernel: RDX: 00007fffa2702e10 RSI: 00000000c06864a2 RDI: 000000000000000b
Jul 09 15:38:41 localhost kernel: RBP: 00000000c06864a2 R08: 0000000000000000 R09: 000055b27a52a1c0
Jul 09 15:38:41 localhost kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Jul 09 15:38:41 localhost kernel: R13: 000000000000000b R14: 000055b27962f350 R15: 0000000000000000
Jul 09 15:38:41 localhost kernel: ---[ end trace acbc99e467626681 ]---
Jul 09 15:38:51 localhost kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
Jul 09 15:38:51 localhost kernel: rfkill: input handler enabled
Jul 09 15:39:01 localhost kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
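
For anyone reading the amdgpu lines above: the "at page" number in each VM fault line is the decimal form of the VM_CONTEXT1_PROTECTION_FAULT_ADDR value printed just before it (0x00118901 is 1149185), and the quoted client name is the hex word next to it read as ASCII (0x54433100 is 'T', 'C', '1', NUL). The sketch below decodes one entry. The function names are made up for illustration, and the 4 KiB page size used to turn the page number into a byte address is an assumption about the GPU's page granularity, not something the log states.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: the fault address register counts 4 KiB GPU pages. */
#define GPU_PAGE_SHIFT 12

/* Unpack the four-byte client tag, most significant byte first
 * (0x54433100 becomes "TC1"). Hypothetical helper name. */
void decode_client(uint32_t id, char out[5])
{
    out[0] = (char)((id >> 24) & 0xff);
    out[1] = (char)((id >> 16) & 0xff);
    out[2] = (char)((id >> 8) & 0xff);
    out[3] = (char)(id & 0xff);
    out[4] = '\0';
}

/* Convert the page number from the log into a byte address. */
uint64_t fault_byte_address(uint32_t fault_addr_reg)
{
    return (uint64_t)fault_addr_reg << GPU_PAGE_SHIFT;
}

/* Print one fault entry in a form similar to the kernel's own line. */
void print_fault(uint32_t fault_addr_reg, uint32_t client_id)
{
    char tag[5];

    decode_client(client_id, tag);
    printf("page %u (byte address 0x%llx), client '%s'\n",
           (unsigned)fault_addr_reg,
           (unsigned long long)fault_byte_address(fault_addr_reg),
           tag);
}
```

Calling `print_fault(0x00118901, 0x54433100)` corresponds to the first fault in the log. All of the faults are writes from texture-cache clients (TC0 through TC7) to nearby pages, which is at least consistent with a shader pass writing to an image that is not mapped where the GPU expects it.
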
inactive123 commented 4 years ago

I guess the central issue here is some of the direct initialization of variables going wrong.

Can you do a git checkout on this commit -

https://github.com/libretro/RetroArch/commit/ed71d91c775e0f0f812def909ab222f53a188fd1

and then keep reverting the changes being made to each function until you find the one that makes everything work again? If you do, then try to narrow down the actual case even more by seeing which exact changes to a function (or functions) were responsible here. We need to know which struct here is responsible for the issues you're seeing, and what exactly we are directly initializing which is wrong.
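
As an illustration of how converting field-by-field assignment into direct initialization can silently change behavior, here is a minimal sketch. It is not the code from the commit: `struct subresource_range` and the three functions are hypothetical stand-ins loosely shaped like a Vulkan subresource range, and the actual cause in RetroArch may well be something different. It uses C99 features (designated initializers and non-constant values in an aggregate initializer).

```c
#include <stdint.h>

/* Hypothetical struct, loosely shaped like a Vulkan subresource range. */
struct subresource_range
{
    uint32_t aspect_mask;
    uint32_t base_mip_level;
    uint32_t level_count;
    uint32_t base_array_layer;
    uint32_t layer_count;
};

/* Original style: zero the struct, then assign only the fields we need. */
struct subresource_range range_by_assignment(uint32_t mip)
{
    struct subresource_range r = {0};

    r.aspect_mask    = 1;
    r.base_mip_level = mip;
    r.level_count    = 1;
    r.layer_count    = 1;
    return r;
}

/* Refactored style with a mistake: values are positional, so leaving out
 * base_array_layer shifts the last value into it. The result has
 * base_array_layer == 1 and layer_count == 0, with no compiler error. */
struct subresource_range range_positional_buggy(uint32_t mip)
{
    struct subresource_range r = { 1, mip, 1, 1 };
    return r;
}

/* Designated initializers name each field, so order and omissions
 * cannot shift values; unnamed fields are zero-initialized. */
struct subresource_range range_designated(uint32_t mip)
{
    struct subresource_range r = {
        .aspect_mask    = 1,
        .base_mip_level = mip,
        .level_count    = 1,
        .layer_count    = 1
    };
    return r;
}
```

Comparing the structs produced by the old and new versions of a function, field by field, is one way to find which initializer changed meaning during the refactor.
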

AaronBPaden commented 4 years ago

Oh, yeah, good idea. It was actually easier to do it in reverse: start from the last good commit and add the changes back function by function.

Looks like the broken functions in 72d1a313aee0a30bd16b37a2dcf6434f2bf3b8f5 are vulkan_image_layout_transition_levels (uses up all memory) and vulkan_framebuffer_generate_mips (crashes driver). vulkan_image_layout_transition_levels is no longer an issue in the latest commit, but vulkan_framebuffer_generate_mips still is.

I don't know yet which part of it is broken, but if I replace the vulkan_framebuffer_generate_mips implementation in a063133a96c0ba3c0ec90ea1ac346aecc524e597 with the implementation in 04fb139bcb66c585e91179653fb54b3632e4971a, RetroArch works again.

inactive123 commented 4 years ago

OK, I reverted the function -

https://github.com/libretro/RetroArch/commit/39d3dd4b3ca37765711aed5c65c3a2f8f38319e3

Can you let me know what remains broken, or whether everything is fixed now?

AaronBPaden commented 4 years ago

Yep, that did it! Everything appears to be in order again, so I'll close this issue.

Thanks @twinaphex!