o3de / o3de

Open 3D Engine (O3DE) is an Apache 2.0-licensed multi-platform 3D engine that enables developers and content creators to build AAA games, cinema-quality 3D worlds, and high-fidelity simulations without any fees or commercial obligations.
https://o3de.org

Linux: GPU crash upon loading level using Intel integrated graphics #15999

Open nicholas-rh opened 1 year ago

nicholas-rh commented 1 year ago

Describe the bug

On Fedora 38, after building the engine and a project from source, the Editor locks up and stops responding when I run it and open a level. From my limited testing, the problem occurs with Intel integrated graphics using the Mesa libraries and the i915 kernel module. On a different PC using Mesa with a discrete GPU (AMD RX480 with the amdgpu driver), I did not encounter the problem and the level loaded successfully.

Steps to reproduce

1.) Install Fedora (or potentially another Linux distribution).
2.) Ensure that integrated graphics are being used.
3.) Download the development branch and build the engine and a project.
4.) Open the Editor and launch the project.
5.) Create a level or open an existing one.
6.) The GPU crashes and the Editor freezes.

Expected behavior

The level loads successfully.

Actual behavior

The Editor stops responding.


Found in Branch

development

Commit ID from o3de/o3de Repository

https://github.com/o3de/o3de/commit/ad084e683febb5395d6de2893816b22b66953d87

Desktop/Device (please complete the following information):

Additional context

Kernel logs after crash:

    May 17 09:07:34 fedora gnome-shell[2943]: Window 0x2e00042 cannot be minimized, but something tried anyways. Not having it!
    May 17 09:07:42 fedora kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
    May 17 09:07:42 fedora kernel: i915 0000:00:02.0: [drm] Editor[11620] context reset due to GPU hang
    May 17 09:07:43 fedora kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:85dffffb, in Editor [11620]

vulkaninfo command output: vulkaninfo.txt

Typical console log after a crash with the following debug flags passed in: Editor.log

    -rhi-device-validation="enable"
    -rhi-device-validation="verbose"
    -rhi-device-validation="gpu"

CPU stack trace with the AZ_FORCE_CPU_GPU_INSYNC macro enabled (unfortunately no useful Vulkan data on frame 10): bt3.txt

Related issues? https://github.com/o3de/o3de/issues/15947 https://github.com/o3de/o3de/issues/11849

nicholas-rh commented 1 year ago

I also tested this, and it seems to crash under both X11 and Wayland, so the display server does not seem to make a difference.

moudgils commented 1 year ago

Hey @akioCL. When you have time please take a look at this issue and analyze if it is something deeper or trivial.

nicholas-rh commented 1 year ago

I also forgot to mention that this PR did not seem to fix the issue: https://github.com/o3de/o3de/pull/15832/

nicholas-rh commented 1 year ago

It would also be useful to know whether anyone else can replicate this using Intel integrated graphics and some flavor of Linux/Mesa/i915 driver, to rule out something specific to my configuration. I tried to replicate it on some older machines I have lying around, but ran into unrelated issues with incomplete Mesa and Vulkan support for the older Intel hardware, and I don't have any newer hardware to test on. I would imagine that testing the snap package would be enough.

12345XuXin54321 commented 1 year ago

I get the same crash on my PC.

Found in Branch

development

Commit ID from o3de/o3de Repository

o3de/o3de@0eaa71988c9d69d053cecbb2a0118547c7cbf2d7

Desktop/Device (please complete the following information):

Additional context

Kernel logs after crash:

    Jul 25 09:00:24 gentoo-latitude kernel: Asynchronous wait on fence 0000:00:02.0:gnome-shell[2112]:546e timed out (hint:intel_atomic_commit_ready [i915])
    Jul 25 09:00:29 gentoo-latitude kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:85dffffb, in Editor [6490]
    Jul 25 09:00:29 gentoo-latitude kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
    Jul 25 09:00:29 gentoo-latitude kernel: i915 0000:00:02.0: [drm] Editor[6490] context reset due to GPU hang
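Both sets of kernel logs report the same i915 error code, `ecode 12:1:85dffffb`, for the hanging Editor process. This suggests, but does not prove, a common failure signature. When more reports come in, a small script can group the hang lines by ecode to check whether they match. The following is a minimal sketch. It assumes the log lines follow the `GPU HANG: ecode ..., in <process> [<pid>]` format shown above, as printed for example by `journalctl -k`. `hang_signatures` is a hypothetical helper and is not part of O3DE.

```python
import re
from collections import defaultdict

# Matches i915 hang lines such as:
#   "... [drm] GPU HANG: ecode 12:1:85dffffb, in Editor [11620]"
# Assumption: the kernel message format matches the logs in this issue.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[0-9a-f:]+), in (?P<proc>.+?) \[(?P<pid>\d+)\]"
)


def hang_signatures(log_lines):
    """Group GPU hang events by i915 error code (hypothetical helper).

    Returns a dict mapping each ecode string to a list of
    (process name, PID) tuples, in the order the lines were seen.
    Lines that are not "GPU HANG" messages are ignored.
    """
    groups = defaultdict(list)
    for line in log_lines:
        match = HANG_RE.search(line)
        if match:
            groups[match.group("ecode")].append(
                (match.group("proc"), int(match.group("pid")))
            )
    return dict(groups)


if __name__ == "__main__":
    # Hang lines copied from the two reports in this thread.
    sample = [
        "May 17 09:07:43 fedora kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:85dffffb, in Editor [11620]",
        "Jul 25 09:00:29 gentoo-latitude kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:85dffffb, in Editor [6490]",
        "May 17 09:07:42 fedora kernel: i915 0000:00:02.0: [drm] Editor[11620] context reset due to GPU hang",
    ]
    for ecode, procs in hang_signatures(sample).items():
        print(ecode, procs)
```

In this sample, both Editor processes (PIDs 11620 and 6490) fall under the single ecode `12:1:85dffffb`. The "context reset" line is skipped because it does not contain the `GPU HANG: ecode` text.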