pop-os / linux

Pop!_OS fork of https://launchpad.net/ubuntu/+source/linux

Changes between 6.4.x and 6.5.0 cause hang on boot for Gaze16-3050. #297

Open XV-02 opened 4 months ago

XV-02 commented 4 months ago

While attempting to certify new firmware releases for the Gaze16 series, I was informed by the Care team of a bug that causes a hang during boot on some Gaze16-3050 systems as currently released. I was able to reproduce this hang reliably in the lab.

This hang occurs on any system with a 6.5.0 or newer kernel and with Nvidia drivers released after the 6.5.0 kernel release in early September of 2023. It manifests when booting on battery without AC power. The system boots to the System76 splash screen and hangs after the text for selecting the BIOS or the systemd-boot menu appears. It then stays on that screen and never loads the kernel far enough for logging or other services to be available. Resolving the hang requires powering the system off by holding the power button, then booting with AC power attached.
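For anyone trying to confirm they are hitting the same conditions, here is a minimal sketch for recording the kernel version and AC state from a working boot. It is not part of any System76 tooling. The function names are hypothetical, and it assumes the AC adapter is exposed under `/sys/class/power_supply` as a supply of type `Mains`, which may not hold on every machine (USB-C chargers, for example, can report a different type).

```python
import platform
import re
from pathlib import Path


def kernel_version(release=None):
    """Parse (major, minor, patch) from a release string like '6.5.0-1008-oem'."""
    release = release or platform.release()
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def on_ac_power(root="/sys/class/power_supply"):
    """Return True/False for AC state, or None if no 'Mains' supply is found.

    Assumption: the adapter shows up as a supply whose 'type' file reads
    'Mains' and whose 'online' file reads '1' when plugged in.
    """
    base = Path(root)
    if not base.is_dir():
        return None
    found_mains = False
    for supply in base.iterdir():
        type_file = supply / "type"
        online_file = supply / "online"
        if not (type_file.is_file() and online_file.is_file()):
            continue
        if type_file.read_text().strip() != "Mains":
            continue
        found_mains = True
        if online_file.read_text().strip() == "1":
            return True
    return False if found_mains else None


if __name__ == "__main__":
    print("kernel:", kernel_version())
    print("on AC:", on_ac_power())
```

Because the hang happens before logging is available, this can only be run on a boot that succeeded; it is useful for recording the configuration before retrying on battery.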

This hang does not happen when the system has access to AC power. It also does not happen if the kernel is older than 6.5.0, or if the Nvidia driver version is older than 525.147.05 / 535.113.01 / 545.23.06. It also does not happen on the current candidate firmware, which is currently blocked by unrelated GPU issues.
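The conditions above can be summarised as a small predicate. This is only a sketch of my observations, not a diagnosis. `hang_expected` and its parameters are hypothetical names. Treating driver branches newer than 545 as affected and branches older than 525 as unaffected is an assumption I have not tested. Passing `None` for the driver stands for a non-proprietary driver such as nouveau, which also shows the hang.

```python
# First driver version in each branch where the hang was observed (from this report).
FIRST_AFFECTED_NVIDIA = {
    525: (525, 147, 5),
    535: (535, 113, 1),
    545: (545, 23, 6),
}


def parse_version(text, width=3):
    """Turn '535.113.01' into (535, 113, 1), padding short versions with zeros."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def hang_expected(kernel, nvidia_driver, on_ac, candidate_firmware=False):
    """Return True if the reported conditions for the boot hang are all met."""
    # AC power or the candidate firmware each avoid the hang on their own.
    if on_ac or candidate_firmware:
        return False
    # Kernels before 6.5.0 do not show the hang.
    if parse_version(kernel) < (6, 5, 0):
        return False
    # None means no proprietary Nvidia driver (e.g. nouveau), which also hangs.
    if nvidia_driver is None:
        return True
    driver = parse_version(nvidia_driver)
    threshold = FIRST_AFFECTED_NVIDIA.get(driver[0])
    if threshold is None:
        # Assumption: untested branches newer than 545 are affected, older are not.
        return driver[0] > max(FIRST_AFFECTED_NVIDIA)
    return driver >= threshold
```

For example, `hang_expected("6.5.0", "535.113.01", on_ac=False)` is true, while the same call with `on_ac=True` or with kernel `"6.4.6"` is false.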

Between the 6.4.x cycle of the kernel and the 6.5.0 release, engineers working on behalf of Intel refactored parts of the kernel's early boot procedure. My guess is that this change, combined with power settings in the current firmware release, is the root cause. The firmware side of the problem is being worked on but is currently at a hard stop, so a kernel solution may prove more fruitful for the subset of users who are impacted.

The reason I'm treating this as a kernel issue is, in part, that I also see it when using the nouveau driver, and that older kernels with our latest Nvidia driver releases do not present the issue. I can also point to a plausible kernel change that may be affecting the part of the boot process where the hang appears.