pop-os / pop

A project for managing all Pop!_OS sources
https://system76.com/pop

FAILED TO WRITE REG 28b4 WAIT REG 28c6 #782

Open FetidDischarge opened 4 years ago

FetidDischarge commented 4 years ago

Distribution (run cat /etc/os-release):
NAME="Pop!_OS"
VERSION="19.10"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Pop!_OS 19.10"
VERSION_ID="19.10"
HOME_URL="https://system76.com/pop"
SUPPORT_URL="http://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=eoan
UBUNTU_CODENAME=eoan
LOGO=distributor-logo-pop-os

Related Application and/or Package Version (run apt policy $PACKAGE NAME):

Issue/Bug Description: Takes a while to reach the login screen, and then just shows a black screen after entering the password. Keyboard does nothing. This can persist over several reboots before it finally allows a login. If I log in via TTY (ctrl-alt-f3) I can explore the machine via the terminal, but 'startx' just spits me back to the command line.

If I go to the TTY while I'm waiting for the login screen to load completely it shows these errors:

Failed to write reg 28b4 wait reg 28c6
Failed to write reg 1a6f4 wait reg 1a706

The TTY 'pop-os login' line comes up after them, and I can log in via the terminal.

Steps to reproduce (if you know): Boot machine.

Expected behavior: Reach desktop after log-in.

Other Notes: I'm on an Acer Aspire 5 with Ryzen 7 3700U and mobile Vega gpu. I installed the Intel + AMD Pop! version. I'm dualbooted with Windows 10.

Velaseriat commented 4 years ago

I can confirm having same issue.

FetidDischarge commented 4 years ago

It seems this is a wider Linux issue with AMD GPUs. Does anyone have a workaround?

oneEyedCharlie commented 4 years ago

I'm getting the same error message on my IdeaPad (AMD), although it is a less severe problem for me, as it only happens on power-off or reboot. I wish I could understand what this error message even means.

davidbanhos commented 4 years ago

On my E595 AMD U3500 the issue is intermittent. It manifests itself most of the time when the laptop starts on battery power only. I've never had this issue when the computer starts on the power cord.

Does anyone have the same related symptom?

I found the post https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/1149783-a-possible-workaround-for-amd-apus-with-stability-issues-on-recent-kernels . I'll give a try later today on the latest 5.3 kernel.

oneEyedCharlie commented 4 years ago

Thank you for the info, David. I researched your above solution, and it did not work for me.

For anyone else who wants to try it, it involves putting amdgpu.noretry=0 in your kernel parameters (can be done by reconfiguring grub).
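
For reference, a minimal sketch of what "reconfiguring grub" can look like on an Ubuntu-style system. It assumes the defaults file is /etc/default/grub and that kernel options live on the GRUB_CMDLINE_LINUX_DEFAULT line. The helper name add_grub_param is made up for illustration; it takes the file as an optional second argument so it can be tried on a copy first.

```shell
# Hypothetical helper: append a kernel parameter to the
# GRUB_CMDLINE_LINUX_DEFAULT line of a grub defaults file.
# Usage: add_grub_param PARAM [FILE]   (FILE defaults to /etc/default/grub)
add_grub_param() {
    param="$1"
    file="${2:-/etc/default/grub}"

    # Nothing to do if the parameter is already on that line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi

    # Insert the parameter just before the closing double quote,
    # writing through a temporary file so the original keeps its permissions.
    tmp="$(mktemp)" || return 1
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example (not run here; needs root for the real file):
#   add_grub_param "amdgpu.noretry=0" /etc/default/grub
#   update-grub
```

After editing the real file, update-grub has to be run so the change is picked up on the next boot. Pop!_OS installs that boot through systemd-boot manage kernel options with kernelstub instead, so this sketch would not apply there.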

davidbanhos commented 4 years ago

Hi @oneEyedCharlie, when does the issue mostly happen on your machine? On battery power?

oneEyedCharlie commented 4 years ago

Yes, on battery. That's the way I use the laptop about 90% of the time. I'm not sure whether or not I've seen it happen when plugged in, but I'll try to look for that in the future.

It happens on shutdown mostly. Something is also happening sometimes on boot (maybe the same thing?). I use a text boot with grub, and about halfway through all the OK statuses, it just blanks and becomes completely unresponsive. I have to then power off manually. That seems to happen maybe 1 out of 5 boots.

davidbanhos commented 4 years ago

Thanks @oneEyedCharlie for the review. I'm testing with kernel 5.5; I'll let you all know how it goes.

nirandaperera commented 4 years ago

Hi, I am having the same issue in ubuntu 18.04 in my Thinkpad T495 (5.3.0-26-generic #28~18.04.1-Ubuntu). Looking forward to @davidbanhos's feedback! :-)

Klaas57 commented 4 years ago

I have the same issue with my Thinkpad T495 on Linux Mint 19.3. Nice to see that I'm not the only one.

sgrust01 commented 4 years ago

Similar issue with T495... I can confirm the issue persisted across multiple restarts (on battery), but cleared with the power cable plugged in... kernel 5.3. Cheers

ibirrer commented 4 years ago

I have a similar issue too on a T495 with Arch Linux, kernel 5.4. The difference is that it only seems to happen if the power is plugged in. The screen goes blank after starting the display manager (sddm). Logs are exactly as in the original post.

failed to write reg 28b4 wait reg 28c6
failed to write reg 1a6f4 wait reg 1a706

davidbanhos commented 4 years ago

I tested kernel 5.5 (Linux pop-os 5.5.0-050500-generic), unfortunately the issue persists.

mmatuska commented 4 years ago

I don't have this issue on recent 5.5 kernels anymore.

oneEyedCharlie commented 4 years ago

@mmatuska, Thanks for the heads-up. Looks like Ubuntu will get 5.5 in April, and since my manifestation of this problem is minor compared to others, I'll simply wait until then.

mmatuska commented 4 years ago

@oneEyedCharlie I am not sure if Ubuntu 20.04 goes 5.5 because of 5.4's LTS attractiveness. I am currently running 5.5 from mainline ppa. I need both virtualbox 6.1 and zfsonlinux 0.8.3 (zfs-dkms package) and these don't work with upcoming 5.6 yet.

nirandaperera commented 4 years ago

Is there an official ubuntu bug reported for this case?

davidbanhos commented 4 years ago

I'm not getting the issue anymore after the 5.5.6 kernel. I hope I'm not speaking too soon.

uname -a

pop-os 5.5.6-050506-generic #202002240832 SMP Mon Feb 24 08:35:58 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

I'm further testing it.

mmatuska commented 4 years ago

@davidbanhos I have written a blog post that addresses more issues than this one: https://blog.vx.sk/archives/287

oneEyedCharlie commented 4 years ago

@mmatuska I read your post. Just want to point out that the amount of RAM consumed by the integrated graphics is definable, although I don't know whether that is the case with your particular ASUS notebook. My Lenovo notebook has a setting for it in its most recent BIOS version. I don't game on mine, so I set it down from 2GB to 512MB (it can be set all the way down to 256MB), so now my OS has 7.5 GB available.

mmatuska commented 4 years ago

@oneEyedCharlie thank you for the information, I have updated the post. I have opened a service request to ASUS some time ago and their support answered that the allocation is not adjustable on ASUS UM431DA series. Did you encounter any of the issues I write in the blog post?

oneEyedCharlie commented 4 years ago

On my Lenovo IdeaPad Flex 14 81SS0002US with Ubuntu 19.10, I do not get the other problems you list. Just the same "Display initialisation and suspend bugs".

oneEyedCharlie commented 4 years ago

Thank you @davidbanhos. I want to confirm that my system is like yours in that it only happens when I boot from battery. So that is very interesting.

Second, installing kernel 5.5.7-generic on Ubuntu did NOT fix it. I wonder if there is something special about the Pop-OS kernel.

mmatuska commented 4 years ago

On my ASUS UM431DA the 5.5 kernel does solve the problem. The behavior on battery and on AC power is the same.

davidbanhos commented 4 years ago

@oneEyedCharlie , Lenovo E585 Linux pop-os 5.5.6-050506-generic. Also running Wayland.

yangxuanx commented 4 years ago

I encountered the same problem

If I use the battery, the same error message appears after restarting the computer. After restarting, it is very slow to enter GNOME, but the entire computer freezes only when the Chrome browser is opened. It can only be forcibly turned off, and turned on again after plugging in the power.

Lenovo XiaoXinPro-13API 2019 AMD Ryzen 5 3550H

Linux pop-os 5.3.0-7629-generic #31~1581628825~19.10~f90b7d5-Ubuntu SMP Fri Feb 14 19:56:45 UTC x86_64 x86_64 x86_64 GNU/Linux

FetidDischarge commented 4 years ago

Until Pop gets on to kernel 5.5 this will keep happening. I switched to Manjaro a few weeks ago and have had zero problems of any kind.

oneEyedCharlie commented 4 years ago

I spent several hours with 5.5.7 and 5.5.8 Ubuntu kernels, and it simply didn't help. Anyway I gave up. I do have my workaround of plugging in my power bank, just on booting.

--------edit------- I do wonder if there is a way to fool Linux into thinking it's NOT on battery power, just long enough to boot, and if so, if that would then fix it.
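
This does not answer the question of fooling the kernel, but for anyone trying to pin down the battery correlation, here is a minimal sketch for recording what Linux itself reports about the power source. It assumes the charger shows up under /sys/class/power_supply with type "Mains" (some USB-C chargers report a different type, so treat that as an assumption). The function name and the optional directory argument are made up for illustration.

```shell
# Hypothetical helper: succeed if any "Mains" power supply reports online=1.
# Usage: on_mains_power [SYSFS_DIR]   (defaults to /sys/class/power_supply)
on_mains_power() {
    base="${1:-/sys/class/power_supply}"
    for dir in "$base"/*; do
        # Skip entries without a readable type file (also covers an empty dir).
        [ -r "$dir/type" ] || continue
        if [ "$(cat "$dir/type")" = "Mains" ] \
            && [ "$(cat "$dir/online" 2>/dev/null)" = "1" ]; then
            return 0
        fi
    done
    return 1
}

# Example (not run here):
#   if on_mains_power; then echo "booted on AC"; else echo "booted on battery"; fi
```

Logging that line early in a session, next to whether the "failed to write reg" errors appeared, would give a simple record to compare across boots.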

yangxuanx commented 4 years ago

I used 5.5.8 kernel and the problem persists.

Linux pop-os 5.5.8-050508-generic #202003051633 SMP Thu Mar 5 16:37:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Also I found another one reporting the same issue https://github.com/pop-os/pop/issues/725

davidbanhos commented 4 years ago

Hi all, I have some "initial" bad news: on kernel Linux pop-os 5.5.7-050507-generic I'm having the issue again.

I may be missing something...

I've followed https://blog.vx.sk/archives/287 (thanks @mmatuska). I installed the related linux-firmware from Ubuntu 20.04.

But after restarting, the issue persisted.

Then I executed another step, updating amd64-microcode (https://launchpad.net/ubuntu/focal/amd64/amd64-microcode/3.20191218.1ubuntu1). I got this idea from some Arch Linux comments; there is a similar package in that distro, amd-ucode (https://www.reddit.com/r/archlinux/comments/er7n0a/problems_with_kernel_54_on_ryzen_3200u/ff2i6b9/).

Until now, that is working correctly. I've power-cycled my machine a couple of times. I'll keep checking.

Please be aware, I'm not a senior admin or dev on Linux; I'm just a user.

bboennemann commented 4 years ago

I am seeing the same issue on my new IdeaPad S340 with Ryzen 5. However, I am running Linux Mint:

uname -a
Linux box 5.3.0-40-generic #32~18.04.1-Ubuntu SMP Mon Feb 3 14:05:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Since everyone pointed out that it doesn't make a difference, I did not update to kernel v5.5. I installed the updated version of amd64-microcode without any result.

As others have reported before me, the entire system startup slows down when power is not plugged in. In that case, starting any app will take significantly longer. When I finally start Chrome and I see that it takes forever to display any UI, I know the system will freeze next. When I start with power plugged in, everything works fine.

I am even considering returning the laptop due to that. I may be better off with the i5 version unless there is any support for this on the horizon.

NotLoose commented 4 years ago

Having the same problem on my HP Envy x360 13 (Ryzen 3 3300U with Radeon Vega mobile graphics), running standard Ubuntu 18.04 dual booted with Windows 10.

uname -a
Linux blah 5.3.0-42-generic #34~18.04.1-Ubuntu SMP Fri Feb 28 13:42:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

On battery power the system is usually slow to startup and run, and sometimes freezes while booting. When shutting down I see the FAILED TO WRITE REG .... errors. I also experienced chrome freezing and crashing, but fixed that by disabling hardware acceleration (https://askubuntu.com/questions/1054738/ubuntu-18-04-freezes-while-im-using-chrome). The system is sometimes usable, sometimes not.

oneEyedCharlie commented 4 years ago

I too am having that problem with Chromium. It did not occur to me that it might be related to this "FAILED TO WRITE REG 28b4 WAIT REG 28c6" stuff, but after some tests, it is.

artjoma commented 4 years ago

Same issue:

sizziff commented 4 years ago

I have the same issue :(
HP Pavilion cw1000ur, Ryzen 3 3300U, Linux Mint 19.3
Linux note 5.3.0-42-generic #34~18.04.1-Ubuntu SMP Fri Feb 28 13:42:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

I also have this issue:
W: Possible missing firmware /lib/firmware/amdgpu/vega20_ta.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_mes.bin for module amdgpu

In syslog when the system starts:
Mar 26 20:18:39 note kernel: [ 79.149679] failed to write reg 28b4 wait reg 28c6
Mar 26 20:18:40 note kernel: [ 80.765254] failed to write reg 1a6f4 wait reg 1a706
Mar 26 20:18:42 note kernel: [ 82.385263] failed to write reg 28b4 wait reg 28c6
Mar 26 20:18:44 note kernel: [ 84.001315] failed to write reg 1a6f4 wait reg 1a706

When the system shuts down:
Mar 26 20:17:30 note kernel: [ 9.621379] failed to write reg 28b4 wait reg 28c6
Mar 26 20:17:31 note kernel: [ 11.229139] failed to write reg 1a6f4 wait reg 1a706
Mar 26 20:17:33 note kernel: [ 12.892814] failed to write reg 28b4 wait reg 28c6
Mar 26 20:17:35 note kernel: [ 14.505373] failed to write reg 1a6f4 wait reg 1a706
Mar 26 20:17:36 note kernel: [ 16.125018] failed to write reg 28b4 wait reg 28c6
Mar 26 20:17:38 note kernel: [ 17.733269] failed to write reg 1a6f4 wait reg 1a706
Mar 26 20:17:42 note kernel: [ 21.285153] failed to write reg 28b4 wait reg 28c6
Mar 26 20:17:43 note kernel: [ 22.897075] failed to write reg 1a6f4 wait reg 1a706
Mar 26 20:17:46 note kernel: [ 25.693436] failed to write reg 28b4 wait reg 28c6
Mar 26 20:17:48 note kernel: [ 27.305001] failed to write reg 1a6f4 wait reg 1a706
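
For comparing boots, a minimal sketch that counts these messages in kernel log text. The function name count_reg_failures is made up for illustration, and the pattern simply mirrors the wording of the messages quoted in this thread; other kernel versions might phrase it differently.

```shell
# Hypothetical helper: count "failed to write reg ... wait reg ..." messages
# in kernel log text read from stdin. Matches are counted individually, so
# several messages on one line are all counted.
count_reg_failures() {
    { grep -io 'failed to write reg [0-9a-f]* wait reg [0-9a-f]*' || true; } \
        | wc -l | tr -d ' '
}

# Example (not run here):
#   journalctl -k -b | count_reg_failures      # current boot
#   journalctl -k -b -1 | count_reg_failures   # previous boot, if the journal is persistent
#   dmesg | count_reg_failures
```

Running it once after a boot on battery and once after a boot on AC gives a number to compare instead of eyeballing the TTY.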

oneEyedCharlie commented 4 years ago

Yeah seems we're all having quite similar symptoms.

Other things I've noticed:

  1. While booting on cord power, it boots error-free 100% of the time, but when on battery power, it only produces the errors about 80% of the time. That's strange.
  2. If the pause on boot happens, then the shutdown errors (28b4...) are 100% likely.
  3. My SSD is encrypted. The error cascade condition is being triggered BEFORE I get to my encryption prompt at about 1.5 seconds into the kernel. I know this because if I wait until receiving that prompt, and I then plug in the power, it doesn't help. It still gets the usual 80% likely pause, and if that happens, then the 100% error shutdown.
  4. This gives me a few more ideas. Next I'm going to try plugging in power at my GRUB prompt, and see if that fixes it.

artjoma commented 4 years ago

I fixed with:


sudo add-apt-repository ppa:oibaf/graphics-drivers
sudo apt-get update
sudo apt-get dist-upgrade

oneEyedCharlie commented 4 years ago

I fixed with:

sudo add-apt-repository ppa:oibaf/graphics-drivers
sudo apt-get update
sudo apt-get dist-upgrade

It's weird that that works for some of us. Not me. Do you have any special commands in your kernel boot line?

bboennemann commented 4 years ago

Sorry about the ignorant question. Is this a package that can be trusted and how do I know?

artjoma commented 4 years ago

oibaf: https://launchpad.net/~oibaf/+archive/ubuntu/graphics-drivers. I didn't include any special commands to kernel.

jaimealsilva commented 4 years ago

So it means there is a fix upstream. I'll wait until Pop OS drivers are updated (I thought Pop OS had the latest).

dac73 commented 4 years ago

It's fixed for me if I add iommu=soft as kernel param, on stock 19.10

But I really don't know what this param affects.

Edit: iommu=soft fixes the issue with booting on battery, but oibaf alone didn't work for me. As for the FAILED TO WRITE... message, I fixed that one by updating the firmware:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git && sudo cp -v -u linux-firmware/amdgpu/* /lib/firmware/amdgpu && sudo update-initramfs -uk all

oneEyedCharlie commented 4 years ago

It's fixed for me if I add iommu=soft as kernel param, on stock 19.10

Amazing. This finally fixed it for me. Does anyone know if this has any drawbacks?

shm0sby commented 4 years ago

It's fixed for me if I add iommu=soft as kernel param, on stock 19.10

But I really don't know what this param affects.

Edit: iommu=soft fixes the issue with booting on battery, but oibaf alone didn't work for me. As for the FAILED TO WRITE... message, I fixed that one by updating the firmware:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git && sudo cp -v -u linux-firmware/amdgpu/* /lib/firmware/amdgpu && sudo update-initramfs -uk all

It seems like updating the firmware also fixed my boot & chrome problems.

yangxuanx commented 4 years ago

I fixed with:

sudo add-apt-repository ppa:oibaf/graphics-drivers
sudo apt-get update
sudo apt-get dist-upgrade

Thank you very much! Seems to fix the problem, but for me it is not 100%

Lenovo XiaoXinPro-13API 2019
CPU: AMD Ryzen 5 3550H with Radeon Vega Mobile Gfx (8) @ 2.100GHz
GPU: AMD ATI Picasso

I updated to the mainline kernel 5.5.13-050513-generic. I updated amd-microcode.


2020.3.31 update (https://github.com/pop-os/pop/issues/782#issuecomment-605669645): after I added the kernel parameter, I tested several times and found that the problem was solved.

sudo kernelstub -a "iommu=soft"
sudo kernelstub -p

kernelstub.Config    : INFO     Looking for configuration...
kernelstub           : INFO     System information:

    OS:..................Pop!_OS 19.10
    Root partition:....../dev/nvme0n1p2
    Root FS UUID:........83bb538e-87ce-4193-8524-c6aa593ee4d9
    ESP Path:............/boot/efi
    ESP Partition:......./dev/nvme0n1p1
    ESP Partition #:.....1
    NVRAM entry #:.......-1
    Boot Variable #:.....0000
    Kernel Boot Options:.systemd.show_status=false splash quiet loglevel=0 iommu=soft
    Kernel Image Path:.../boot/vmlinuz-5.5.13-050513-generic
    Initrd Image Path:.../boot/initrd.img-5.5.13-050513-generic
    Force-overwrite:.....False

kernelstub           : INFO     Configuration details:

   ESP Location:................../boot/efi
   Management Mode:...............True
   Install Loader configuration:..True
   Configuration version:.........3

https://github.com/spotify/linux/blob/master/Documentation/x86/x86_64/boot-options.txt#L229
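
To double-check after a reboot that the option really reached the running kernel, a minimal sketch that looks for it in /proc/cmdline. The function name has_kernel_param is made up for illustration; the optional second argument only exists so it can be pointed at a test file.

```shell
# Hypothetical helper: succeed if PARAM appears as a whole word in the
# kernel command line.
# Usage: has_kernel_param PARAM [CMDLINE_FILE]   (defaults to /proc/cmdline)
has_kernel_param() {
    param="$1"
    file="${2:-/proc/cmdline}"
    # One option per line, then an exact fixed-string match on the whole line,
    # so "iommu" alone does not match "iommu=soft".
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Example (not run here):
#   has_kernel_param "iommu=soft" && echo "iommu=soft is active"
```

If the option is listed by kernelstub -p but missing from /proc/cmdline, the machine was probably booted from a different entry than the one kernelstub manages.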

2020.5.3 update: I have switched the operating system to Arch Linux. It tests normally, without any problems.

dac73 commented 4 years ago

Here is a weird turn of events. Since I was in the right mood, I did a full reinstall. And now I don't have problems (e.g. I don't need iommu or updated firmware). Which could potentially indicate that something during install triggers the "bad" behaviour?

btw my HW is t495 with 3700u

sniff122 commented 4 years ago

I have the Ryzen 3 3300U and have been facing this issue too. I have had the issue on Kubuntu consistently, even on a fresh install and sometimes in the live environment. I discovered that on Ubuntu (19.10), installing the "kubuntu-desktop" package triggered the "failed to write reg" messages in dmesg/tty, but now, since doing a kernel update, I have been facing the issue. I have also been having consistent freezes, generally slow booting, and hangs when loading GDM and SDDM (Ubuntu and Kubuntu respectively). I have an HP Pavilion 15-cw1500sa. I have tried the IOMMU kernel option and the oibaf PPA with no solution, but with the IOMMU kernel option I did notice that it no longer gives me the "failed to write reg" messages in dmesg/tty when the power supply is plugged in; it continues to produce the errors on battery power.

ghost commented 4 years ago

I've experienced this on elementary OS: failed to write reg, plus
W: Possible missing firmware /lib/firmware/amdgpu/vega20_ta.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_mes.bin for module amdgpu
and battery percentage issues. Ryzen 5 3500U laptop, mid-2019.

dac73 commented 4 years ago

So this bug is annoying and inconsistent. Today on first boot it happened again, FAILED TO WRITE... in logs and system unresponsive. Then I force turn off the laptop, wait 3.4 seconds, turn back on and everything is fine. :man_shrugging:

edit: day 3, I had to add iommu=soft; booting wasn't working anymore.
edit2: I didn't have that problem on Manjaro, Fedora and openSUSE Tumbleweed, so could this be fixed in a kernel update?

gogodee commented 4 years ago

Same issue on Ryzen 5 3500U with Asus Vivobook 15 X512DA. I've tried everything, but the desktop freezes occasionally and after a few days the installation destroys itself and shows nothing but a black screen. Really regretting the Ryzen choice; I should've got the Intel variant.