raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

GPU is reset on Raspberry Pi 3B+ #5780

Open pizdjuk opened 9 months ago

pizdjuk commented 9 months ago

Describe the bug

The same bug as https://github.com/raspberrypi/linux/issues/3221. Request for reopening.

Steps to reproduce the behaviour

Occurs occasionally, roughly once every 1-2 days.

Device(s)

Raspberry Pi 3 Mod. B+

System

LibreELEC (official): 11.0.3, Kernel 6.1.38.

Logs

No response

Additional context

No response

rzr commented 7 months ago

I am investigating a similar issue, observed on an RPi 2.

I am able to reproduce it while changing settings in Kodi.

Some data:

~ # dmesg
[  236.318632] [drm] Resetting GPU.

~ #  vcdbg log msg
Failed to allocate -385826387 bytes for message buffer

~ # cat /proc/device-tree/model 
Raspberry Pi 2 Model B Rev 1.1~

~ # cat /proc/version 
Linux version 6.1.71 (docker@e800d296be18) (armv7ve-libreelec-linux-gnueabihf-gcc-12.2.0 (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39) #1 SMP Thu Jan 11 12:01:59 UTC 2024

~ # vcgencmd version
Oct 17 2023 15:43:26 
Copyright (c) 2012 Broadcom
version 30f0c5e4d076da3ab4f341d88e7d505760b93ad7 (clean) (release) (start_x)

cat /sys/class/drm/card0-HDMI-A-1/modes 
1360x768
1920x1080
1920x1080
1920x1080i
1920x1080i
1920x1080i
1920x1080
1920x1080i
1920x1080i
1920x1080
1920x1080
1280x1024
1280x720
1280x720
1280x720
1024x768
800x600
720x576
720x576
720x576
720x480
720x480
720x480
720x480
720x480
640x480
640x480
640x480

mfraser commented 7 months ago

I'm seeing the same after doing an update this afternoon. xserver-xorg-core:armhf (2:1.20.11-1+rpt3+deb11u10, 2:1.20.11-1+rpt3+deb11u11), xserver-common:armhf (2:1.20.11-1+rpt3+deb11u10, 2:1.20.11-1+rpt3+deb11u11), xwayland:armhf (2:1.20.11-1+rpt3+deb11u10, 2:1.20.11-1+rpt3+deb11u11)

[ 4167.072278] [drm] Resetting GPU.

# cat /proc/version 
Linux version 6.1.21-v7+ (dom@buildbot) (arm-linux-gnueabihf-gcc-8 (Ubuntu/Linaro 8.4.0-3ubuntu1) 8.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #1642 SMP Mon Apr  3 17:20:52 BST 2023

vcgencmd version
Mar 17 2023 10:52:42 
Copyright (c) 2012 Broadcom
version 82f3750a65fadae9a38077e3c2e217ad158c8d54 (clean) (release) (start)

XECDesign commented 7 months ago

The changes added in that version: https://release.debian.org/proposed-updates/bullseye_diffs/xorg-server_1.20.11-1+deb11u11.debdiff

KingZambo commented 5 months ago

Same issue here with Raspbian GNU/Linux 12 (bookworm) armv7l on a Pi 2B Rev 1.1, kernel version 6.6.20+rpt-rpi-v7,

with a freshly installed RetroPie as well as with Kodi.

bramoosterhuis commented 5 months ago

I'm running into the same issue, but with an image built by Buildroot running WebKit on an RPi 3B. I updated the kernel from 6.1.61 to 6.1.77 because of this issue: https://github.com/raspberrypi/linux/issues/5674.

my config:

This is the register dump when the message appears.

# cat /sys/kernel/debug/dri/0/v3d_regs 
   V3D_IDENT0 = 0x02443356
   V3D_IDENT1 = 0xc1102431
   V3D_IDENT2 = 0x00000121
  V3D_SCRATCH = 0x00000000
  V3D_L2CACTL = 0x00000001
  V3D_SLCACTL = 0x00000000
   V3D_INTCTL = 0x00000000
   V3D_INTENA = 0x00000007
   V3D_INTDIS = 0x00000007
    V3D_CT0CS = 0x00004020
    V3D_CT1CS = 0x00000120
    V3D_CT0EA = 0xd7d330ba
    V3D_CT1EA = 0xd7d33a5b
    V3D_CT0CA = 0xd7d3208c
    V3D_CT1CA = 0xd85085a9
  V3D_CT00RA0 = 0x00000000
  V3D_CT01RA0 = 0xd7dccac1
    V3D_CT0LC = 0x00000000
    V3D_CT1LC = 0x00000000
    V3D_CT0PC = 0x00000000
    V3D_CT1PC = 0x00000000
      V3D_PCS = 0x00000005
      V3D_BFC = 0x000000b8
      V3D_RFC = 0x000000f2
     V3D_BPCA = 0xd8506e00
     V3D_BPCS = 0x00079000
     V3D_BPOA = 0xd8300000
     V3D_BPOS = 0x00000000
     V3D_BXCF = 0x00000000
   V3D_SQRSV0 = 0x00000000
   V3D_SQRSV1 = 0x00000000
   V3D_SQCNTL = 0x00000000
    V3D_SRQPC = 0x00000000
    V3D_SRQUA = 0x00000000
    V3D_SRQUL = 0x00000000
    V3D_SRQCS = 0x00000000
  V3D_VPACNTL = 0x00000000
  V3D_VPMBASE = 0x00000000
    V3D_PCTRC = 0x00000000
    V3D_PCTRE = 0x00000000
  V3D_PCTR(0) = 0x00000000
 V3D_PCTRS(0) = 0x00000000
  V3D_PCTR(1) = 0x00000000
 V3D_PCTRS(1) = 0x00000000
  V3D_PCTR(2) = 0x00000000
 V3D_PCTRS(2) = 0x00000000
  V3D_PCTR(3) = 0x00000000
 V3D_PCTRS(3) = 0x00000000
  V3D_PCTR(4) = 0x00000000
 V3D_PCTRS(4) = 0x00000000
  V3D_PCTR(5) = 0x00000000
 V3D_PCTRS(5) = 0x00000000
  V3D_PCTR(6) = 0x00000000
 V3D_PCTRS(6) = 0x00000000
  V3D_PCTR(7) = 0x00000000
 V3D_PCTRS(7) = 0x00000000
  V3D_PCTR(8) = 0x00000000
 V3D_PCTRS(8) = 0x00000000
  V3D_PCTR(9) = 0x00000000
 V3D_PCTRS(9) = 0x00000000
 V3D_PCTR(10) = 0x00000000
V3D_PCTRS(10) = 0x00000000
 V3D_PCTR(11) = 0x00000000
V3D_PCTRS(11) = 0x00000000
 V3D_PCTR(12) = 0x00000000
V3D_PCTRS(12) = 0x00000000
 V3D_PCTR(13) = 0x00000000
V3D_PCTRS(13) = 0x00000000
 V3D_PCTR(14) = 0x00000000
V3D_PCTRS(14) = 0x00000000
 V3D_PCTR(15) = 0x00000000
V3D_PCTRS(15) = 0x00000000
     V3D_DBGE = 0x00000000
    V3D_FDBGO = 0x00000006
    V3D_FDBGB = 0x1c00000d
    V3D_FDBGR = 0x595d68fd
    V3D_FDBGS = 0x1246d1ef
  V3D_ERRSTAT = 0x00007000
# cat /sys/kernel/debug/dri/0/v3d_ident 
Revision:   1
Slices:     3
TMUs:       6
QPUs:       12
Semaphores: 16
# cat /sys/kernel/debug/dri/0/bo_stats 
                           V3D:  67368kb BOs (77)
                    V3D shader:     96kb BOs (24)
                          dumb:   4052kb BOs (1)
                        binner:  16384kb BOs (1)
                           RCL:     48kb BOs (8)
                           BCL:     32kb BOs (8)
            userspace BO cache:   1028kb BOs (2)

@6by9 is there anything I can provide to give a clearer picture of what is going on?
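
A register dump like the one above is easier to compare across reports once it is parsed. The following is a minimal sketch, assuming only the `NAME = 0xVALUE` line layout visible in the dump. The helper names `parse_v3d_regs` and `nonzero_regs` are illustrative and not part of any kernel tool, and the sketch does not try to interpret individual register bits.

```python
import re

# Matches lines such as "   V3D_CT0CS = 0x00004020" or " V3D_PCTR(10) = 0x00000000".
REG_LINE = re.compile(r"^\s*(V3D_[A-Z0-9_()]+)\s*=\s*0x([0-9a-fA-F]+)\s*$")


def parse_v3d_regs(text):
    """Return {register name: integer value} for every register line in text."""
    regs = {}
    for line in text.splitlines():
        match = REG_LINE.match(line)
        if match:
            regs[match.group(1)] = int(match.group(2), 16)
    return regs


def nonzero_regs(regs):
    """Keep only registers with a non-zero value, preserving dump order."""
    return {name: value for name, value in regs.items() if value != 0}


# A few lines copied from the dump above, used as sample input.
SAMPLE = """\
    V3D_CT0CS = 0x00004020
    V3D_CT1CS = 0x00000120
    V3D_CT0LC = 0x00000000
  V3D_PCTR(0) = 0x00000000
  V3D_ERRSTAT = 0x00007000
"""

for name, value in nonzero_regs(parse_v3d_regs(SAMPLE)).items():
    print(f"{name} = {value:#010x}")
```

Filtering out the zero-valued registers (most of the performance counters in this dump) leaves a short list that is quicker to compare between two reports.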

6by9 commented 5 months ago

Sorry, I know very little of the 3D pipeline. 6.6 is now the standard kernel branch, although KingZambo reported above a reset on 6.6.20.

@popcornmix possibly, or @melissawen for more knowledge on 3D.

RapidEdwin08 commented 5 months ago

I can confirm this issue on the latest RaspiOS and kernel on pretty much anything below the Raspberry Pi 4 using full V3D KMS. It can easily be reproduced in EmulationStation by entering any system and quickly backing out again; usually three tries are enough to trigger the issue, and dmesg then shows [drm] Resetting GPU.

melissawen commented 4 months ago

@mairacanal is following the 3D work more closely than I am.

mairacanal commented 4 months ago

I've started to investigate this issue. Any report that explains how to reproduce the error, states the kernel version and Mesa version, and attaches the debugfs files is really helpful.
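
For anyone preparing such a report, the files quoted earlier in this thread can be gathered in one go. This is a minimal sketch, assuming the paths used in the comments above (DRM card index 0, debugfs mounted at /sys/kernel/debug, usually readable only by root). The function name `collect_report` is hypothetical, and the Mesa version is not covered and has to be added by hand.

```python
from pathlib import Path

# Paths taken from the commands used earlier in this thread. The debugfs
# entries usually need root; the card index 0 is an assumption.
REPORT_FILES = [
    "/proc/version",
    "/proc/device-tree/model",
    "/sys/kernel/debug/dri/0/v3d_regs",
    "/sys/kernel/debug/dri/0/v3d_ident",
    "/sys/kernel/debug/dri/0/bo_stats",
]


def collect_report(paths=REPORT_FILES):
    """Read each file into one text report; unreadable files are noted, not fatal."""
    sections = []
    for path in paths:
        try:
            # The device-tree model string ends with a NUL byte, so strip it.
            body = Path(path).read_text(errors="replace").rstrip("\x00\n")
        except OSError as exc:
            body = f"<unavailable: {exc.strerror or exc}>"
        sections.append(f"# cat {path}\n{body}")
    return "\n\n".join(sections) + "\n"


if __name__ == "__main__":
    print(collect_report(), end="")
```

Ideally it is run right after a `[drm] Resetting GPU.` message appears, so that the register dump reflects the state around the reset.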

Thanks for all the reports here (and thanks @melissawen for pointing me to this issue)!

beudbeud commented 4 months ago

I have the same issue with the Raspberry Pi Zero 2 W.

I use Recalbox 9.2, kernel: 6.1.77-v7, mesa3d: 22.3.4

david-barbion commented 4 months ago

Hi @mairacanal.

First, thank you for the investigation. I have this issue on an RPi Zero 2 W running kernel 6.1.77 and the matching RPi firmware. The Mesa version is 22.3.4. I see this problem when using EmulationStation, a well-known SDL2 frontend application that serves as a launcher for (old) game emulators. I'm trying to find an easier way to reproduce the bug.

RapidEdwin08 commented 4 months ago

@mairacanal I'd be happy to give you a link to a bare-minimum IMG for a Raspberry Pi of your choice that you can simply burn to reproduce the issue, if you let me know which Pi you'd prefer: Pi0/1, Pi3, Pi Zero2W, etc.

lentomajava commented 3 months ago

Hi,

The reset occurs several times a week on a Raspberry Pi 3 Model B Plus Rev 1.3 running Raspbian GNU/Linux 11 (bullseye) with kernel 6.6.21-v7+. The machine runs in kiosk mode and displays Chart.js graphs with chromium-browser version 124.0.6367.73-rpt1.

# lsb_release -a
No LSB modules are available.
Distributor ID: Raspbian
Description:    Raspbian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye
# vcgencmd version
Feb 29 2024 12:26:47
Copyright (c) 2012 Broadcom
version f4e2138c2adc8f3a92a3a65939e458f11d7298ba (clean) (release) (start)
# cat /proc/version
Linux version 6.6.21-v7+ (dom@buildbot) (arm-linux-gnueabihf-gcc-8 (Ubuntu/Linaro 8.4.0-3ubuntu1) 8.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #1743 SMP Thu Mar 14 11:35:08 GMT 2024
# cat /proc/device-tree/model
Raspberry Pi 3 Model B Plus Rev 1.3
# cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 11 (bullseye)"
NAME="Raspbian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
# vcdbg log msg
Failed to allocate -201253077 bytes for message buffer
# cat /sys/kernel/debug/dri/0/v3d_regs
   V3D_IDENT0 = 0x02443356
   V3D_IDENT1 = 0xc1102431
   V3D_IDENT2 = 0x00000121
  V3D_SCRATCH = 0x00000000
  V3D_L2CACTL = 0x00000001
  V3D_SLCACTL = 0x00000000
   V3D_INTCTL = 0x00000000
   V3D_INTENA = 0x00000007
   V3D_INTDIS = 0x00000007
    V3D_CT0CS = 0x00007020
    V3D_CT1CS = 0x00000020
    V3D_CT0EA = 0xdeec70d1
    V3D_CT1EA = 0xdeec602d
    V3D_CT0CA = 0xdeec70d0
    V3D_CT1CA = 0xded1551a
  V3D_CT00RA0 = 0x00000000
  V3D_CT01RA0 = 0x00000000
    V3D_CT0LC = 0x00000000
    V3D_CT1LC = 0x00000000
    V3D_CT0PC = 0x00000000
    V3D_CT1PC = 0x00fce537
      V3D_PCS = 0x00000005
      V3D_BFC = 0x00000078
      V3D_RFC = 0x000000a0
     V3D_BPCA = 0xe0b85300
     V3D_BPCS = 0x0007a000
     V3D_BPOA = 0xe0b00000
     V3D_BPOS = 0x00000000
     V3D_BXCF = 0x00000000
   V3D_SQRSV0 = 0x00000000
   V3D_SQRSV1 = 0x00000000
   V3D_SQCNTL = 0x00000000
    V3D_SRQPC = 0x00000000
    V3D_SRQUA = 0x00000000
    V3D_SRQUL = 0x00000000
    V3D_SRQCS = 0x00000000
  V3D_VPACNTL = 0x00000000
  V3D_VPMBASE = 0x00000000
    V3D_PCTRC = 0x00000000
    V3D_PCTRE = 0x00000000
  V3D_PCTR(0) = 0x00000000
 V3D_PCTRS(0) = 0x00000000
  V3D_PCTR(1) = 0x00000000
 V3D_PCTRS(1) = 0x00000000
  V3D_PCTR(2) = 0x00000000
 V3D_PCTRS(2) = 0x00000000
  V3D_PCTR(3) = 0x00000000
 V3D_PCTRS(3) = 0x00000000
  V3D_PCTR(4) = 0x00000000
 V3D_PCTRS(4) = 0x00000000
  V3D_PCTR(5) = 0x00000000
 V3D_PCTRS(5) = 0x00000000
  V3D_PCTR(6) = 0x00000000
 V3D_PCTRS(6) = 0x00000000
  V3D_PCTR(7) = 0x00000000
 V3D_PCTRS(7) = 0x00000000
  V3D_PCTR(8) = 0x00000000
 V3D_PCTRS(8) = 0x00000000
  V3D_PCTR(9) = 0x00000000
 V3D_PCTRS(9) = 0x00000000
 V3D_PCTR(10) = 0x00000000
V3D_PCTRS(10) = 0x00000000
 V3D_PCTR(11) = 0x00000000
V3D_PCTRS(11) = 0x00000000
 V3D_PCTR(12) = 0x00000000
V3D_PCTRS(12) = 0x00000000
 V3D_PCTR(13) = 0x00000000
V3D_PCTRS(13) = 0x00000000
 V3D_PCTR(14) = 0x00000000
V3D_PCTRS(14) = 0x00000000
 V3D_PCTR(15) = 0x00000000
V3D_PCTRS(15) = 0x00000000
     V3D_DBGE = 0x00000000
    V3D_FDBGO = 0x00000006
    V3D_FDBGB = 0x10000000
    V3D_FDBGR = 0x10006800
    V3D_FDBGS = 0x00001147
  V3D_ERRSTAT = 0x00007000
# cat /sys/kernel/debug/dri/0/v3d_ident
Revision:   1
Slices:     3
TMUs:       6
QPUs:       12
Semaphores: 16
# cat /sys/kernel/debug/dri/0/bo_stats
               V3D:  35240kb BOs (27)
        V3D shader:    144kb BOs (36)
              dumb:  12168kb BOs (3)
            binner:  16384kb BOs (1)
               RCL:     44kb BOs (9)
               BCL:     40kb BOs (9)
userspace BO cache:   3504kb BOs (13)

# cat /var/log/syslog
May 14 08:21:33 <name> kernel: [85463.717960] [drm] Resetting GPU.
May 14 08:21:34 <name> kernel: [85464.757986] [drm] Resetting GPU.
May 14 08:21:35 <name> kernel: [85465.717977] [drm] Resetting GPU.
May 14 08:21:36 <name> kernel: [85466.758010] [drm] Resetting GPU.
May 14 08:21:37 <name> kernel: [85467.717930] [drm] Resetting GPU.
May 14 08:21:38 <name> kernel: [85468.758008] [drm] Resetting GPU.
May 14 08:21:39 <name> kernel: [85469.717938] [drm] Resetting GPU.
May 14 08:21:40 <name> kernel: [85470.758021] [drm] Resetting GPU.
May 14 08:21:41 <name> kernel: [85471.717918] [drm] Resetting GPU.
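
The syslog excerpt above shows the resets repeating about once per second. A minimal sketch for extracting the kernel timestamps and the gaps between consecutive resets from such a log is given below. The function names are illustrative, and the regular expression assumes the `[seconds.micros] [drm] Resetting GPU.` layout seen in this thread.

```python
import re

# Kernel timestamp followed by the message seen throughout this thread.
RESET_LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s*\[drm\] Resetting GPU\.")


def reset_times(log_text):
    """Return the kernel timestamp (seconds since boot) of each GPU reset line."""
    return [float(m.group(1)) for m in RESET_LINE.finditer(log_text)]


def reset_intervals(times):
    """Return the gaps in seconds between consecutive resets."""
    return [round(later - earlier, 6) for earlier, later in zip(times, times[1:])]


# The first three lines of the syslog excerpt above, used as sample input.
SAMPLE = """\
May 14 08:21:33 <name> kernel: [85463.717960] [drm] Resetting GPU.
May 14 08:21:34 <name> kernel: [85464.757986] [drm] Resetting GPU.
May 14 08:21:35 <name> kernel: [85465.717977] [drm] Resetting GPU.
"""

times = reset_times(SAMPLE)
print(len(times), "resets, intervals (s):", reset_intervals(times))
```

The same functions accept plain `dmesg` output, since only the bracketed timestamp and the message text are matched.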

carlonluca commented 2 months ago

Hello, I'm the original author of report #3221. In 2019 the only way I could solve the issue was to avoid KMS and keep using the old legacy stack on the RPi 3, which worked well. I recently tried to update the system, but found that the issue is still there and makes the system crash within minutes.

May I ask whether someone is actually working on this issue? Are there resources to fix it? If it is unlikely to be fixed on the RPi 3, the other option is to go back to the legacy stack once again and remain there. That would require a considerable number of hours for porting software and for maintaining two different systems, so before I do: will the legacy stack remain supported on the RPi 3 in newer kernels? Should we consider that the reasonable solution to this long-standing issue?

Thanks.

SilverGreen93 commented 2 months ago

Hi! It was pointed out to me that the same bug is very easy to reproduce on Recalbox 9.2.1 with the GPi Case 2W. Here is the original report for more info: https://forum.recalbox.com/topic/31359/gpicase-2w-pi-zero-2w-screen-glitch I can reproduce it myself quite often.

carlonluca commented 2 months ago

Hello, I invested some time in this issue. This patch https://github.com/raspberrypi/linux/pull/6239 prevents rendering issues after the GPU is reset. Something still causes the GPU to reset, which I did not investigate, but when it happens, proper behavior is restored. The result is just a short interruption of the rendering, which may be tolerable in some cases. Please note that I only tested this on kernels 6.1 and 5.15, on a Yocto-based system and an old Raspbian system respectively. I cannot reproduce the problem so far.

david-barbion commented 2 months ago

I can confirm that the workaround you provided works as expected. It does not prevent the GPU reset, but there is no more screen distortion.

mairacanal commented 5 days ago

Hey guys, I was out for a while due to some health issues, but I'm back to this issue once more. I only have an RPi 3B+ available here and what I did to reproduce the error was to install Recalbox 9.2.3-Pulstar and try to play some games. Unfortunately, I didn't manage to reproduce the error while playing the games.

Any chance you guys could provide me with more information about how to reproduce the error? For example, which game were you playing when you hit the error? How long had you been playing (5 minutes, 15 minutes, an hour)?

I worked on a solution for this issue, but I can't confirm that it fixes the issue if I can't reproduce the error in the first place.

Thank you guys for the feedback!

david-barbion commented 5 days ago

Hi Maira! Hope you are doing well!

Recalbox 9.2.3 contains the workaround pointed out by @carlonluca here: https://github.com/raspberrypi/linux/issues/5780#issuecomment-2189594054. That's why you can't reproduce the bug (however, the GPU error still shows up in /var/log/messages). As said, this is only a workaround, and if you can track down the real issue, that would be great :)

I'm currently looking for a way to trigger the bug.