raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/
Other
11.02k stars 4.95k forks source link

v4l2h264dec with GStreamer, video hangs with bcm2835-codec: bcm2835_codec_stop_streaming: Timeout waiting for buffers to be returned - 6 outstanding #4592

Open hailfinger opened 2 years ago

hailfinger commented 2 years ago

Describe the bug Using the v4l2h264dec decoder in Raspberry Pi OS 64 bit with GStreamer 1.19, the video will reproducibly slow down extremely/hang after 20-80 minutes on some Raspberry Pi 4B 4GB machines. The machines exhibiting the hang will do so reproducibly, but the affected machines are only a small percentage of my 100+ devices pool of Raspberry PI 4B running 64-bit Raspberry Pi OS. GStreamer will start emitting QoS messages and a few seconds to minutes later the video will freeze (I haven't seen it continue, but maybe the frame rate is just extremely low). If you stop GStreamer in this state, the kernel will emit the timeout message in the logs down below.

To reproduce Launch an X server, configure for 720p50 on first HDMI output, then run GStreamer 1.19 with gst-launch-1.0 -m souphttpsrc name=videosrc is-live=true location=https://stream.iptv.nak.org/teststream/high/index.m3u8 ! hlsdemux ! queue ! tsdemux name=mux ! tee name=vid ! queue ! h264parse ! queue min-threshold-buffers=3 ! v4l2h264dec ! queue min-threshold-buffers=3 ! glupload ! glcolorconvert ! glcolorscale ! video/x-raw(memory:GLMemory),width=1280,height=720 ! glimagesinkelement sync=true

Expected behaviour Video continues playing for hours (video is an endless loop).

Actual behaviour Video hangs after 20-80 minutes.

System root@pi-dortmund-eving:~# raspinfo System Information

Raspberry Pi 4 Model B Rev 1.1 PRETTY_NAME="Debian GNU/Linux 10 (buster)" NAME="Debian GNU/Linux" VERSION_ID="10" VERSION="10 (buster)"

Raspberry Pi reference 2021-05-07 Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 071d3956353f2be964d2d337d3b123f1872b7f53, stage2

Linux pi-dortmund-eving 5.10.60-v8+ #1449 SMP PREEMPT Wed Aug 25 15:01:33 BST 2021 aarch64 GNU/Linux Revision : c03111 Serial : 1000000069ed6440 Model : Raspberry Pi 4 Model B Rev 1.1 Throttled flag : throttled=0x0 Camera : supported=1 detected=0

Videocore information

Aug 31 2021 14:53:42 Copyright (c) 2012 Broadcom version 67615e950e1e28b92dfae6303cf7a8b879a8908f (clean) (release) (start_x)

alloc failures: 0 compactions: 0 legacy block fails: 0

Filesystem information

Filesystem 1K-blocks Used Available Use% Mounted on /dev/root 8191416 3612768 4142836 47% / devtmpfs 1685344 0 1685344 0% /dev tmpfs 1851072 440 1850632 1% /dev/shm tmpfs 1851072 512 1850560 1% /run tmpfs 5120 0 5120 0% /run/lock tmpfs 1851072 0 1851072 0% /sys/fs/cgroup /dev/mmcblk0p3 258095 30337 227759 12% /boot tmpfs 370212 4 370208 1% /run/user/0

Filename Type Size Used Priority /var/swap file 102396 0 -2

Package version information

raspberrypi-ui-mods: Installed: (none) raspberrypi-sys-mods: Installed: 20210901 openbox: Installed: (none) lxpanel: Installed: (none) pcmanfm: Installed: (none) rpd-plym-splash: Installed: (none)

Networking Information

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet x.x.x.x netmask x.x.x.x broadcast x.x.x.x inet6 y.y.y.y.y.y.y.y prefixlen 64 scopeid 0x0 inet6 y.y.y.y.y.y.y.y prefixlen 64 scopeid 0x0 inet6 y::y.y.y.y prefixlen 64 scopeid 0x20 ether m.m.m.m txqueuelen 1000 (Ethernet) RX packets 581257 bytes 725438864 (691.8 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 323782 bytes 119525053 (113.9 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536 inet x.x.x.x netmask x.x.x.x inet6 ::1 prefixlen 128 scopeid 0x10 loop txqueuelen 1000 (Local Loopback) RX packets 1227543 bytes 645074295 (615.1 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 1227543 bytes 645074295 (615.1 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

USB Information

/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M | Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M | Port 1: Dev 3, If 0, Class=Vendor Specific Class, Driver=, 480M | Port 1: Dev 3, If 1, Class=Vendor Specific Class, Driver=cx231xx, 480M | Port 1: Dev 3, If 2, Class=Vendor Specific Class, Driver=, 480M | Port 1: Dev 3, If 3, Class=Vendor Specific Class, Driver=, 480M | Port 1: Dev 3, If 4, Class=Vendor Specific Class, Driver=, 480M |__ Port 1: Dev 3, If 5, Class=Vendor Specific Class, Driver=, 480M

config.txt

arm_64bit=1 arm_freq=1500 audio_pwm_mode=514 config_hdmi_boost=5 core_freq=500 core_freq_min=200 disable_commandline_tags=2 disable_l2cache=1 disable_overscan=1 disable_splash=1 display_hdmi_rotate=-1 display_lcd_rotate=-1 dvfs=3 enable_gic=1 enable_uart=1 force_eeprom_read=1 force_pwm_open=1 framebuffer_height=720 framebuffer_ignore_alpha=1 framebuffer_swap=1 framebuffer_width=1280 gpu_freq=500 gpu_freq_min=250 init_uart_clock=0x2dc6c00 lcd_framerate=60 mask_gpu_interrupt0=1024 mask_gpu_interrupt1=0x10000 max_framebuffers=2 over_voltage_avs=-23750 pause_burst_frames=1 program_serial_random=1 second_boot=1 total_mem=4096 hdmi_force_cec_address:0=65535 hdmi_force_cec_address:1=65535 hdmi_force_hotplug:0=1 hdmi_group:0=1 hdmi_mode:0=19 hdmi_pixel_freq_limit:0=0x11e1a300 hdmi_pixel_freq_limit:1=0x11e1a300 device_tree=- overlay_prefix=overlays/ hdmi_cvt:0= hdmi_cvt:1= hdmi_edid_filename:0= hdmi_edid_filename:1= hdmi_timings:0= hdmi_timings:1=

cmdline.txt

coherent_pool=1M 8250.nr_uarts=1 snd_bcm2835.enable_compat_alsa=0 snd_bcm2835.enable_hdmi=1 video=HDMI-A-1:1280x720M@50 smsc95xx.macaddr=DC:A6:32:48:BA:3F vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000 console=tty1 root=PARTUUID=ad8405fc-04 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait

raspi-gpio settings

BANK0 (GPIO 0 to 27): GPIO 0: level=1 fsel=0 func=INPUT pull=UP GPIO 1: level=1 fsel=0 func=INPUT pull=UP GPIO 2: level=1 fsel=4 alt=0 func=SDA1 pull=UP GPIO 3: level=1 fsel=4 alt=0 func=SCL1 pull=UP GPIO 4: level=1 fsel=0 func=INPUT pull=UP GPIO 5: level=1 fsel=0 func=INPUT pull=UP GPIO 6: level=1 fsel=0 func=INPUT pull=UP GPIO 7: level=1 fsel=1 func=OUTPUT pull=UP GPIO 8: level=1 fsel=1 func=OUTPUT pull=UP GPIO 9: level=0 fsel=4 alt=0 func=SPI0_MISO pull=DOWN GPIO 10: level=0 fsel=4 alt=0 func=SPI0_MOSI pull=DOWN GPIO 11: level=0 fsel=4 alt=0 func=SPI0_SCLK pull=DOWN GPIO 12: level=0 fsel=0 func=INPUT pull=DOWN GPIO 13: level=0 fsel=0 func=INPUT pull=DOWN GPIO 14: level=1 fsel=4 alt=0 func=TXD0 pull=NONE GPIO 15: level=1 fsel=4 alt=0 func=RXD0 pull=UP GPIO 16: level=0 fsel=0 func=INPUT pull=DOWN GPIO 17: level=0 fsel=0 func=INPUT pull=DOWN GPIO 18: level=0 fsel=0 func=INPUT pull=DOWN GPIO 19: level=0 fsel=0 func=INPUT pull=DOWN GPIO 20: level=0 fsel=0 func=INPUT pull=DOWN GPIO 21: level=0 fsel=0 func=INPUT pull=DOWN GPIO 22: level=0 fsel=0 func=INPUT pull=DOWN GPIO 23: level=0 fsel=0 func=INPUT pull=DOWN GPIO 24: level=0 fsel=0 func=INPUT pull=DOWN GPIO 25: level=0 fsel=0 func=INPUT pull=DOWN GPIO 26: level=0 fsel=0 func=INPUT pull=DOWN GPIO 27: level=0 fsel=0 func=INPUT pull=DOWN BANK1 (GPIO 28 to 45): GPIO 28: level=1 fsel=2 alt=5 func=RGMII_MDIO pull=UP GPIO 29: level=0 fsel=2 alt=5 func=RGMII_MDC pull=DOWN GPIO 30: level=1 fsel=0 func=INPUT pull=UP GPIO 31: level=1 fsel=0 func=INPUT pull=NONE GPIO 32: level=0 fsel=0 func=INPUT pull=DOWN GPIO 33: level=0 fsel=0 func=INPUT pull=DOWN GPIO 34: level=1 fsel=0 func=INPUT pull=UP GPIO 35: level=1 fsel=0 func=INPUT pull=UP GPIO 36: level=1 fsel=0 func=INPUT pull=UP GPIO 37: level=1 fsel=0 func=INPUT pull=UP GPIO 38: level=1 fsel=0 func=INPUT pull=UP GPIO 39: level=1 fsel=0 func=INPUT pull=UP GPIO 40: level=0 fsel=4 alt=0 func=PWM1_0 pull=NONE GPIO 41: level=0 fsel=4 alt=0 func=PWM1_1 pull=NONE GPIO 42: level=0 fsel=1 func=OUTPUT pull=UP GPIO 43: level=1 fsel=0 func=INPUT pull=UP GPIO 44: level=1 fsel=0 func=INPUT pull=UP GPIO 45: level=1 fsel=0 func=INPUT pull=UP BANK2 (GPIO 46 to 53): GPIO 46: level=0 fsel=0 func=INPUT pull=UP GPIO 47: level=0 fsel=0 func=INPUT pull=UP GPIO 48: level=0 fsel=0 func=INPUT pull=DOWN GPIO 49: level=0 fsel=0 func=INPUT pull=DOWN GPIO 50: level=0 fsel=0 func=INPUT pull=DOWN GPIO 51: level=0 fsel=0 func=INPUT pull=DOWN GPIO 52: level=0 fsel=0 func=INPUT pull=DOWN GPIO 53: level=0 fsel=0 func=INPUT pull=DOWN

vcdbg log messages

sudo: vcdbg: command not found

dmesg log

[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083] [ 0.000000] Linux version 5.10.60-v8+ (dom@buildbot) (aarch64-linux-gnu-gcc-8 (Ubuntu/Linaro 8.4.0-3ubuntu1) 8.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #1449 SMP PREEMPT Wed Aug 25 15:01:33 BST 2021 [ 0.000000] random: fast init done [ 0.000000] Machine model: Raspberry Pi 4 Model B Rev 1.1 [ 0.000000] efi: UEFI not found. [ 0.000000] Reserved memory: created CMA memory pool at 0x000000001ac00000, size 320 MiB [ 0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool [ 0.000000] Zone ranges: [ 0.000000] DMA [mem 0x0000000000000000-0x000000003fffffff] [ 0.000000] DMA32 [mem 0x0000000040000000-0x00000000fbffffff] [ 0.000000] Normal empty [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x0000000000000000-0x000000002fffffff] [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000fbffffff] [ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x00000000fbffffff] [ 0.000000] On node 0 totalpages: 966656 [ 0.000000] DMA zone: 3072 pages used for memmap [ 0.000000] DMA zone: 0 pages reserved [ 0.000000] DMA zone: 196608 pages, LIFO batch:63 [ 0.000000] DMA32 zone: 12032 pages used for memmap [ 0.000000] DMA32 zone: 770048 pages, LIFO batch:63 [ 0.000000] percpu: Embedded 32 pages/cpu s91224 r8192 d31656 u131072 [ 0.000000] pcpu-alloc: s91224 r8192 d31656 u131072 alloc=32*4096 [ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [ 0.000000] Detected PIPT I-cache on CPU0 [ 0.000000] CPU features: detected: Spectre-v2 [ 0.000000] CPU features: detected: Spectre-v4 [ 0.000000] CPU features: detected: ARM errata 1165522, 1319367, or 1530923 [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 951552 [ 0.000000] Kernel command line: coherent_pool=1M 8250.nr_uarts=1 snd_bcm2835.enable_compat_alsa=0 snd_bcm2835.enable_hdmi=1 video=HDMI-A-1:1280x720M@50 smsc95xx.macaddr=m.m.m.m vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000 console=tty1 root=PARTUUID=ad8405fc-04 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait [ 0.000000] Kernel parameter elevator= does not have any effect anymore. Please use sysfs to set IO scheduler for individual devices. [ 0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes, linear) [ 0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes, linear) [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off [ 0.000000] software IO TLB: mapped [mem 0x0000000016c00000-0x000000001ac00000] (64MB) [ 0.000000] Memory: 3370688K/3866624K available (11136K kernel code, 1946K rwdata, 3984K rodata, 3776K init, 1260K bss, 168256K reserved, 327680K cma-reserved) [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 [ 0.000000] ftrace: allocating 36750 entries in 144 pages [ 0.000000] ftrace: allocated 144 pages with 2 groups [ 0.000000] rcu: Preemptible hierarchical RCU implementation. [ 0.000000] rcu: RCU event tracing is enabled. [ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4. [ 0.000000] Trampoline variant of Tasks RCU enabled. [ 0.000000] Rude variant of Tasks RCU enabled. [ 0.000000] Tracing variant of Tasks RCU enabled. [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies. [ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4 [ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0 [ 0.000000] GIC: Using split EOI/Deactivate mode [ 0.000000] random: get_random_bytes called from start_kernel+0x3b0/0x570 with crng_init=1 [ 0.000000] arch_timer: cp15 timer(s) running at 54.00MHz (phys). [ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xc743ce346, max_idle_ns: 440795203123 ns [ 0.000006] sched_clock: 56 bits at 54MHz, resolution 18ns, wraps every 4398046511102nsroot@pi-dortmund-eving:~# [ 0.000255] Console: colour dummy device 80x25 [ 0.000796] printk: console [tty1] enabled [ 0.000859] Calibrating delay loop (skipped), value calculated using timer frequency.. 108.00 BogoMIPS (lpj=216000) [ 0.000900] pid_max: default: 32768 minimum: 301 [ 0.001054] LSM: Security Framework initializing [ 0.001299] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes, linear) [ 0.001378] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear) [ 0.002698] cgroup: Disabling memory control group subsystem [ 0.005086] rcu: Hierarchical SRCU implementation. [ 0.006146] EFI services will not be available. [ 0.006681] smp: Bringing up secondary CPUs ... [ 0.007680] Detected PIPT I-cache on CPU1 [ 0.007759] CPU1: Booted secondary processor 0x0000000001 [0x410fd083] [ 0.008875] Detected PIPT I-cache on CPU2 [ 0.008924] CPU2: Booted secondary processor 0x0000000002 [0x410fd083] [ 0.010015] Detected PIPT I-cache on CPU3 [ 0.010063] CPU3: Booted secondary processor 0x0000000003 [0x410fd083] [ 0.010214] smp: Brought up 1 node, 4 CPUs [ 0.010302] SMP: Total of 4 processors activated. [ 0.010326] CPU features: detected: 32-bit EL0 Support [ 0.010349] CPU features: detected: CRC32 instructions [ 0.010372] CPU features: detected: 32-bit EL1 Support [ 0.044097] CPU: All CPU(s) started at EL2 [ 0.044179] alternatives: patching kernel code [ 0.045766] devtmpfs: initialized [ 0.056606] Enabled cp15_barrier support [ 0.056671] Enabled setend support [ 0.056886] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns [ 0.056932] futex hash table entries: 1024 (order: 4, 65536 bytes, linear) [ 0.065795] pinctrl core: initialized pinctrl subsystem [ 0.066602] DMI not present or invalid. [ 0.066992] NET: Registered protocol family 16 [ 0.070159] DMA: preallocated 1024 KiB GFP_KERNEL pool for atomic allocations [ 0.070444] DMA: preallocated 1024 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations [ 0.071378] DMA: preallocated 1024 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations [ 0.071489] audit: initializing netlink subsys (disabled) [ 0.071902] audit: type=2000 audit(0.068:1): state=initialized audit_enabled=0 res=1 [ 0.072434] thermal_sys: Registered thermal governor 'step_wise' [ 0.072894] cpuidle: using governor menu [ 0.073261] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers. [ 0.073446] ASID allocator initialised with 65536 entries [ 0.073609] Serial: AMBA PL011 UART driver [ 0.105244] bcm2835-mbox fe00b880.mailbox: mailbox enabled [ 0.112363] raspberrypi-firmware soc:firmware: Attached to firmware from 2021-08-31T14:53:42, variant start_x [ 0.116373] raspberrypi-firmware soc:firmware: Firmware hash is 67615e950e1e28b92dfae6303cf7a8b879a8908f [ 0.150514] bcm2835-dma fe007000.dma: DMA legacy API manager, dmachans=0x1 [ 0.154431] vgaarb: loaded [ 0.154871] SCSI subsystem initialized [ 0.155104] usbcore: registered new interface driver usbfs [ 0.155171] usbcore: registered new interface driver hub [ 0.155262] usbcore: registered new device driver usb [ 0.155592] usb_phy_generic phy: supply vcc not found, using dummy regulator [ 0.157501] clocksource: Switched to clocksource arch_sys_counter [ 1.188723] VFS: Disk quotas dquot_6.6.0 [ 1.188844] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes) [ 1.189020] FS-Cache: Loaded [ 1.189265] CacheFiles: Loaded [ 1.190405] simple-framebuffer 3e843000.framebuffer: framebuffer at 0x3e843000, 0x384000 bytes, mapped to 0x(ptrval) [ 1.190441] simple-framebuffer 3e843000.framebuffer: format=a8r8g8b8, mode=1280x720x32, linelength=5120 [ 1.195862] Console: switching to colour frame buffer device 160x45 [ 1.200591] simple-framebuffer 3e843000.framebuffer: fb0: simplefb registered! [ 1.210894] NET: Registered protocol family 2 [ 1.211475] IP idents hash table entries: 65536 (order: 7, 524288 bytes, linear) [ 1.215060] tcp_listen_portaddr_hash hash table entries: 2048 (order: 3, 32768 bytes, linear) [ 1.215202] TCP established hash table entries: 32768 (order: 6, 262144 bytes, linear) [ 1.215497] TCP bind hash table entries: 32768 (order: 7, 524288 bytes, linear) [ 1.216096] TCP: Hash tables configured (established 32768 bind 32768) [ 1.216411] UDP hash table entries: 2048 (order: 4, 65536 bytes, linear) [ 1.216552] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes, linear) [ 1.216952] NET: Registered protocol family 1 [ 1.217924] RPC: Registered named UNIX socket transport module. [ 1.218002] RPC: Registered udp transport module. [ 1.218062] RPC: Registered tcp transport module. [ 1.218120] RPC: Registered tcp NFSv4.1 backchannel transport module. [ 1.218204] PCI: CLS 0 bytes, default 64 [ 1.222408] hw perfevents: enabled with armv8_cortex_a72 PMU driver, 7 counters available [ 1.224608] kvm [1]: IPA Size Limit: 44 bits [ 1.227670] kvm [1]: vgic interrupt IRQ9 [ 1.229864] kvm [1]: Hyp mode initialized successfully [ 1.233958] Initialise system trusted keyrings [ 1.236122] workingset: timestamp_bits=46 max_order=20 bucket_order=0 [ 1.244394] zbud: loaded [ 1.247967] FS-Cache: Netfs 'nfs' registered for caching [ 1.250423] NFS: Registering the id_resolver key type [ 1.252225] Key type id_resolver registered [ 1.254060] Key type id_legacy registered [ 1.255921] nfs4filelayout_init: NFSv4 File Layout Driver Registering... [ 1.257733] nfs4flexfilelayout_init: NFSv4 Flexfile Layout Driver Registering... [ 1.260719] Key type asymmetric registered [ 1.262574] Asymmetric key parser 'x509' registered [ 1.264423] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 249) [ 1.266555] io scheduler mq-deadline registered [ 1.268390] io scheduler kyber registered [ 1.274264] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges: [ 1.276148] brcm-pcie fd500000.pcie: No bus range found for /scb/pcie@7d500000, using [bus 00-ff] [ 1.278082] brcm-pcie fd500000.pcie: MEM 0x0600000000..0x063fffffff -> 0x00c0000000 [ 1.279993] brcm-pcie fd500000.pcie: IB MEM 0x0000000000..0x00bfffffff -> 0x0400000000 [ 1.331599] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC) [ 1.333755] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00 [ 1.335514] pci_bus 0000:00: root bus resource [bus 00-ff] [ 1.337253] pci_bus 0000:00: root bus resource [mem 0x600000000-0x63fffffff] (bus address [0xc0000000-0xffffffff]) [ 1.339170] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400 [ 1.341152] pci 0000:00:00.0: PME# supported from D0 D3hot [ 1.347073] pci 0000:01:00.0: [1106:3483] type 00 class 0x0c0330 [ 1.348883] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00000fff 64bit] [ 1.351011] pci 0000:01:00.0: PME# supported from D0 D3cold [ 1.364981] pci 0000:00:00.0: BAR 8: assigned [mem 0x600000000-0x6000fffff] [ 1.366679] pci 0000:01:00.0: BAR 0: assigned [mem 0x600000000-0x600000fff 64bit] [ 1.368338] pci 0000:00:00.0: PCI bridge to [bus 01] [ 1.369973] pci 0000:00:00.0: bridge window [mem 0x600000000-0x6000fffff] [ 1.377208] Serial: 8250/16550 driver, 1 ports, IRQ sharing enabled [ 1.381782] iproc-rng200 fe104000.rng: hwrng registered [ 1.383796] vc-mem: phys_addr:0x00000000 mem_base=0x3ec00000 mem_size:0x40000000(1024 MiB) [ 1.386415] gpiomem-bcm2835 fe200000.gpiomem: Initialised: Registers at 0xfe200000 [ 1.388552] cacheinfo: Unable to detect cache hierarchy for CPU 0 [ 1.401407] brd: module loaded [ 1.415234] loop: module loaded [ 1.418981] Loading iSCSI transport class v2.0-870. [ 1.423890] libphy: Fixed MDIO Bus: probed [ 1.427227] bcmgenet fd580000.ethernet: GENET 5.0 EPHY: 0x0000 [ 1.441530] libphy: bcmgenet MII bus: probed [ 1.477784] unimac-mdio unimac-mdio.-19: Broadcom UniMAC MDIO bus [ 1.480614] usbcore: registered new interface driver r8152 [ 1.482474] usbcore: registered new interface driver lan78xx [ 1.484200] usbcore: registered new interface driver smsc95xx [ 1.487517] pci 0000:00:00.0: enabling device (0000 -> 0002) [ 1.489202] xhci_hcd 0000:01:00.0: enabling device (0000 -> 0002) [ 1.491022] xhci_hcd 0000:01:00.0: xHCI Host Controller [ 1.492672] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1 [ 1.495072] xhci_hcd 0000:01:00.0: hcc params 0x002841eb hci version 0x100 quirks 0x0000060000000890 [ 1.498071] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.10 [ 1.499735] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [ 1.501401] usb usb1: Product: xHCI Host Controller [ 1.503067] usb usb1: Manufacturer: Linux 5.10.60-v8+ xhci-hcd [ 1.504706] usb usb1: SerialNumber: 0000:01:00.0 [ 1.507066] hub 1-0:1.0: USB hub found [ 1.508782] hub 1-0:1.0: 1 port detected [ 1.511056] xhci_hcd 0000:01:00.0: xHCI Host Controller [ 1.512686] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2 [ 1.514356] xhci_hcd 0000:01:00.0: Host supports USB 3.0 SuperSpeed [ 1.516390] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.10 [ 1.518059] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [ 1.519669] usb usb2: Product: xHCI Host Controller [ 1.521268] usb usb2: Manufacturer: Linux 5.10.60-v8+ xhci-hcd [ 1.522913] usb usb2: SerialNumber: 0000:01:00.0 [ 1.525172] hub 2-0:1.0: USB hub found [ 1.526886] hub 2-0:1.0: 4 ports detected [ 1.530071] dwc_otg: version 3.00a 10-AUG-2012 (platform bus) [ 1.531954] dwc_otg: FIQ enabled [ 1.531965] dwc_otg: NAK holdoff enabled [ 1.531975] dwc_otg: FIQ split-transaction FSM enabled [ 1.531989] Module dwc_common_port init [ 1.532543] usbcore: registered new interface driver uas [ 1.534459] usbcore: registered new interface driver usb-storage [ 1.536257] mousedev: PS/2 mouse device common for all mice [ 1.539808] bcm2835-wdt bcm2835-wdt: Broadcom BCM2835 watchdog timer [ 1.545095] sdhci: Secure Digital Host Controller Interface driver [ 1.546742] sdhci: Copyright(c) Pierre Ossman [ 1.548814] sdhci-pltfm: SDHCI platform and OF driver helper [ 1.554514] ledtrig-cpu: registered to indicate activity on CPUs [ 1.556442] hid: raw HID events driver (C) Jiri Kosina [ 1.558235] usbcore: registered new interface driver usbhid [ 1.559789] usbhid: USB HID core driver [ 1.561679] ashmem: initialized [ 1.571687] Initializing XFRM netlink socket [ 1.573274] NET: Registered protocol family 17 [ 1.574941] Key type dns_resolver registered [ 1.577160] registered taskstats version 1 [ 1.578723] Loading compiled-in X.509 certificates [ 1.580739] Key type ._fscrypt registered [ 1.582289] Key type .fscrypt registered [ 1.583783] Key type fscrypt-provisioning registered [ 1.596150] uart-pl011 fe201000.serial: there is not valid maps for state default [ 1.597971] uart-pl011 fe201000.serial: cts_event_workaround enabled [ 1.599561] fe201000.serial: ttyAMA0 at MMIO 0xfe201000 (irq = 21, base_baud = 0) is a PL011 rev2 [ 1.607951] bcm2835-power bcm2835-power: Broadcom BCM2835 power domains driver [ 1.612653] of_cfs_init [ 1.614385] of_cfs_init: OK [ 1.648394] mmc0: SDHCI controller on fe340000.mmc [fe340000.mmc] using ADMA [ 1.651044] Waiting for root device PARTUUID=ad8405fc-04... [ 1.748085] mmc0: new ultra high speed DDR50 SDHC card at address 0001 [ 1.750657] mmcblk0: mmc0:0001 JB1RT 29.8 GiB [ 1.754526] mmcblk0: p1 p2 p3 p4 [ 1.765534] usb 1-1: new high-speed USB device number 2 using xhci_hcd [ 1.774576] EXT4-fs (mmcblk0p4): mounted filesystem with ordered data mode. Opts: (null) [ 1.776290] VFS: Mounted root (ext4 filesystem) readonly on device 179:4. [ 1.783434] devtmpfs: mounted [ 1.794533] Freeing unused kernel memory: 3776K [ 1.796342] Run /sbin/init as init process [ 1.797968] with arguments: [ 1.797978] /sbin/init [ 1.797988] with environment: [ 1.797999] HOME=/ [ 1.798008] TERM=linux [ 1.920206] usb 1-1: New USB device found, idVendor=2109, idProduct=3431, bcdDevice= 4.21 [ 1.922005] usb 1-1: New USB device strings: Mfr=0, Product=1, SerialNumber=0 [ 1.923683] usb 1-1: Product: USB2.0 Hub [ 1.927351] hub 1-1:1.0: USB hub found [ 1.929293] hub 1-1:1.0: 4 ports detected [ 2.225554] usb 1-1.1: new high-speed USB device number 3 using xhci_hcd [ 2.229349] systemd[1]: System time before build time, advancing clock. [ 2.315732] NET: Registered protocol family 10 [ 2.319181] Segment Routing with IPv6 [ 2.328473] usb 1-1.1: New USB device found, idVendor=2040, idProduct=c200, bcdDevice=40.01 [ 2.330304] usb 1-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ 2.332054] usb 1-1.1: Product: Hauppauge Device [ 2.333773] usb 1-1.1: Manufacturer: Hauppauge [ 2.335457] usb 1-1.1: SerialNumber: 0014089511 [ 2.388228] systemd[1]: systemd 241 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid) [ 2.392794] systemd[1]: Detected architecture arm64. [ 2.428518] systemd[1]: Set hostname to . [ 2.967328] random: systemd: uninitialized urandom read (16 bytes read) [ 2.976945] random: systemd: uninitialized urandom read (16 bytes read) [ 2.979960] systemd[1]: Listening on udev Control Socket. [ 2.984430] random: systemd: uninitialized urandom read (16 bytes read) [ 2.986723] systemd[1]: Started Dispatch Password Requests to Console Directory Watch. [ 2.991910] systemd[1]: Listening on Journal Socket. [ 2.996167] systemd[1]: Condition check resulted in Kernel Module supporting RPCSEC_GSS being skipped. [ 3.004711] systemd[1]: Mounting POSIX Message Queue File System... [ 3.011017] systemd[1]: Created slice system-systemd\x2dfsck.slice. [ 3.020337] systemd[1]: Starting nftables... [ 3.187235] cryptd: max_cpu_qlen set to 1000 [ 3.346714] i2c /dev entries driver [ 3.674250] EXT4-fs (mmcblk0p4): re-mounted. Opts: (null) [ 3.744153] systemd-journald[120]: Received request to flush runtime journal from PID 1 [ 3.750765] systemd-journald[120]: File /var/log/journal/f7c79b2a4d894e3ea97af6b3c6464d9c/system.journal corrupted or uncleanly shut down, renaming and replacing. [ 4.627691] rpivid-mem feb00000.hevc-decoder: rpivid-hevcmem initialised: Registers at 0xfeb00000 length 0x00010000 [ 4.628237] rpivid-mem feb10000.rpivid-local-intc: rpivid-intcmem initialised: Registers at 0xfeb10000 length 0x00001000 [ 4.628714] rpivid-mem feb20000.h264-decoder: rpivid-h264mem initialised: Registers at 0xfeb20000 length 0x00010000 [ 4.629180] rpivid-mem feb30000.vp9-decoder: rpivid-vp9mem initialised: Registers at 0xfeb30000 length 0x00010000 [ 4.709241] mc: Linux media interface: v0.10 [ 4.730111] vc_sm_cma: module is from the staging directory, the quality is unknown, you have been warned. [ 4.732151] bcm2835_vc_sm_cma_probe: Videocore shared memory driver

[ 4.747270] [vc_sm_connected_init]: installed successfully [ 4.763733] snd_bcm2835: module is from the staging directory, the quality is unknown, you have been warned. [ 4.782506] i2c i2c-22: Added multiplexed i2c bus 0 [ 4.785102] i2c i2c-22: Added multiplexed i2c bus 10 [ 4.786420] bcm2835_audio bcm2835_audio: card created with 4 channels [ 4.790219] videodev: Linux video capture interface: v2.00 [ 4.796407] bcm2835_audio bcm2835_audio: card created with 4 channels [ 4.840740] bcm2835_mmal_vchiq: module is from the staging directory, the quality is unknown, you have been warned. [ 4.843336] bcm2835_mmal_vchiq: module is from the staging directory, the quality is unknown, you have been warned. [ 4.849140] bcm2835_v4l2: module is from the staging directory, the quality is unknown, you have been warned. [ 4.851976] bcm2835_codec: module is from the staging directory, the quality is unknown, you have been warned. [ 4.862226] bcm2835_isp: module is from the staging directory, the quality is unknown, you have been warned. [ 5.067205] [drm] Initialized v3d 1.0.0 20180419 for fec00000.v3d on minor 0 [ 5.114401] cx231xx 1-1.1:1.1: New device Hauppauge Hauppauge Device @ 480 Mbps (2040:c200) with 6 interfaces [ 5.114619] cx231xx 1-1.1:1.1: can't change interface 3 alt no. to 3: Max. Pkt size = 0 [ 5.114634] cx231xx 1-1.1:1.1: Identified as Hauppauge USB Live 2 (card=9) [ 5.123766] i2c i2c-24: Added multiplexed i2c bus 26 [ 5.124215] i2c i2c-24: Added multiplexed i2c bus 27 [ 5.202529] checking generic (3e843000 384000) vs hw (0 ffffffffffffffff) [ 5.202551] fb0: switching to vc4drmfb from simple [ 5.203102] Console: switching to colour dummy device 80x25 [ 5.207106] vc4-drm gpu: bound fe600000.firmwarekms (ops vc4_fkms_ops [vc4]) [ 5.208076] [drm] Initialized vc4 0.0.0 20140616 for gpu on minor 1 [ 5.270423] cx25840 23-0044: cx23102 A/V decoder found @ 0x88 (cx231xx #0-0) [ 5.450126] Console: switching to colour frame buffer device 160x45 [ 5.475345] vc4-drm gpu: [drm] fb0: vc4drmfb frame buffer device [ 5.489963] bcm2835-codec bcm2835-codec: Device registered as /dev/video10 [ 5.490021] bcm2835-codec bcm2835-codec: Loaded V4L2 decode [ 5.496026] bcm2835-isp bcm2835-isp: Device node output[0] registered as /dev/video13 [ 5.497887] bcm2835-isp bcm2835-isp: Device node capture[0] registered as /dev/video14 [ 5.499317] bcm2835-codec bcm2835-codec: Device registered as /dev/video11 [ 5.499363] bcm2835-codec bcm2835-codec: Loaded V4L2 encode [ 5.505634] bcm2835-isp bcm2835-isp: Device node capture[1] registered as /dev/video15 [ 5.507153] bcm2835-isp bcm2835-isp: Device node stats[2] registered as /dev/video16 [ 5.507192] bcm2835-isp bcm2835-isp: Register output node 0 with media controller [ 5.507213] bcm2835-isp bcm2835-isp: Register capture node 1 with media controller [ 5.507231] bcm2835-isp bcm2835-isp: Register capture node 2 with media controller [ 5.507248] bcm2835-isp bcm2835-isp: Register capture node 3 with media controller [ 5.507507] bcm2835-isp bcm2835-isp: Loaded V4L2 bcm2835-isp [ 5.513482] bcm2835-codec bcm2835-codec: Device registered as /dev/video12 [ 5.513692] bcm2835-codec bcm2835-codec: Loaded V4L2 isp [ 7.330001] random: crng init done [ 7.330018] random: 7 urandom warning(s) missed due to ratelimiting [ 7.366364] cx25840 23-0044: loaded v4l-cx231xx-avcore-01.fw firmware (16382 bytes) [ 7.428035] cx231xx 1-1.1:1.1: v4l2 driver version 0.0.3 [ 7.547889] cx231xx 1-1.1:1.1: Registered video device video0 [v4l2] [ 7.548740] cx231xx 1-1.1:1.1: Registered VBI device vbi0 [ 7.548763] cx231xx 1-1.1:1.1: video EndPoint Addr 0x84, Alternate settings: 5 [ 7.548822] cx231xx 1-1.1:1.1: VBI EndPoint Addr 0x85, Alternate settings: 2 [ 7.548848] cx231xx 1-1.1:1.1: sliced CC EndPoint Addr 0x86, Alternate settings: 2 [ 7.549684] usbcore: registered new interface driver cx231xx [ 7.627384] cx231xx 1-1.1:1.1: audio EndPoint Addr 0x83, Alternate settings: 3 [ 7.627408] cx231xx 1-1.1:1.1: Cx231xx Audio Extension initialized [ 7.753942] 8021q: 802.1Q VLAN Support v1.8 [ 7.969437] cfg80211: Loading compiled-in X.509 certificates for regulatory database [ 8.014384] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7' [ 8.053599] Adding 102396k swap on /var/swap. Priority:-2 extents:1 across:102396k SSFS [ 8.167773] bcmgenet fd580000.ethernet: configuring instance for external RGMII (RX delay) [ 8.168038] bcmgenet fd580000.ethernet eth0: Link is Down [ 10.205672] bcmgenet fd580000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off [ 10.205706] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 10.784946] broken atomic modeset userspace detected, disabling atomic [ 12.972884] ICMPv6: process `dhcpcd' is using deprecated sysctl (syscall) net.ipv6.neigh.eth0.retrans_time - use net.ipv6.neigh.eth0.retrans_time_ms instead [ 16.743581] sctp: Hash tables configured (bind 256/256) [ 16.927066] NET: Registered protocol family 15 [ 17.282575] NET: Registered protocol family 38 [66330.550926] bcm2835-codec bcm2835-codec: bcm2835_codec_stop_streaming: Timeout waiting for buffers to be returned - 6 outstanding [66330.551093] ------------[ cut here ]------------ [66330.551114] WARNING: CPU: 2 PID: 32419 at drivers/media/common/videobuf2/videobuf2-core.c:1956 vb2_queue_cancel+0x1b8/0x220 [videobuf2_common] [66330.551117] Modules linked in: tcp_diag inet_diag jitterentropy_rng drbg authenc echainiv cast5_generic cast_common ctr algif_skcipher xcbc md4 algif_hash af_alg xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key md5 sctp cfg80211 rfkill 8021q garp stp llc cx231xx_alsa spidev cx25840 vc4 cx231xx cx2341x tveeprom cec v3d drm_kms_helper gpu_sched bcm2835_isp(C) bcm2835_codec(C) bcm2835_v4l2(C) v4l2_mem2mem bcm2835_mmal_vchiq(C) videobuf2_vmalloc videobuf2_dma_contig drm videobuf2_memops videobuf2_v4l2 videobuf2_common snd_soc_core drm_panel_orientation_quirks snd_compress raspberrypi_hwmon i2c_mux_pinctrl snd_bcm2835(C) snd_pcm_dmaengine snd_pcm i2c_mux vc_sm_cma(C) videodev snd_timer mc i2c_bcm2835 snd spi_bcm2835 syscopyarea sysfillrect sysimgblt fb_sys_fops backlight rpivid_mem uio_pdrv_genirq uio nvmem_rmem nft_nat nft_chain_nat nf_nat nft_counter nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 i2c_dev aes_neon_bs nf_tables sha256_generic aes_neon_blk nfnetlink crypto_simd cryptd [66330.551301] ip_tables x_tables ipv6 [66330.551313] CPU: 2 PID: 32419 Comm: python3 Tainted: G C 5.10.60-v8+ #1449 [66330.551316] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT) [66330.551321] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--) [66330.551329] pc : vb2_queue_cancel+0x1b8/0x220 [videobuf2_common] [66330.551336] lr : vb2_queue_cancel+0x3c/0x220 [videobuf2_common] [66330.551339] sp : ffffffc01250bb10 [66330.551342] x29: ffffffc01250bb10 x28: ffffffc008d52ed0 [66330.551348] x27: ffffff80547dff00 x26: 0000000000000000 [66330.551355] x25: ffffff80465d3630 x24: 0000000040045613 [66330.551361] x23: 0000000000000000 x22: ffffffc011298948 [66330.551367] x21: ffffff8053994028 x20: ffffff8053994028 [66330.551373] x19: ffffff8053994028 x18: 0000000000000004 [66330.551379] x17: 0000000000000000 x16: 0000000000000000 [66330.551385] x15: ffffff8046c80030 x14: ffffff807b335288 [66330.551392] x13: 0000000000000000 x12: 0000000000000000 [66330.551398] x11: ffffff807b335240 x10: ffffff8046c80038 [66330.551404] x9 : ffffffc008d9e5c4 x8 : ffffff807b335268 [66330.551410] x7 : 0000000000000000 x6 : 0000000000000000 [66330.551416] x5 : ffffff80465d381c x4 : ffffff80779c5ac0 [66330.551422] x3 : ffffff8046c80008 x2 : 0000000000000002 [66330.551428] x1 : 0000000000000000 x0 : 0000000000000006 [66330.551435] Call trace: [66330.551442] vb2_queue_cancel+0x1b8/0x220 [videobuf2_common] [66330.551449] vb2_core_streamoff+0x2c/0xc8 [videobuf2_common] [66330.551457] vb2_streamoff+0x24/0x78 [videobuf2_v4l2] [66330.551466] v4l2_m2m_streamoff+0x44/0x130 [v4l2_mem2mem] [66330.551472] v4l2_m2m_ioctl_streamoff+0x20/0x30 [v4l2_mem2mem] [66330.551506] v4l_streamoff+0x2c/0x38 [videodev] [66330.551523] video_do_ioctl+0x18c/0x3f0 [videodev] [66330.551541] video_usercopy+0x154/0x618 [videodev] [66330.551558] video_ioctl2+0x20/0x68 [videodev] [66330.551575] v4l2_ioctl+0x48/0x68 [videodev] [66330.551587] arm64_sys_ioctl+0xb0/0xf0 [66330.551593] el0_svc_common.constprop.3+0xc0/0x1f8 [66330.551597] do_el0_svc+0x2c/0x98 [66330.551603] el0_svc+0x20/0x30 [66330.551607] el0_sync_handler+0x90/0xb8 [66330.551611] el0_sync+0x180/0x1c0 [66330.551615] ---[ end trace c68569376abb59e3 ]--- [66330.551622] videobuf2_common: driver bug: stop_streaming operation is leaving buf 0000000034806f25 in active state [66330.551629] videobuf2_common: driver bug: stop_streaming operation is leaving buf 000000000acef686 in active state [66330.551636] videobuf2_common: driver bug: stop_streaming operation is leaving buf 000000005f7a6875 in active state [66330.551643] videobuf2_common: driver bug: stop_streaming operation is leaving buf 00000000fa7dc918 in active state [66330.551649] videobuf2_common: driver bug: stop_streaming operation is leaving buf 000000001e0b1214 in active state [66330.551655] videobuf2_common: driver bug: stop_streaming operation is leaving buf 00000000d42ea7b1 in active state [66333.719009] bcm2835_mmal_vchiq: timed out waiting for sync completion [66334.532336] bcm2835_mmal_vchiq: received invalid message context 5! [66431.640548] bcm2835-codec bcm2835-codec: bcm2835_codec_stop_streaming: Timeout waiting for buffers to be returned - 6 outstanding

raspinfo hangs here. No further logs. Please note that the last line in the log was caused by an attempt to play the file again.

Logs dmesg: [66330.550926] bcm2835-codec bcm2835-codec: bcm2835_codec_stop_streaming: Timeout waiting for buffers to be returned - 6 outstanding [66330.551093] ------------[ cut here ]------------ [66330.551114] WARNING: CPU: 2 PID: 32419 at drivers/media/common/videobuf2/videobuf2-core.c:1956 vb2_queue_cancel+0x1b8/0x220 [videobuf2_common] [66330.551117] Modules linked in: tcp_diag inet_diag jitterentropy_rng drbg authenc echainiv cast5_generic cast_common ctr algif_skcipher xcbc md4 algif_hash af_alg xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key md5 sctp cfg80211 rfkill 8021q garp stp llc cx231xx_alsa spidev cx25840 vc4 cx231xx cx2341x tveeprom cec v3d drm_kms_helper gpu_sched bcm2835_isp(C) bcm2835_codec(C) bcm2835_v4l2(C) v4l2_mem2mem bcm2835_mmal_vchiq(C) videobuf2_vmalloc videobuf2_dma_contig drm videobuf2_memops videobuf2_v4l2 videobuf2_common snd_soc_core drm_panel_orientation_quirks snd_compress raspberrypi_hwmon i2c_mux_pinctrl snd_bcm2835(C) snd_pcm_dmaengine snd_pcm i2c_mux vc_sm_cma(C) videodev snd_timer mc i2c_bcm2835 snd spi_bcm2835 syscopyarea sysfillrect sysimgblt fb_sys_fops backlight rpivid_mem uio_pdrv_genirq uio nvmem_rmem nft_nat nft_chain_nat nf_nat nft_counter nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 i2c_dev aes_neon_bs nf_tables sha256_generic aes_neon_blk nfnetlink crypto_simd cryptd [66330.551301] ip_tables x_tables ipv6 [66330.551313] CPU: 2 PID: 32419 Comm: python3 Tainted: G C 5.10.60-v8+ #1449 [66330.551316] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT) [66330.551321] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--) [66330.551329] pc : vb2_queue_cancel+0x1b8/0x220 [videobuf2_common] [66330.551336] lr : vb2_queue_cancel+0x3c/0x220 [videobuf2_common] [66330.551339] sp : ffffffc01250bb10 [66330.551342] x29: ffffffc01250bb10 x28: ffffffc008d52ed0 [66330.551348] x27: ffffff80547dff00 x26: 0000000000000000 [66330.551355] x25: ffffff80465d3630 x24: 0000000040045613 [66330.551361] x23: 0000000000000000 x22: ffffffc011298948 [66330.551367] x21: ffffff8053994028 x20: ffffff8053994028 [66330.551373] x19: ffffff8053994028 x18: 0000000000000004 [66330.551379] x17: 0000000000000000 x16: 0000000000000000 [66330.551385] x15: ffffff8046c80030 x14: ffffff807b335288 [66330.551392] x13: 0000000000000000 x12: 0000000000000000 [66330.551398] x11: ffffff807b335240 x10: ffffff8046c80038 [66330.551404] x9 : ffffffc008d9e5c4 x8 : ffffff807b335268 [66330.551410] x7 : 0000000000000000 x6 : 0000000000000000 [66330.551416] x5 : ffffff80465d381c x4 : ffffff80779c5ac0 [66330.551422] x3 : ffffff8046c80008 x2 : 0000000000000002 [66330.551428] x1 : 0000000000000000 x0 : 0000000000000006 [66330.551435] Call trace: [66330.551442] vb2_queue_cancel+0x1b8/0x220 [videobuf2_common] [66330.551449] vb2_core_streamoff+0x2c/0xc8 [videobuf2_common] [66330.551457] vb2_streamoff+0x24/0x78 [videobuf2_v4l2] [66330.551466] v4l2_m2m_streamoff+0x44/0x130 [v4l2_mem2mem] [66330.551472] v4l2_m2m_ioctl_streamoff+0x20/0x30 [v4l2_mem2mem] [66330.551506] v4l_streamoff+0x2c/0x38 [videodev] [66330.551523] video_do_ioctl+0x18c/0x3f0 [videodev] [66330.551541] video_usercopy+0x154/0x618 [videodev] [66330.551558] video_ioctl2+0x20/0x68 [videodev] [66330.551575] v4l2_ioctl+0x48/0x68 [videodev] [66330.551587] arm64_sys_ioctl+0xb0/0xf0 [66330.551593] el0_svc_common.constprop.3+0xc0/0x1f8 [66330.551597] do_el0_svc+0x2c/0x98 [66330.551603] el0_svc+0x20/0x30 [66330.551607] el0_sync_handler+0x90/0xb8 [66330.551611] el0_sync+0x180/0x1c0 [66330.551615] ---[ end trace c68569376abb59e3 ]--- [66330.551622] videobuf2_common: driver bug: stop_streaming operation is leaving buf 0000000034806f25 in active state [66330.551629] videobuf2_common: driver bug: stop_streaming operation is leaving buf 000000000acef686 in active state [66330.551636] videobuf2_common: driver bug: stop_streaming operation is leaving buf 000000005f7a6875 in active state [66330.551643] videobuf2_common: driver bug: stop_streaming operation is leaving buf 00000000fa7dc918 in active state [66330.551649] videobuf2_common: driver bug: stop_streaming operation is leaving buf 000000001e0b1214 in active state [66330.551655] videobuf2_common: driver bug: stop_streaming operation is leaving buf 00000000d42ea7b1 in active state [66333.719009] bcm2835_mmal_vchiq: timed out waiting for sync completion [66334.532336] bcm2835_mmal_vchiq: received invalid message context 5!

Additional context I have over 100 Raspberry Pi 4B devices with 4 GB, but different revisions (1.1, 1.2, 1.4). Very few of them exhibit this behaviour, but those which exhibit the behaviour do so reproducibly after 20-80 minutes of playing video. The default GStreamer 1.14 didn't work with v4l2h264dec, so I compiled GStreamer 1.19 from source.

6by9 commented 2 years ago

Please try adding start_debug=1 to /boot/config.txt, reboot, and retest. When it locks up again, run sudo vcdbg log assert and report any asserts that have fired.

There are a couple of fixes to the V4L2 codec on https://github.com/6by9/linux/commits/rpi-5.10.y-codecs that are being tested out at present. There is a firmware fix too, but that should be less critical in this situation.

hailfinger commented 2 years ago

sudo: vcdbg: command not found This is the 64-bit build of Raspberry Pi OS. I haven't been able to find vcdbg. Any pointers would be highly appreciated.

pelwell commented 2 years ago

There's a static 32-bit build here: https://drive.google.com/file/d/1HS9E5vnxxNqrizB4mEYrnFoQQ1axSRKm/view?usp=sharing

hailfinger commented 2 years ago
root@pi-dortmund-eving:~# ./vcdbg log assert
3900072.982: assert( 0xa00 == ( VCD_REG_RD(INST, DEC_STATUS ) & 0xa00 ) ) failed; ../../../../../codecs/video/hw/dec3/vd3/vd3_mbloop_fns.c::vd3_mbloop_wait_for_backend line 136 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
3900175.361: assert( vd3_mbloop_end_decode_picture DEC_MAINCTL failed to go idle ) failed; ../../../../../codecs/video/hw/dec3/vd3/vd3_mbloop.c::vd3_mbloop_end_decode_picture line 1165 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
3900175.432: assert( 0 ) failed; ../../../../../vcfw/drivers/chip/vciv/2708/h264pwr.c::h264pwr_disable_lowlevel line 530 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
3900175.504: assert( 0 ) failed; ../../../../../vcfw/drivers/chip/vciv/2708/h264pwr.c::h264pwr_disable_lowlevel line 555 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
3900182.028: assert( loopCount < MAX_HW_IDLE_LOOP_COUNT ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::H264_WaitForIdleHW line 741 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
3900284.412: assert( vd3_mbloop_end_decode_picture DEC_MAINCTL failed to go idle ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::ResetBackendHW line 129 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
3900867.005: assert( loopCount < MAX_HW_IDLE_LOOP_COUNT ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::H264_WaitForIdleHW line 741 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
3900969.393: assert( vd3_mbloop_end_decode_picture DEC_MAINCTL failed to go idle ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::ResetBackendHW line 129 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
3901693.871: assert( loopCount < MAX_HW_IDLE_LOOP_COUNT ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::H264_WaitForIdleHW line 741 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
3901796.728: assert( vd3_mbloop_end_decode_picture DEC_MAINCTL failed to go idle ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::ResetBackendHW line 129 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
3902938.244: assert( loopCount < MAX_HW_IDLE_LOOP_COUNT ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::H264_WaitForIdleHW line 741 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
3903041.080: assert( vd3_mbloop_end_decode_picture DEC_MAINCTL failed to go idle ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::ResetBackendHW line 129 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------

The asserts above are present when no dmesg warnings have appeared yet, but the stream already hangs.

hailfinger commented 2 years ago

After stopping the hanging stream, the dmesg warning appears and a different set of asserts is triggered.

[ 8401.797012] ------------[ cut here ]------------
[ 8401.797033] WARNING: CPU: 2 PID: 640 at drivers/media/common/videobuf2/videobuf2-core.c:1956 __vb2_queue_cancel+0x1b8/0x220 [videobuf2_common]
[ 8401.797036] Modules linked in: tcp_diag inet_diag jitterentropy_rng drbg authenc echainiv cast5_generic cast_common ctr algif_skcipher xcbc md4 algif_hash af_alg xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key md5 sctp cx231xx_alsa cfg80211 rfkill 8021q garp stp llc cx25840 spidev cx231xx cx2341x tveeprom v3d vc4 cec raspberrypi_hwmon gpu_sched drm_kms_helper i2c_mux_pinctrl i2c_bcm2835 spi_bcm2835 i2c_mux nft_nat drm drm_panel_orientation_quirks bcm2835_codec(C) bcm2835_v4l2(C) bcm2835_isp(C) v4l2_mem2mem bcm2835_mmal_vchiq(C) videobuf2_vmalloc videobuf2_dma_contig videobuf2_memops snd_soc_core videobuf2_v4l2 snd_compress videobuf2_common snd_bcm2835(C) vc_sm_cma(C) snd_pcm_dmaengine videodev mc syscopyarea snd_pcm sysfillrect sysimgblt fb_sys_fops rpivid_mem snd_timer backlight snd uio_pdrv_genirq uio nvmem_rmem nft_chain_nat nf_nat nft_counter nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables i2c_dev aes_neon_bs nfnetlink sha256_generic aes_neon_blk crypto_simd cryptd
[ 8401.797220]  ip_tables x_tables ipv6
[ 8401.797232] CPU: 2 PID: 640 Comm: python3 Tainted: G         C        5.10.60-v8+ #1449
[ 8401.797235] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[ 8401.797240] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
[ 8401.797249] pc : __vb2_queue_cancel+0x1b8/0x220 [videobuf2_common]
[ 8401.797256] lr : __vb2_queue_cancel+0x3c/0x220 [videobuf2_common]
[ 8401.797259] sp : ffffffc012423b10
[ 8401.797262] x29: ffffffc012423b10 x28: ffffffc008d63ed0 
[ 8401.797268] x27: ffffff8042cbca00 x26: 0000000000000000 
[ 8401.797275] x25: ffffff8044fbee30 x24: 0000000040045613 
[ 8401.797281] x23: 0000000000000000 x22: ffffffc011298948 
[ 8401.797287] x21: ffffff805d49c028 x20: ffffff805d49c028 
[ 8401.797293] x19: ffffff805d49c028 x18: 0000000000000004 
[ 8401.797299] x17: 0000000000000000 x16: 0000000000000000 
[ 8401.797305] x15: ffffff8046140030 x14: ffffff806fc62280 
[ 8401.797311] x13: 0000000000000000 x12: 0000000000000000 
[ 8401.797317] x11: ffffff806fc62238 x10: ffffff8046140038 
[ 8401.797323] x9 : ffffffc008e0d5c4 x8 : ffffff806fc62260 
[ 8401.797330] x7 : 0000000000000000 x6 : 0000000000000000 
[ 8401.797336] x5 : ffffff8044fbf01c x4 : ffffff8044fc5ac0 
[ 8401.797342] x3 : ffffff8046140008 x2 : 0000000000000002 
[ 8401.797348] x1 : 0000000000000000 x0 : 0000000000000006 
[ 8401.797354] Call trace:
[ 8401.797362]  __vb2_queue_cancel+0x1b8/0x220 [videobuf2_common]
[ 8401.797369]  vb2_core_streamoff+0x2c/0xc8 [videobuf2_common]
[ 8401.797377]  vb2_streamoff+0x24/0x78 [videobuf2_v4l2]
[ 8401.797386]  v4l2_m2m_streamoff+0x44/0x130 [v4l2_mem2mem]
[ 8401.797393]  v4l2_m2m_ioctl_streamoff+0x20/0x30 [v4l2_mem2mem]
[ 8401.797429]  v4l_streamoff+0x2c/0x38 [videodev]
[ 8401.797447]  __video_do_ioctl+0x18c/0x3f0 [videodev]
[ 8401.797464]  video_usercopy+0x154/0x618 [videodev]
[ 8401.797481]  video_ioctl2+0x20/0x68 [videodev]
[ 8401.797498]  v4l2_ioctl+0x48/0x68 [videodev]
[ 8401.797509]  __arm64_sys_ioctl+0xb0/0xf0
[ 8401.797515]  el0_svc_common.constprop.3+0xc0/0x1f8
[ 8401.797519]  do_el0_svc+0x2c/0x98
[ 8401.797525]  el0_svc+0x20/0x30
[ 8401.797530]  el0_sync_handler+0x90/0xb8
[ 8401.797533]  el0_sync+0x180/0x1c0
[ 8401.797537] ---[ end trace 0d8860b28c4b7ef7 ]---
[ 8401.797543] videobuf2_common: driver bug: stop_streaming operation is leaving buf 000000007f56f3bd in active state
[ 8401.797552] videobuf2_common: driver bug: stop_streaming operation is leaving buf 00000000638add57 in active state
[ 8401.797558] videobuf2_common: driver bug: stop_streaming operation is leaving buf 000000004ecc4c66 in active state
[ 8401.797565] videobuf2_common: driver bug: stop_streaming operation is leaving buf 00000000ef361fbc in active state
[ 8401.797571] videobuf2_common: driver bug: stop_streaming operation is leaving buf 0000000089bbe43c in active state
[ 8401.797578] videobuf2_common: driver bug: stop_streaming operation is leaving buf 0000000019ac0f87 in active state
root@pi-dortmund-eving:~# ./vcdbg log assert
4120719.493: assert( loopCount < MAX_HW_IDLE_LOOP_COUNT ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::H264_WaitForIdleHW line 741 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
4120821.860: assert( vd3_mbloop_end_decode_picture DEC_MAINCTL failed to go idle ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::ResetBackendHW line 129 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
4121340.356: assert( loopCount < MAX_HW_IDLE_LOOP_COUNT ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::H264_WaitForIdleHW line 741 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
4121442.939: assert( vd3_mbloop_end_decode_picture DEC_MAINCTL failed to go idle ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::ResetBackendHW line 129 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
4122018.090: assert( loopCount < MAX_HW_IDLE_LOOP_COUNT ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::H264_WaitForIdleHW line 741 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
4122120.449: assert( vd3_mbloop_end_decode_picture DEC_MAINCTL failed to go idle ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::ResetBackendHW line 129 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
4122632.743: assert( (status & 0x3f0000) == 0 ) failed; ../../../../../codecs/video/hw/dec3/vd3/vd3_mbloop_fns.c::vd3_mbloop_wait_for_backend line 107 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
4122632.826: assert( 0xa00 == ( VCD_REG_RD(INST, DEC_STATUS ) & 0xa00 ) ) failed; ../../../../../codecs/video/hw/dec3/vd3/vd3_mbloop_fns.c::vd3_mbloop_wait_for_backend line 136 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
4122641.498: assert( Timed out waiting to destroy ril.video_decode ) failed; ../../../../../interface/mmal/vc/ril/mmal_ril.c::fail_destroy line 1471 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
4122735.191: assert( vd3_mbloop_end_decode_picture DEC_MAINCTL failed to go idle ) failed; ../../../../../codecs/video/hw/dec3/vd3/vd3_mbloop.c::vd3_mbloop_end_decode_picture line 1165 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
4122735.250: assert( 0 ) failed; ../../../../../vcfw/drivers/chip/vciv/2708/h264pwr.c::h264pwr_disable_lowlevel line 530 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
4122735.319: assert( 0 ) failed; ../../../../../vcfw/drivers/chip/vciv/2708/h264pwr.c::h264pwr_disable_lowlevel line 555 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
hailfinger commented 2 years ago

I ran vcdbg in a loop every 5 seconds to log all messages because the message buffer was too small. The first debug messages caught were:

005746.826: assert( timeout != 0 ) failed; ../../../../../vcfw/drivers/chip/vciv/2708/clock_2711.c::clock_wait_busy line 2125 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
025470.979: assert( eptr ) failed; ../../../../../middleware/dispmanx/dispmanx.c::dispmanx_execute_update line 968 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
1690142.600: assert( consumed >= 0 ) failed; ../../../../../applications/vmcs/audioserv/audio_server.c::audio_complete_task line 322 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
1695102.580: assert( consumed >= 0 ) failed; ../../../../../applications/vmcs/audioserv/audio_server.c::audio_complete_task line 322 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
1824225.369: assert( loopCount < MAX_HW_IDLE_LOOP_COUNT ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::H264_WaitForIdleHW line 741 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
1824327.767: assert( vd3_mbloop_end_decode_picture DEC_MAINCTL failed to go idle ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::ResetBackendHW line 129 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
1824971.154: assert( loopCount < MAX_HW_IDLE_LOOP_COUNT ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::H264_WaitForIdleHW line 741 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
1825073.531: assert( vd3_mbloop_end_decode_picture DEC_MAINCTL failed to go idle ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::ResetBackendHW line 129 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
1825652.844: assert( loopCount < MAX_HW_IDLE_LOOP_COUNT ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::H264_WaitForIdleHW line 741 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------
1825755.230: assert( vd3_mbloop_end_decode_picture DEC_MAINCTL failed to go idle ) failed; ../../../../../codecs/video/hw/dec3/h264/h264_inner_slice.c::ResetBackendHW line 129 rev 67615e9
vcdbg_ctx_get_dump_stack: dump_stack failed
----------------

In that case I was also outputting audio via ALSA, but the same problem happens regardless of audio usage. If you need more debug logs, please tell me, I can easily reproduce the issue.

6by9 commented 2 years ago

Ouch, something has really upset the H264 hardware block.

If my memory serves correctly, error recovery is a little lacking there and results in the block not being released, so no further decoding is possible. I may be able to fix that part, but I don't have the knowledge of the internals of the codec block to determine why it gets so upset in the first place. If it is only a few devices that lock up like this then it's going to be tricky to reproduce. It may be down to slight variations in the silicon, and adding over_voltage=2 to config.txt gives it enough volts to cope. You say this video is a loop (I'll try it later). Is it always at the same point in the video that this fails?

hailfinger commented 2 years ago

The hardware is >100 km away and I have to ask someone to take a look where it hangs, but IIRC the hang happens usually after more than one loop. Total length of the video is ~27 minutes. You can also get it as a non-looping complete playlist here: https://out1.nac-cdn.net/testvideo/720p25/teststream_720mid/index.m3u8

I will retry with over_voltage=2 and report back.

hailfinger commented 2 years ago

3 hours stable so far with over_voltage=2. That's twice the length of the previous record. Unless you're instructing me otherwise, I'll keep testing until either the DSL line disconnects or 24 hours have elapsed.

hailfinger commented 2 years ago

9 hours continuous video playback and counting with over_voltage=2 and no H.264 asserts in the vcdbg logs. The only remaining asserts are audio_complete_task. Sounds like this H264 decoder lockup is really due to variations in the silicon as you suspected.

hailfinger commented 2 years ago

25 hours continuous video playback stable with over_voltage=2. @6by9 is there anything else I can test, specifically kernel/firmware fixes? If it helps, I can probably arrange to get that device shipped to you. That device is the most unstable one found so far, but I will run a burn-in test on all devices I have access to.

6by9 commented 2 years ago

All devices go through a calibration process in the factory to determine the voltage levels at which they are stable. I'm not involved in developing those tests. @timg236 may be able to advise what is covered in that test, and whether there is gain in getting that device back to try and improve them.

over_voltage=2 just tickles the voltage up by 2 steps of 0.025V above the configured value. It'll burn a smidge more power, and therefore run a touch hotter, but is largely harmless on "good" silicon.

pelwell commented 2 years ago

The onboard voltage measurement system (using ring oscillators) can observe the voltage at a number of points on the chip, but the SoC manufacturing process causes small variations in the voltage requirements of each part of the silicon. When the boards are programmed on the production line the voltage is checked at a number of points while running a stress test. Boards that fail that test are marked as requiring a higher voltage and retested. At every boot the required voltage is reassessed to take account of temperature and other variations.

A margin is built into the test to take account of most variations across the die, but it is possible (though rare, otherwise we'd have lots of returns) that a board could pass that test but still fall short in some part of the chip under some loads. It looks as though you have found a combination of load and board that just falls short, and the over_voltage setting is the correct workaround.

It would be interesting to see if the board is one of those that already needs slightly more voltage - try temporarily removing the over_voltage setting, rebooting and allowing it to fall idle, then sudo vcgencmd measure_volts core. (You can try that again after restoring the over_voltage and you should see the difference).

hailfinger commented 2 years ago

On the machine exhibiting the problem (pi-dortmund-eving), I get these results:

# vcgencmd measure_volts core
volt=0.8563V

On the currently active machines in my fleet, core voltages are distributed as follows: 9 volt=0.8375V 1 volt=0.8400V 27 volt=0.8438V 25 volt=0.8500V 22 volt=0.8563V 3 volt=0.8600V 12 volt=0.8625V 4 volt=0.8688V 2 volt=0.8700V

hailfinger commented 2 years ago

A possibly interesting additional bit of info: With over_voltage=2 enabled on this device, I still get some asserts in vcdbg (and those asserts haven't been seen on any other Pi in my fleet):

    005835.110: assert( timeout != 0 ) failed; ../../../../../vcfw/drivers/chip/vciv/2708/clock_2711.c::clock_wait_busy line 2125 rev 67615e9
    vcdbg_ctx_get_dump_stack: dump_stack failed
    ----------------
    025351.513: assert( eptr ) failed; ../../../../../middleware/dispmanx/dispmanx.c::dispmanx_execute_update line 968 rev 67615e9
    vcdbg_ctx_get_dump_stack: dump_stack failed
    ----------------

Are those messages helpful? Should I open a separate ticket for them?

pelwell commented 2 years ago

volt=0.8563V

That's one of the higher voltages in the range, but in no way extreme. It does just seem to be one of those parts for which the usual calibration process doesn't quite return a high enough voltage.

005835.110: assert( timeout != 0 ) failed; ../../../../../vcfw/drivers/chip/vciv/2708/clock_2711.c::clock_wait_busy line 2125 rev 67615e9

This is warning that an on-chip clock failed to settle after some change within a sensible time - in this case 100us. That's a long time for a clock that's probably running at many megahertz. It's not a warning I've seen before.

025351.513: assert( eptr ) failed; ../../../../../middleware/dispmanx/dispmanx.c::dispmanx_execute_update line 968 rev 67615e9

This one's more tricky for someone not that familiar with dispmanx (me), but I think it's warning that a display update to remove an element from the display was referencing an element that wasn't actually on the display.

If these assertions are only appearing on one device out of a reasonable-sized group, all running the same software and workload, then it's going to be hard to reproduce what might turn out to be some subtle hardware problem. Keep an eye on the other units, and if these issues crop up elsewhere then do open a new issue.

6by9 commented 2 years ago

025351.513: assert( eptr ) failed; ../../../../../middleware/dispmanx/dispmanx.c::dispmanx_execute_update line 968 rev 67615e9

Yes it's generally on HDMI mode switch as DispmanX removes all elements from the screen. However the client doesn't necessarily know about the mode switch, so when it goes to remove an element it thinks is on-screen, DispmanX asserts that it isn't. It was not uncommon earlier in FKMS land, but I thought popcornmix had removed it - obviously not.

hailfinger commented 2 years ago

Uh oh. After some stress tests, I have a few more devices (roughly 10% of my fleet) exhibiting similar errors in dmesg, but the output from vcdbg differs significantly (no asserts). As suggested by pelwell, I'll open a new issue for that.

popcornmix commented 2 years ago

@hailfinger can you report output of vcgencmd version?

popcornmix commented 2 years ago

The dispmanx assert is harmless, and will occur with fkms driver when changing hdmi mode. It has been removed now.

hailfinger commented 2 years ago

@hailfinger can you report output of vcgencmd version?

For this Pi (pi-dortmund-eving)? Aug 31 2021 14:53:42 Copyright (c) 2012 Broadcom version 67615e950e1e28b92dfae6303cf7a8b879a8908f (clean) (release) (start_x)

This Pi has been taken out of production and has been replaced with more stable Pi, I'll be able to test this Pi with various experimental firmwares and kernels in the next few days.

hailfinger commented 2 years ago

@6by9 @pelwell I have some devices where over_voltage=2 is not sufficient to eliminate this error, but it seems like I need more than 2 hours to trigger it now. How far can I go with over_voltage without damaging the Pi?

With over_voltage=2:

root@pi-berlin-schmargendorf:~# vcgencmd measure_volts core volt=0.9125V

root@pi-schwedt:~# vcgencmd measure_volts core volt=0.8938V

pelwell commented 2 years ago

Sorry - I intended to reply sooner. over_voltage settings up to 7-8 ought to be fine - the firmware clamps the core voltage range between 835mV and 1100mv, and an over-voltage of 8*25mV gives an actual core voltage of somewhere just over 1050. However, the lower you can set it the better, since increasing voltage increases temperature and power consumption, and may reduce the operating life.

hailfinger commented 2 years ago

Thanks! Will try with ever-increasing over_voltage until all Pi 4B are stable. After intense testing, I can trigger the warning on a handful of devices with over_voltage=2 and on 60+ devices without over_voltage.

hailfinger commented 2 years ago

After some more testing, here are the results for streams of up to 5 hours, percentages approximate, sample size = 105 devices. 10% do not need over_voltage 50% need over_voltage=2 30% need over_voltage=4 10% need over_voltage=6 one device has been somewhat stable with over_voltage=6, but the "recycling bust" messages in vcdbg log msg still appear sometimes.