raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

Kernel oops or hard freeze when streaming video on Zero W (and Pi 3B+) #2555

Open balboah opened 6 years ago

balboah commented 6 years ago

This is a summary from troubleshooting on https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=213423&p=1315106

When trying to stream video on a Zero W with the NoIR v2 camera module, I get many variants of kernel oopses and freezes. For example:

[  273.802531] random: crng init done
[  379.898672] ------------[ cut here ]------------
[  379.901131] kernel BUG at Returning to usermode but unexpected PSR bits set?:5!
[  379.903359] Internal error: Oops - BUG: 0 [#1] ARM
[  379.905866] Modules linked in: cmac bnep hci_uart btbcm serdev bluetooth ecdh_generic brcmfmac brcmutil cfg80211 snd_bcm2835(C) snd_pcm rfkill snd_timer snd uio_pdrv_genirq uio fixed ip_tables x_tables ipv6
[  379.913291] CPU: 0 PID: 826 Comm: modprobe Tainted: G         C      4.14.34+ #1110
[  379.917705] Hardware name: BCM2835
[  379.920005] task: d6a0e120 task.stack: d5662000
[  379.922689] PC is at no_work_pending+0x30/0x34
[  379.925152] LR is at 0xbea4f970
[  379.927229] pc : [<c000fe54>]    lr : [<bea4f970>]    psr: 20000013
[  379.929588] sp : d5663fa8  ip : d5663fa8  fp : 00000000
[  379.932763] r10: 00000000  r9 : d5662000  r8 : c000ff64
[  379.935421] r7 : 000000d9  r6 : 00d321d8  r5 : 00000020  r4 : 00000078
[  379.938231] r3 : c0939414  r2 : d5663fe4  r1 : b6e6f198  r0 : 00000000
[  379.940929] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  379.943654] Control: 00c5387d  Table: 15664008  DAC: 00000055
[  379.946464] Process modprobe (pid: 826, stack limit = 0xd5662188)
[  379.948699] Stack: (0xd5663fa8 to 0xd5664000)
[  379.951177] 3fa0:                   00d321b8 00000020 00000000 00d321d8 00008000 00000000
[  379.956042] 3fc0: 00d321b8 00000020 00d321d8 000000d9 00000002 0002b990 0003f030 bea4fb10
[  379.960686] 3fe0: 0003efa8 bea4f970 b6e6f198 b6e6f0b8 60000010 00000000 00000000 00000000
[  379.966304] Code: e9527fff e1a00000 e28dd048 e1b0f00e (e7f001f2)
[  379.969043] ---[ end trace ee6907230b405e54 ]---
[  185.473451] random: crng init done
[  289.843386] Unable to handle kernel paging request at virtual address a259878c
[  289.845896] pgd = d3d98000
[  289.847720] [a259878c] *pgd=00000000
[  289.849367] Internal error: Oops: 5 [#1] ARM
[  289.851559] Modules linked in: cmac bnep hci_uart btbcm serdev bluetooth ecdh_generic brcmfmac brcmutil cfg80211 snd_bcm2835(C) rfkill snd_pcm snd_timer snd uio_pdrv_genirq fixed uio ip_tables x_tables ipv6
[  289.858880] CPU: 0 PID: 566 Comm: VCHIQ completio Tainted: G         C      4.14.34+ #1110
[  289.864663] Hardware name: BCM2835
[  289.867276] task: d3d24560 task.stack: d3d94000
[  289.869315] PC is at alloc_contig_range+0x254/0x344
[  289.871577] LR is at drain_all_pages+0x9c/0x148
[  289.873676] pc : [<c01005f8>]    lr : [<c00fcddc>]    psr: 60000013
[  289.876103] sp : d3d95b80  ip : 00000000  fp : d3d95c14
[  289.878873] r10: 00000005  r9 : 00017521  r8 : d3d95ba0
[  289.881112] r7 : c0108314  r6 : c0646614  r5 : d3d95b58  r4 : d3d95b6c
[  289.883087] r3 : a2598780  r2 : 00000000  r1 : d7c8a500  r0 : 00000000
[  289.885885] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  289.888657] Control: 00c5387d  Table: 13d98008  DAC: 00000055
[  289.891100] Process VCHIQ completio (pid: 566, stack limit = 0xd3d94188)
[  289.894036] Stack: (0xd3d95b80 to 0xd3d96000)
[  289.897008] 5b80: 00000002 00000006 fffffff0 00017800 00017400 00000004 00000000 00000000
[  289.902505] 5ba0: d3d95ba0 d3d95ba0 c09be368 00000000 00000001 00000001 00000000 00000000
[  289.907371] 5bc0: 00000000 00017520 014080c0 ffffffff 00000000 00000000 00000000 00000002
[  289.913045] 5be0: 00000001 00000000 00000001 00017520 00000120 c0a46a88 fffffff0 00000001
[  289.918027] 5c00: 00000001 00000800 d3d95c64 d3d95c18 c0155648 c01003b0 00000000 00000000
[  289.924077] 5c20: 00000000 00000001 00000000 014080c0 00000000 c0a46a98 00000000 00000001
[  289.931001] 5c40: d3d95ce8 00000001 00000247 00001000 00000000 d3d95d94 d3d95c74 d3d95c68
[  289.936747] 5c60: c04078d0 c0155554 d3d95c9c d3d95c78 c0019d84 c0407898 d3d95ce8 c0518d68
[  289.943030] 5c80: d7162e10 00000001 00000000 d53ab780 d3d95ccc d3d95ca0 c0019e5c c0019d50
[  289.949379] 5ca0: c0518d68 00000001 00000000 014080c0 ffffffff 014080c0 d7162e10 ffffffff
[  289.956599] 5cc0: d3d95d34 d3d95cd0 c001a01c c0019e18 d3d95d04 d3d95ce0 c004c940 c00409d4
[  289.964323] 5ce0: d515fd40 00000000 00000000 d7162e10 00001000 014080c0 00000247 c0518d68
[  289.970769] 5d00: d3d24501 00000000 00000000 00000000 00000000 00000bb0 0000e2f8 00000198
[  289.977483] 5d20: d7162e10 00000006 d3d95d5c d3d95d38 c001a248 c0019e70 00000247 00000000
[  289.983707] 5d40: 00000000 c0518d68 0000000f c001a1fc d3d95dd4 d3d95d60 c0518d68 c001a208
[  289.991248] 5d60: 00000000 00000200 00000200 c0517a80 00000000 d6b44f10 d3d24560 d3d95d88
[  289.998981] 5d80: 0044bbb0 00000001 00000001 0000e2f8 d3d95dd4 ffffffff c0517a8c c0026478
[  290.005268] 5da0: c0648df0 c0645a34 c0944318 d6b44e00 00000000 0044bbb0 00000000 0000e2f8
[  290.012067] 5dc0: 00000072 00000006 d3d95e44 d3d95dd8 c05110ac c0518c84 00000001 c064767c
[  290.018587] 5de0: d3d95e04 d3d95df0 c064767c 00000020 d6b44efc 00000000 d6b44f10 d6b44f88
[  290.025889] 5e00: c0a54928 d6b44edc d3d95e2c 20000113 20000113 c09b3f50 b6cb6ce4 c014c406
[  290.033451] 5e20: c09b3f50 b6cb6ce4 00000001 d6b44e00 d7580194 d50fd000 d3d95f0c d3d95e48
[  290.039559] 5e40: c0515db8 c0510de0 b63085c0 00000000 00000001 d3d95e60 c0048748 c00484c0
[  290.046525] 5e60: d7122478 d50fd824 b6cb6d84 c014c407 d7580194 00000040 0000012c d6b44e00
[  290.053050] 5e80: c0065900 00000007 ffffffff d3d95e98 b63085c0 00000000 d3d95ec4 d695a268
[  290.060107] 5ea0: 0000b009 0044bbb0 0000e2f8 b63085c0 00000000 d3d95ec0 c004c360 c0638a0c
[  290.069844] 5ec0: c004c95c c00409d4 00000000 d5215340 00000001 d3d24560 d3d95f1c d3d95ee8
[  290.076552] 5ee0: c0046aec b6cb6ce4 d730bd88 d522ca00 00000004 00000004 d3d94000 00000000
[  290.082991] 5f00: d3d95f7c d3d95f10 c016c858 c0515870 d3d95f74 d3d95f20 c06455ac c0046a9c
[  290.090363] 5f20: 00000004 c01770c4 80000013 ffffffff 7a12d780 c01774c4 d5231a80 b6e79538
[  290.098061] 5f40: b6cb6ce4 0044bbb0 c014c406 00000004 d3d95f6c d522ca01 b6cb6ce4 d522ca00
[  290.104406] 5f60: c014c406 00000004 d3d94000 00000000 d3d95fa4 d3d95f80 c016cf54 c016c7c8
[  290.111185] 5f80: 0000b009 b6e79538 b6cb6ce4 0044bbb0 00000036 c000ff64 00000000 d3d95fa8
[  290.117580] 5fa0: c000fdc0 c016cf1c b6e79538 b6cb6ce4 00000004 c014c406 b6cb6ce4 0000b009
[  290.125019] 5fc0: b6e79538 b6cb6ce4 0044bbb0 00000036 b63085c0 b6cb6d84 b6e792b4 b6e68f3c
[  290.132671] 5fe0: b6e79240 b6cb6cd4 b6e66f44 b6d8180c 80000010 00000004 00000000 00000000
[  290.138902] [<c01005f8>] (alloc_contig_range) from [<c0155648>] (cma_alloc+0x100/0x24c)
[  290.145834] [<c0155648>] (cma_alloc) from [<c04078d0>] (dma_alloc_from_contiguous+0x44/0x4c)
[  290.152461] [<c04078d0>] (dma_alloc_from_contiguous) from [<c0019d84>] (__alloc_from_contiguous+0x40/0xc8)
[  290.160121] [<c0019d84>] (__alloc_from_contiguous) from [<c0019e5c>] (cma_allocator_alloc+0x50/0x58)
[  290.167665] [<c0019e5c>] (cma_allocator_alloc) from [<c001a01c>] (__dma_alloc+0x1b8/0x350)
[  290.173780] [<c001a01c>] (__dma_alloc) from [<c001a248>] (arm_dma_alloc+0x4c/0x58)
[  290.180857] [<c001a248>] (arm_dma_alloc) from [<c0518d68>] (vchiq_prepare_bulk_data+0xf0/0x6bc)
[  290.187955] [<c0518d68>] (vchiq_prepare_bulk_data) from [<c05110ac>] (vchiq_bulk_transfer+0x2d8/0x554)
[  290.195634] [<c05110ac>] (vchiq_bulk_transfer) from [<c0515db8>] (vchiq_ioctl+0x554/0x1958)
[  290.202681] [<c0515db8>] (vchiq_ioctl) from [<c016c858>] (do_vfs_ioctl+0x9c/0x754)
[  290.209238] [<c016c858>] (do_vfs_ioctl) from [<c016cf54>] (SyS_ioctl+0x44/0x6c)
[  290.213236] [<c016cf54>] (SyS_ioctl) from [<c000fdc0>] (ret_fast_syscall+0x0/0x28)
[  290.219884] Code: e5931000 e0603008 e0833183 e0813103 (e593300c)
[  290.223715] ---[ end trace ee6907230b405e54 ]---
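The number after "Internal error: Oops:" in the paging-fault reports is printed in hex and, for these data aborts on the ARMv6 BCM2835, is the fault status register (FSR) value; "Oops - BUG: 0" lines carry 0 instead. Below is a minimal Python sketch for decoding it, assuming the short-descriptor FSR layout used by arch/arm/mm/fault.c (status in bits 0-3 plus bit 10, domain in bits 4-7, write flag in bit 11). The helper name decode_oops_fsr is hypothetical, and only six entries of the kernel's fault table are included.

```python
# Subset of the ARM short-descriptor fault-status table (fsr-2level.c).
FAULT_STATUS = {
    5: "section translation fault",
    7: "page translation fault",
    9: "section domain fault",
    11: "page domain fault",
    13: "section permission fault",
    15: "page permission fault",
}


def decode_oops_fsr(value: str) -> dict:
    """Decode the hex value printed as 'Internal error: Oops: <value>'."""
    fsr = int(value, 16)
    # Fault status: bits 0-3, with bit 10 as the fifth status bit.
    status = (fsr & 0xF) | ((fsr & (1 << 10)) >> 6)
    return {
        "status": status,
        "description": FAULT_STATUS.get(status, "not in this subset"),
        "domain": (fsr >> 4) & 0xF,       # bits 4-7
        "write": bool(fsr & (1 << 11)),   # bit 11 set means a faulting write
    }


if __name__ == "__main__":
    for v in ("5", "17"):
        print(v, decode_oops_fsr(v))
```

For example, "Oops: 5" decodes to a section translation fault, which fits the "*pgd=00000000" line in the second report above, while "Oops: 17" (seen in later reports) decodes to a page translation fault in domain 1 on a read.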

It can easily be reproduced with these steps:

  1. Start from a fresh 2018-04-18-raspbian-stretch-lite.img and enable headless SSH and /boot/wpa_supplicant.conf
  2. raspi-config -> enable camera module -> reboot
  3. (terminal 1)$ raspivid -t 0 -w 1280 -h 720 -o /dev/null
  4. (terminal 2)$ ssh pi@the-zero-ip
  5. Repeat step 4 until it crashes (this happens within 5 minutes)
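Steps 4 and 5 can be scripted from a second machine while raspivid runs on the Pi. The following is a minimal Python sketch, not part of the original report; it assumes key-based SSH login (so BatchMode can be used), and reproduce is a hypothetical helper. Each iteration opens a session, runs true, and disconnects, pausing a couple of seconds in between.

```python
import subprocess
import time


def reproduce(host: str, attempts: int = 300, pause_s: float = 2.0,
              run=subprocess.run) -> int:
    """Repeatedly open and close an SSH session to `host`.

    Returns how many sessions succeeded before the first failure, or
    `attempts` if none failed. `run` is injectable for testing.
    """
    completed = 0
    for _ in range(attempts):
        try:
            # BatchMode: fail instead of prompting for a password.
            # ConnectTimeout/timeout: a frozen Pi counts as a failure.
            result = run(
                ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10",
                 host, "true"],
                capture_output=True, timeout=30,
            )
        except subprocess.TimeoutExpired:
            break
        if result.returncode != 0:
            break
        completed += 1
        time.sleep(pause_s)
    return completed


if __name__ == "__main__":
    # "pi@the-zero-ip" is the placeholder used in the steps above.
    print("sessions before failure:", reproduce("pi@the-zero-ip"))
```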

The issue still exists when testing after updating with rpi-update.

I have also had freezes on a Pi 3B+ with the same camera, but have not been able to create a reliable reproduction or catch any kernel messages. It might be more closely related to https://github.com/raspberrypi/linux/issues/2387

lategoodbye commented 6 years ago

Could you please try http://downloads.raspberrypi.org/raspbian_lite/images/raspbian_lite-2018-03-14/ with the Zero W?

balboah commented 6 years ago

@lategoodbye

After the Pi had been running "raspivid -t 0 -w 1280 -h 720 -o /dev/null" for maybe 15 minutes, I left "watch ssh pi@192.168.1.201 date" running in my terminal and got:

dmesg:

[ 1473.345882] systemd: 33 output lines suppressed due to ratelimiting
[ 1551.070104] systemd-journald[108]: Failed to send WATCHDOG=1 notification message: Connection refused

journald:

sshd[857]: Accepted publickey for pi from 192.168.1.14 port 54892 ssh2: RSA SHA256:xxx
sshd[857]: pam_unix(sshd:session): session opened for user pi by (uid=0)
systemd[1]: [35B blob data]
systemd[1]: :0, function (null)(). Aborting.
systemd[1]: Caught <ABRT>, dumped core as pid 861.
kernel: systemd: 33 output lines suppressed due to ratelimiting
systemd[1]: Freezing execution.
systemd-logind[251]: Failed to start session scope session-c19.scope: Message recipient disconnected from message bus without replying
sshd[857]: pam_systemd(sshd:session): Failed to create session: Connection timed out
sshd[867]: Received disconnect from 192.168.1.14 port 54892:11: disconnected by user

But it seems I'm unable to provoke a kernel oops! I guess something is just fishy with authenticating new logins too quickly in this older version.

UPDATE

It seemed harder to trigger, but after making and disconnecting SSH connections a few seconds apart while running raspivid, I got a crash on raspbian_lite-2018-03-14 as well:

[  +0.002281] kernel BUG at kernel/smpboot.c:136!
[  +0.001892] Internal error: Oops - BUG: 0 [#1] ARM
[  +0.001585] Modules linked in: cmac bnep hci_uart btbcm bluetooth brcmfmac brcmutil snd_bcm2835 cfg80211 snd_pcm snd_timer rfkill snd bcm2835_gpiomem uio_pdrv_genirq uio fixed ip_tables x_tables ipv6
[  +0.006210] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.9.80+ #1098
[  +0.002335] Hardware name: BCM2835
[  +0.001928] task: d70a9b40 task.stack: d7100000
[  +0.002436] PC is at smpboot_thread_fn+0x11c/0x178
[  +0.002669] LR is at run_ksoftirqd+0x3c/0x58
[  +0.002489] pc : [<c0042be0>]    lr : [<c0025fa0>]    psr: 60000013
              sp : d7101f40  ip : d7101f30  fp : d7101f64
[  +0.004868] r10: 00000000  r9 : 00000002  r8 : c08c3efc
[  +0.001919] r7 : 00000001  r6 : 00000000  r5 : d7100000  r4 : d70049c0
[  +0.001901] r3 : d7100000  r2 : 00000000  r1 : d70a9b40  r0 : d7100001
[  +0.002381] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  +0.002522] Control: 00c5387d  Table: 1509c008  DAC: 00000055
[  +0.002044] Process ksoftirqd/0 (pid: 3, stack limit = 0xd7100188)
[  +0.001944] Stack: (0xd7101f40 to 0xd7102000)
[  +0.002947] 1f40: 00000000 d70049e0 d70049c0 c0042ac4 00000000 00000000 d7101fac d7101f68
[  +0.005194] 1f60: c003ea54 c0042ad0 00000000 00000001 00000000 d70049c0 00000000 d7101f7c
[  +0.006899] 1f80: d7101f7c 00000000 d7101f88 d7101f88 d70049e0 c003e95c 00000000 00000000
[  +0.005113] 1fa0: 00000000 d7101fb0 c000fed4 c003e968 00000000 00000000 00000000 00000000
[  +0.004385] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  +0.005620] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[  +0.005546] [<c0042be0>] (smpboot_thread_fn) from [<c003ea54>] (kthread+0xf8/0x114)
[  +0.006337] [<c003ea54>] (kthread) from [<c000fed4>] (ret_from_fork+0x14/0x20)
[  +0.003535] Code: e5983010 e5940000 e12fff33 eaffffc4 (e7f001f2)
[  +0.103600] NOHZ: local_softirq_pending 242
[  +0.004190] NOHZ: local_softirq_pending 242
[  +0.011112] ---[ end trace f7cd89d2fa4b04cf ]---
[  +0.006461] NOHZ: local_softirq_pending 242
[  +1.040051] NOHZ: local_softirq_pending 40
[  +2.079984] NOHZ: local_softirq_pending 40
[  +1.039995] NOHZ: local_softirq_pending 40
[  +0.018528] NOHZ: local_softirq_pending 40
[  +0.010858] NOHZ: local_softirq_pending 40
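The "NOHZ: local_softirq_pending" value is a hex bitmask of pending softirqs. A small Python sketch for decoding it follows, assuming the softirq numbering from include/linux/interrupt.h in 4.x kernels; decode_softirq_pending is a hypothetical helper.

```python
from __future__ import annotations

# Softirq numbers 0-9 as in include/linux/interrupt.h of 4.x kernels
# (assumed to match the 4.9 and 4.14 kernels in this thread).
SOFTIRQ_NAMES = ["HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
                 "IRQ_POLL", "TASKLET", "SCHED", "HRTIMER", "RCU"]


def decode_softirq_pending(hex_mask: str) -> list[str]:
    """Decode the mask from 'NOHZ: local_softirq_pending <mask>' (hex)."""
    mask = int(hex_mask, 16)
    return [name for bit, name in enumerate(SOFTIRQ_NAMES) if mask & (1 << bit)]


if __name__ == "__main__":
    for value in ("242", "40"):
        print(value, decode_softirq_pending(value))
```

With that numbering, 242 decodes to TIMER, TASKLET and RCU, and 40 to TASKLET alone.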

And another (before any hard freeze or reboot):

[ 1766.489466] Unable to handle kernel NULL pointer dereference at virtual address 00000001
[ 1766.496737] pgd = d4524000
[ 1766.500537] [00000001] *pgd=14523831, *pte=00000000, *ppte=00000000
[ 1766.504369] Internal error: Oops: 17 [#2] ARM
[ 1766.507683] Modules linked in: cmac bnep hci_uart btbcm bluetooth brcmfmac brcmutil snd_bcm2835 cfg80211 snd_pcm snd_timer rfkill snd bcm2835_gpiomem uio_pdrv_genirq uio fixed ip_tables x_tables ipv6
[ 1766.516789] CPU: 0 PID: 1822 Comm: sshd Tainted: G      D         4.9.80+ #1098
[ 1766.519532] Hardware name: BCM2835
[ 1766.521996] task: d45e28e0 task.stack: d51e8000
[ 1766.524782] PC is at ret_fast_syscall+0x0/0x1c
[ 1766.527541] LR is at __f_unlock_pos+0x1c/0x20
[ 1766.530035] pc : [<c000fe40>]    lr : [<c0162f64>]    psr: 60000013
               sp : d51e9fa8  ip : d51e9f60  fp : 00000000
[ 1766.535295] r10: 00000000  r9 : d51e8000  r8 : c000ffc4
[ 1766.538023] r7 : 0000008c  r6 : b6fc1000  r5 : 00000000  r4 : 00000000
[ 1766.540395] r3 : 00000000  r2 : 00000001  r1 : 00000000  r0 : 00000000
[ 1766.542901] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[ 1766.545469] Control: 00c5387d  Table: 14524008  DAC: 00000055
[ 1766.548381] Process sshd (pid: 1822, stack limit = 0xd51e8188)
[ 1766.551480] Stack: (0xd51e9fa8 to 0xd51ea000)
[ 1766.554274] 9fa0:                   00000000 00000000 00000006 00000000 000002e2 be805d58
[ 1766.560909] 9fc0: 00000000 00000000 b6fc1000 0000008c 0000000a be805f00 00000000 00000000
[ 1766.568064] 9fe0: 00000000 be805d58 b6a84d74 b6aedbe8 80000010 00000006 00000000 00000000
[ 1766.575277] Code: d3a00001 e89da800 00000000 00000000 (e5ad0008)
[ 1766.579137] ---[ end trace f7cd89d2fa4b04d0 ]---

And another (this time SSH stops working, but the Pi still replies to ping):

Unable to handle kernel NULL pointer dereference at virtual address 00000008
[  +0.008241] pgd = d4424000
[  +0.004025] [00000008] *pgd=145c7831, *pte=00000000, *ppte=00000000
[  +0.004500] Internal error: Oops: 17 [#3] ARM
[  +0.003847] Modules linked in: cmac bnep hci_uart btbcm bluetooth brcmfmac brcmutil snd_bcm2835 cfg80211 snd_pcm snd_timer rfkill snd bcm2835_gpiomem uio_pdrv_genirq uio fixed ip_tables x_tables ipv6
[  +0.011834] CPU: 0 PID: 1941 Comm: modprobe Tainted: G      D         4.9.80+ #1098
[  +0.008252] Hardware name: BCM2835
[  +0.004382] task: d69edf60 task.stack: d462c000
[  +0.004458] PC is at __vma_adjust+0x18/0x644
[  +0.003961] LR is at down_write+0x1c/0x50
[  +0.003729] pc : [<c011cb68>]    lr : [<c05e15bc>]    psr: 60000013
              sp : d462de34  ip : d462de50  fp : d462de4c
[  +0.007947] r10: d50892c0  r9 : d5093f78  r8 : d5089228
[  +0.004195] r7 : d6e48164  r6 : d6e48178  r5 : d50892c0  r4 : d6870820
[  +0.003990] r3 : d5089228  r2 : ffff0001  r1 : 00000000  r0 : 00000000
[  +0.003854] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  +0.004033] Control: 00c5387d  Table: 14424008  DAC: 00000055
[  +0.003645] Process modprobe (pid: 1941, stack limit = 0xd462c188)
[  +0.003479] Stack: (0xd462de34 to 0xd462e000)
[  +0.003240] de20:                                              d6870820 d462de74 c05e15bc
[  +0.006929] de40: d462de64 d462de50 c05e15bc c05dfd34 d5089228 d6870820 d462de8c d462de68
[  +0.007166] de60: c011cad8 c05e15ac d5089228 b6d99000 d5089220 d6870820 d5093f78 00000075
[  +0.007300] de80: d462def4 d462de90 c011f42c c011caa4 d5089220 00000000 d6b34a00 00000000
[  +0.007776] dea0: 00000000 0013f000 00000000 00000000 0000013f d50892c0 d6b34a00 00000075
[  +0.007966] dec0: 00000000 b6d99000 d462def4 0013f000 00000075 00000005 d6b34a00 d6870820
[  +0.008631] dee0: b6d99000 0000013f d462df34 d462def8 c011fa3c c011f100 00000000 00000000
[  +0.008825] df00: 5aa855df 2ad97987 0000076f d6870854 00000005 0013e540 d6b34a00 00000000
[  +0.009310] df20: d462c000 00000000 d462df74 d462df38 c01076dc c011f718 00000002 00000000
[  +0.009775] df40: 00000000 d462df4c 00000005 00000000 00000000 00000002 00000005 0013e540
[  +0.010438] df60: 00000000 d6b34a00 d462dfa4 d462df78 c011d6f0 c0107658 00000002 00000000
[  +0.011373] df80: befaf8ec 00000000 00000000 befaf8ec 000000c0 c000ffc4 00000000 d462dfa8
[  +0.010882] dfa0: c000fe40 c011d65c 00000000 00000000 00000000 0013e540 00000005 00000802
[  +0.011535] dfc0: 00000000 00000000 befaf8ec 000000c0 b6f1c558 befaf7ac 0013e540 befaf72c
[  +0.011882] dfe0: 00000000 befaf51c b6ef40bc b6f086bc 60000010 00000000 00000000 00000000
[  +0.011645] Code: e24cb004 e24dd034 e52de004 e8bd4000 (e5906008)
[  +0.006647] ---[ end trace f7cd89d2fa4b04d1 ]---
amovitz commented 6 years ago

I'm having this issue when streaming video for long periods of time, even when average CPU usage is below 30%. The 3B+ consistently hard locks sometime between 30 minutes and a few hours in. Using the exact same SD card image, a 3B runs perfectly stable for over a week. After the hard lock occurs, usually neither the hardware nor the software watchdog manages to reset it. I've been seeing this behavior since I got the 3B+ on day one.

JamesH65 commented 6 years ago

@amovitz What kernel version are you running? Does rpi-update fix the issue?

amovitz commented 6 years ago

It should be noted, the Pi camera module is not being used on my system. It has been tested with every version of the kernel available - rpi-update has not fixed the problem.

This particular test lasted approximately 45 minutes before hard locking this morning.

Linux Pi-2 4.14.42-v7+ #1114 SMP Mon May 21 16:39:21 BST 2018 armv7l armv7l armv7l GNU/Linux

JamesH65 commented 6 years ago

If you drop the max CPU frequency to 1200 MHz, does that help?

amovitz commented 6 years ago

I have set arm_freq=1200 and will let it run. I'll report back.

$ sudo cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq
600000
$ sudo cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
1200000
$ sudo cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
1200000
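The cpufreq sysfs files report frequencies in kHz, so 600000 and 1200000 are 600 MHz and 1200 MHz. A minimal Python sketch for reading them in MHz follows; the helper names are hypothetical, and reading cpuinfo_cur_freq usually needs root, which is why the commands above use sudo.

```python
from __future__ import annotations

from pathlib import Path

CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")
FILES = ("cpuinfo_min_freq", "cpuinfo_max_freq", "cpuinfo_cur_freq")


def khz_to_mhz(raw: str) -> int:
    """cpufreq sysfs values are in kHz; convert to whole MHz."""
    return int(raw.strip()) // 1000


def read_cpufreq_mhz(base: Path = CPUFREQ_DIR) -> dict[str, int]:
    # cpuinfo_cur_freq is typically readable by root only.
    return {name: khz_to_mhz((base / name).read_text()) for name in FILES}


if __name__ == "__main__":
    print(read_cpufreq_mhz())
```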

balboah commented 6 years ago

@amovitz I have the same problem on a 3B+ as well, without any camera attached; from the start it has frozen within a day, or two days at most. The recent crash logs are from my Zero W, though. I never managed to get a dump from the 3B+, so these may or may not be two different issues. And these two Pis are my first experience with Raspberry Pi, not a great one :P

JamesH65 commented 6 years ago

Almost certainly different problems: the SoCs on the Zero and 3B+, along with the different wireless chips, mean they are sufficiently dissimilar. Unless (!) there is something in your environment that is triggering the fault. We've sold a lot of Pis which run for years without issue, so this is unusual. The first thing to check is the power supply: is it sufficient?
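One way to check whether the supply is sufficient is the firmware's under-voltage flags, which vcgencmd get_throttled reports as a hex bitmask (output of the form throttled=0x...). A minimal Python sketch that decodes it follows, assuming the bit assignments from the Raspberry Pi vcgencmd documentation; decode_throttled and read_throttled are hypothetical helpers.

```python
from __future__ import annotations

import subprocess

# Bit meanings for `vcgencmd get_throttled` (Raspberry Pi documentation).
THROTTLED_BITS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output: str) -> list[str]:
    """Decode output such as 'throttled=0x50005' into the flags that are set."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [text for bit, text in sorted(THROTTLED_BITS.items())
            if value & (1 << bit)]


def read_throttled() -> list[str]:
    out = subprocess.run(["vcgencmd", "get_throttled"],
                         capture_output=True, text=True, check=True).stdout
    return decode_throttled(out)


if __name__ == "__main__":
    print(read_throttled() or ["no flags set"])
```

A set under-voltage bit (0 or 16) points at the supply; a value of 0x0 means the firmware has not flagged anything since boot.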

amovitz commented 6 years ago

The power supply I'm using is the branded one. We've tried many different power supplies, and it hard locks even when no applications are running.

We don't use any Zeros, so I couldn't tell you whether the same happens on them. We have extensively tested with the 3B, running for months at a time, and never had an issue until we put the SD card into a 3B+ (and upgraded the boot files for it). The same image works perfectly well on a 3B, but the 3B+ constantly locks up.

So far, it's been a little over 6 hours, and it hasn't locked with the lowered CPU Freq, but we'll see within 24 hours if it's more stable.

From our testing, it feels like either the kernel is not properly handling the upgraded SoC (or another chip on the board that has changed), or the Broadcom firmware is not interfacing properly with the new chip.

balboah commented 6 years ago

@JamesH65 Yeah, I would have expected more people to react if it were common; I must have hit the jackpot with 2 out of 2. My power supplies are the Raspberry Pi-branded ones, which I assume have the correct rating.

What both have in common is that the SD card is a SanDisk Class 10, written with dd on a Mac.

Zero W

The installed image is not customized in any way except for configuring Wi-Fi, the camera module (via raspi-config) and SSH access.

3B+

This one runs a Kubernetes master, which averages 20% CPU. I also tried a more recent version with rpi-update, without luck. I have not yet tried the frequency limit. I no longer have a camera or camera module enabled on this one, and it still freezes.

amovitz commented 6 years ago

About 1 hour and 15 minutes after my last post (Total Uptime: 07:35:46), it hard locked.

JamesH65 commented 6 years ago

Hmm, if it had kept running at 1200 MHz, I would have thought it was one of the test escapes (i.e. an SoC that passed testing but shouldn't have). But I believe they work at 1200 MHz. So I'm confused, and not sure what to suggest. It might be worth RMA'ing it for a replacement.

balboah commented 6 years ago

@amovitz And when it freezes it becomes burning hot, right? That sounds very much the same as my 3B+.

JamesH65 commented 6 years ago

Burning hot is bad. That would indicate a broken SoC I suspect.

amovitz commented 6 years ago

That's my issue, too. It may be a factory defect since we got the first batch on launch day and all of ours exhibit the same behavior (3 boards). It still freezes even when it's clocked at 1200MHz.

I would definitely agree that it is high above ambient temperature. We even have a custom heatsink on it with a fan and it's still very hot after it hard locks. If it will help, I can try to get external thermal readings and thermal images, but I'm pretty certain it's the SoC that's hot.
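To capture the SoC temperature right up to a hard lock, a periodic logger that forces each sample to disk can help. Below is a minimal Python sketch, assuming the SoC sensor is exposed as /sys/class/thermal/thermal_zone0/temp in millidegrees Celsius (as on Raspberry Pi OS kernels); log_temperature and the log path are hypothetical, and a sample can still be lost if the SD card write itself is interrupted by the lock-up.

```python
from __future__ import annotations

import os
import time
from pathlib import Path

SOC_TEMP = Path("/sys/class/thermal/thermal_zone0/temp")


def millideg_to_c(raw: str) -> float:
    """thermal_zone temp files report millidegrees Celsius."""
    return int(raw.strip()) / 1000.0


def log_temperature(logfile: str, interval_s: float = 10.0,
                    samples: int | None = None, source: Path = SOC_TEMP) -> None:
    """Append '<unix time> <temp C>' lines; samples=None means run forever."""
    taken = 0
    with open(logfile, "a") as out:
        while samples is None or taken < samples:
            temp_c = millideg_to_c(source.read_text())
            out.write(f"{time.time():.0f} {temp_c:.1f}\n")
            out.flush()
            os.fsync(out.fileno())  # push each sample to the SD card now
            taken += 1
            time.sleep(interval_s)


if __name__ == "__main__":
    log_temperature("/home/pi/soc_temp.log")
```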

JamesH65 commented 6 years ago

@popcornmix @pelwell Any thoughts?

On 25 May 2018 at 14:22, Alex Movitz notifications@github.com wrote:

That's my issue, too. It may be a factory defect since we got the first batch on launch day. It still freezes even when it's clocked at 1200MHz

I would definitely agree that it is high above ambient temperature. We even have a custom heatsink on it with a fan and it's still very hot after it hard locks. If it will help, I can try to get external thermal readings and thermal images, but I'm pretty certain it's the SoC that's hot.


-- James Hughes Principal Software Engineer, Raspberry Pi (Trading) Ltd

zoff99 commented 6 years ago

@amovitz I had the same thing with some Pis. Just overvolting (not overclocking) made it a bit better in my tests.

add to /boot/config.txt:
over_voltage=5
amovitz commented 6 years ago

Overvoltage did not help, it still hard locks.

imbashamba commented 6 years ago

Setup: Pi 3B+, Linux raspberrypi 4.14.44-v7+ #1117. It gets completely stuck after ~1 hour of playing video. gpu_mem=256, everything else at defaults. Nothing in syslog, but there are some strange warnings in dmesg at boot time:

[   41.957094] ------------[ cut here ]------------
[   41.957115] WARNING: CPU: 0 PID: 585 at drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c:541 vchiq_prepare_bulk_data+0x4d0/0x6dc
[   41.957118] Modules linked in: cmac bnep hci_uart btbcm serdev bluetooth ecdh_generic brcmfmac brcmutil cfg80211 rfkill snd_bcm2835(C) snd_pcm snd_timer snd uio_pdrv_genirq fixed uio ip_tables x_tables ipv6
[   41.957173] CPU: 0 PID: 585 Comm: info-beamer Tainted: G         C      4.14.44-v7+ #1117
[   41.957176] Hardware name: BCM2835
[   41.957193] [<8010ffd8>] (unwind_backtrace) from [<8010c240>] (show_stack+0x20/0x24)
[   41.957201] [<8010c240>] (show_stack) from [<80785424>] (dump_stack+0xd4/0x118)
[   41.957210] [<80785424>] (dump_stack) from [<8011da4c>] (__warn+0xf8/0x110)
[   41.957217] [<8011da4c>] (__warn) from [<8011db34>] (warn_slowpath_null+0x30/0x38)
[   41.957224] [<8011db34>] (warn_slowpath_null) from [<80666194>] (vchiq_prepare_bulk_data+0x4d0/0x6dc)
[   41.957233] [<80666194>] (vchiq_prepare_bulk_data) from [<8065e078>] (vchiq_bulk_transfer+0x2e4/0x568)
[   41.957241] [<8065e078>] (vchiq_bulk_transfer) from [<80663aec>] (vchiq_ioctl+0x1344/0x1a14)
[   41.957250] [<80663aec>] (vchiq_ioctl) from [<8029de50>] (do_vfs_ioctl+0xac/0x7c4)
[   41.957257] [<8029de50>] (do_vfs_ioctl) from [<8029e5ac>] (SyS_ioctl+0x44/0x6c)
[   41.957266] [<8029e5ac>] (SyS_ioctl) from [<80108060>] (ret_fast_syscall+0x0/0x28)
[   41.957269] ---[ end trace 0dbf65047004fe73 ]---
balboah commented 6 years ago

So I got a second Zero W, and it seems that, at least for that model, it's a hardware issue. Running the same camera module and SD card installation on it works fine without any issues (for at least 30 minutes so far).

I noticed one difference in the hardware layout: the "H" marking on the chip closest to the camera module connector. Not sure if it helps with anything, but here are photos of the two:

Broken one: [photo img_0509]

Working one: [photo img_0512]

amovitz commented 6 years ago

@balboah Are you still having issues after the latest kernel update?

usedbytes commented 6 years ago

I'm seeing this (or something similar) on the latest kernel (and whatever kernel I was running before I upgraded - still 4.14). It's a Pi Zero W.

raspistill works fine, and will happily capture JPEG and sit with the camera preview running for dozens of seconds. On the other hand raspivid (or RPi-Cam-Web-Interface) will immediately panic the kernel.

The reported process in the Oops and the backtrace appears to be random, though this one does show vc.ril.video_en. Maybe the video codec is trashing random bits of memory somehow?

I don't have a serial port hooked up, and it normally crashes before the Oops makes it onto the SSH session, so gathering useful crash dumps is a bit hard.

[  421.582317] Unable to handle kernel paging request at virtual address 00001038
[  421.589872] pgd = d3fc8000
[  421.592639] [00001038] *pgd=0d7c8831, *pte=00000000, *ppte=00000000
[  421.599216] Internal error: Oops: 17 [#1] ARM
[  421.603660] Modules linked in: fuse rfcomm cmac bnep hci_uart btbcm serdev bluetooth ecdh_generic evdev spidev brcmfmac brcmutil cfg80211 rfkill snd_bcm2835(C) gpio_keys snd_pcm snd_timer snd spi_bcm2835 fixed uio_pdrv_genirq uio hid_sony ff_memless i2c_dev ip_tables x_tables ipv6
[  421.628996] CPU: 0 PID: 1000 Comm: vc.ril.video_en Tainted: G         C      4.14.67+ #1139
[  421.637475] Hardware name: BCM2835
[  421.640930] task: cd7f3780 task.stack: cd454000
[  421.645556] PC is at grab_cache_page_write_begin+0x20/0x3c
[  421.651146] LR is at ext4_da_write_begin+0xb8/0x438
[  421.656096] pc : [<c00f5020>]    lr : [<c02025b0>]    psr: 60000013
[  421.662453] sp : cd455d40  ip : cd455d58  fp : cd455d54
[  421.667751] r10: c01fbcc4  r9 : cd455df4  r8 : 00001000
[  421.673053] r7 : 00001000  r6 : c8903638  r5 : d6ec81e0  r4 : d7d697a8
[  421.679675] r3 : d7fc3320  r2 : 0000000e  r1 : 000000f2  r0 : 00001000
[  421.686298] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  421.693539] Control: 00c5387d  Table: 13fc8008  DAC: 00000055
[  421.699369] Process vc.ril.video_en (pid: 1000, stack limit = 0xcd454188)
[  421.706259] Stack: (0xcd455d40 to 0xcd456000)
[  421.710688] 5d40: d7fc3320 d7d697a8 cd455dc4 cd455d58 c02025b0 c00f500c 00000000 c8903638

Message from syslogd@minimouse at Sep  3 21:48:04 ...
 kernel:[  421.599216] Internal error: Oops: 17 [#1] ARM
[  421.718991] 5d60: 00001000 00001000 000f3000 00000000 00001000 00000000 000000f2 00001000
[  421.727296] 5d80: d7d69784 00000000 000f2000 00000000 d689fd20 00000000 c0204a60 00000000
[  421.735600] 5da0: 00001000 c8903714 cd455ef0 00001000 00001000 c065f388 cd455e24 cd455dc8
[  421.743903] 5dc0: c00f51d4 c0202504 00001000 00000000 cd455df0 cd455df4 c8903638 cd454000
[  421.752206] 5de0: 00008000 d689fd20 000f2000 00000000 d7d69784 00000000 5b8dac14 00000000
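When the Oops does not reach the SSH session, forwarding kernel log records to another machine as they arrive may help. Below is a minimal Python sketch, assuming the /dev/kmsg record format described in the kernel's Documentation/ABI/testing/dev-kmsg ("prio,seq,usec,flags;message") and a hypothetical UDP collector address. It needs root, and because it runs in user space it may still miss the last records before a hard freeze; the kernel's netconsole module is the in-kernel alternative.

```python
from __future__ import annotations

import socket

COLLECTOR = ("192.0.2.10", 5555)  # hypothetical log host and UDP port


def parse_kmsg_record(record: bytes) -> tuple[int, int, int, str]:
    """Split one /dev/kmsg record of the form 'prio,seq,usec,flags;text'."""
    header, _, body = record.decode("utf-8", "replace").partition(";")
    fields = header.split(",")
    level = int(fields[0]) & 7     # low 3 bits: log level; higher bits: facility
    seq, usec = int(fields[1]), int(fields[2])
    text = body.split("\n", 1)[0]  # drop continuation (KEY=value) lines
    return level, seq, usec, text


def forward_kmsg(collector: tuple[str, int] = COLLECTOR) -> None:
    """Send each kernel log record to `collector` over UDP as it arrives."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    with open("/dev/kmsg", "rb", buffering=0) as kmsg:
        while True:
            try:
                record = kmsg.read(8192)  # each read() returns one record
            except BrokenPipeError:
                continue  # EPIPE: records were overwritten before we read them
            level, seq, usec, text = parse_kmsg_record(record)
            line = f"[{usec / 1e6:12.6f}] <{level}> {text}\n"
            sock.sendto(line.encode(), collector)


if __name__ == "__main__":
    forward_kmsg()  # needs root
```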
balboah commented 6 years ago

@amovitz Sorry for the late reply. After replacing the hardware I am no longer experiencing the problem (still with the same SD card), and I've stopped using the broken one. Maybe there is faulty hardware, or some small hardware differences that trigger the bug.

usedbytes commented 6 years ago

I did some experiments. On a different Pi Zero, my card + camera + power supply worked fine. That seems to point to an issue with my actual hardware, supporting @balboah's theory.

Another example of an Oops when running the camera below:

[  153.531076] NOHZ: local_softirq_pending 40
[  278.981974] NOHZ: local_softirq_pending 40
[  282.294591] NOHZ: local_softirq_pending 40
[  282.716442] ------------[ cut here ]------------
[  282.721176] kernel BUG at Returning to usermode but unexpected PSR bits set?:5!
[  282.728604] Internal error: Oops - BUG: 0 [#1] ARM
[  282.733469] Modules linked in: fuse rfcomm cmac bnep hci_uart btbcm serdev bluetooth ecdh_generic evdev spidev brcmfmac brcmutil gpio_keys snd_bcm2835(C) cfg80211 snd_pcm snd_timer rfkill snd i2c_bcm2835 spi_bcm2835 uio_pdrv_genirq uio fixed hid_sony ff_memless i2c_dev ip_tables x_tables ipv6
[  282.759865] CPU: 0 PID: 596 Comm: apache2 Tainted: G         C      4.14.67+ #1139
[  282.767548] Hardware name: BCM2835
[  282.770999] task: cc858000 task.stack: cc89a000
[  282.775623] PC is at no_work_pending+0x30/0x34
[  282.780140] LR is at 0xbeb31b40
[  282.783329] pc : [<c000fe54>]    lr : [<beb31b40>]    psr: 20000013
[  282.789685] sp : cc89bfa8  ip : cc89bfa8  fp : 00000000
[  282.794984] r10: 00000000  r9 : cc89a000  r8 : c000ff64
[  282.800282] r7 : 0000008c  r6 : b5a621b4  r5 : b5a620a0  r4 : 00000000
[  282.806903] r3 : c093d414  r2 : cc89bfe4  r1 : b60bb98c  r0 : 00000000
[  282.813526] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  282.820768] Control: 00c5387d  Table: 0c8b8008  DAC: 00000055
[  282.826598] Process apache2 (pid: 596, stack limit = 0xcc89a188)
[  282.832694] Stack: (0xcc89bfa8 to 0xcc89c000)
[  282.837119] bfa0:                   00000000 b5a620a0 0000000b 00000000 00002518 beb31b40
[  282.845421] bfc0: 00000000 b5a620a0 b5a621b4 0000008c b60bb954 b6f6bce8 00000000 beb3650c
[  282.853724] bfe0: b623e470 beb31b40 b60bb98c b6e16ba8 20000010 0000000b 00000000 00000000
[  282.862041] Code: e9527fff e1a00000 e28dd048 e1b0f00e (e7f001f2) 
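The "Returning to usermode but unexpected PSR bits set?" BUG comes from the ARM return-to-user path, which (assuming the 4.x arch/arm/kernel/entry-header.S) loads the saved PSR into r1 and tests it against PSR_I_BIT | 0x0f before returning to user space. The Python sketch below mirrors that test under that assumption; psr_ok_for_user_return is a hypothetical helper.

```python
PSR_I_BIT = 0x80        # CPSR bit 7: IRQs masked
USER_MODE_MASK = 0x0F   # user mode is 0b10000, so bits 0-3 must be clear


def psr_ok_for_user_return(psr: int) -> bool:
    """Mirror of 'tst r1, #PSR_I_BIT | 0x0f' on the return-to-user path.

    False means the kernel would hit the 'unexpected PSR bits' BUG.
    """
    return (psr & (PSR_I_BIT | USER_MODE_MASK)) == 0


if __name__ == "__main__":
    # r1 values from the two reports of this BUG in the thread.
    for r1 in (0xB6E6F198, 0xB60BB98C):
        print(hex(r1), psr_ok_for_user_return(r1))
```

In both reports of this BUG in the thread, r1 (b6e6f198 and b60bb98c) looks like a user-space address rather than a PSR, and both values fail the test, whereas a normal saved user PSR such as 20000010 passes.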

Pretty annoyed that this seems to be a HW issue, as this zero is now deeply embedded in the project.

usedbytes commented 6 years ago

With further fiddling, and following @zoff99 and this forum thread: https://www.raspberrypi.org/forums/viewtopic.php?f=43&t=212777, I resolved my issue by over-volting.

Seems like Broadcom/RPi shipped a batch or two of marginal chips.

chopeen commented 5 years ago

@usedbytes Can you post the exact settings you used? Did you change only over_voltage or any other parameters, too?

@balboah I am experiencing a similar problem with Zero W and mine also has the "H" marking on the chip near the camera connector.


UPDATE: The symptoms were similar, but my issues were caused by a poor power supply. Initially, I was trying to power the Raspberry Pi from a USB port. After two days of playing with different boot options, SD cards and heat sinks, I realized a USB port may be enough to boot a Raspberry Pi and keep it running when idle, but that's it. Run apt or do anything more than idling and it will crash.

I connected it to an actual power supply, it is running stable. Check the official power requirements for details.
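A rough way to confirm or rule out an undersized supply is to ask the firmware whether it has seen under-voltage. `vcgencmd get_throttled` prints a bitmask such as `throttled=0x50005`, and the Python sketch below decodes it. The bit meanings follow the Raspberry Pi firmware documentation as commonly published (an assumption worth checking against the docs for your firmware release), and the helper names `read_throttled` and `decode_throttled` are hypothetical.

```python
import subprocess
from typing import List

# Bit meanings for `vcgencmd get_throttled` (assumed from the firmware docs;
# verify against the documentation for your firmware release).
THROTTLE_BITS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def read_throttled() -> str:
    """Run vcgencmd on the Pi and return its raw output, e.g. 'throttled=0x0'."""
    result = subprocess.run(["vcgencmd", "get_throttled"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def decode_throttled(output: str) -> List[str]:
    """Translate 'throttled=0x...' into the list of flags that are set."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [name for bit, name in sorted(THROTTLE_BITS.items())
            if value & (1 << bit)]
```

On the Pi itself, `print(decode_throttled(read_throttled()))` lists the active flags; an "under-voltage has occurred" entry after a crash suggests the supply is worth replacing before blaming the board.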

usedbytes commented 5 years ago

@chopeen :

force_turbo=1
over_voltage=4
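When comparing settings like these between boards, a small script can list the overclocking-related lines actually written in `/boot/config.txt`. A minimal Python sketch, assuming plain `key=value` lines, `#` comment lines and conditional section headers such as `[pi0w]`. It does not evaluate those filters, so it reports what is written rather than what the firmware applies (`vcgencmd get_config arm_freq`, for example, shows the value in effect). The function name and key list are illustrative.

```python
# Keys discussed in this thread (illustrative, extend as needed).
OVERCLOCK_KEYS = {"arm_freq", "arm_freq_min", "arm_freq_max",
                  "force_turbo", "over_voltage"}


def overclock_settings(text):
    """Return (section, key, value) tuples for overclocking keys in config.txt text."""
    section = "all"  # lines before any [filter] header apply to every board
    found = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]  # remember the conditional section we are in
            continue
        if "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key in OVERCLOCK_KEYS:
                found.append((section, key, value))
    return found
```

Usage: `with open("/boot/config.txt") as f: print(overclock_settings(f.read()))`.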
manoloromero commented 5 years ago

Same situation for me: crashes, and the board has the H and the dot. Did anyone find a solution, like a kernel patch, etc.?

JamesH65 commented 5 years ago

Well, it's not a particularly repeatable problem, as most people do not see an issue. I presume you:

1) have a decent power supply, and
2) are using up-to-date software?

manoloromero commented 5 years ago

Hi

1. The two power supplies I tested are also used with an RPi 3 with 4 USB devices connected, without any problem. Anyway, I will check again with a new power supply that I just received and provide more feedback.
2. I have tested different OSes, including the latest version of Raspbian, and also different SD cards.

I have received a new RPi Zero 1.3 and will test it this weekend. But with the RPi Zero W the situation is the same: some time after I start using the camera, I get the panic.

manoloromero commented 5 years ago

Hi

I ran some tests.

Three different power supplies, different microSD cards.

The situation is: I get a kernel panic when I access the stream from the Pi NoIR camera. For example, I start an RTSP server ( https://github.com/mpromonet/v4l2rtspserver ) and all is OK until I connect a player such as VLC to the RTSP port; then I get a kernel panic. This also happens with other ways of accessing the video device, like raspistill, ...

I removed the bcm2835-v4l2 driver and used the unofficial driver from the UV4L project instead, and also removed other bcm-related modules. When I do this I get a segfault but not a kernel panic, so I was able to capture the output:

Mar 6 20:48:29 e2 kernel: [ 66.890864] bcm2835-codec bcm2835-codec: Removing bcm2835-codec
Mar 6 20:48:29 e2 kernel: [ 66.897596] bcm2835-codec bcm2835-codec: Removing bcm2835-codec
Mar 6 20:50:15 e2 kernel: [ 66.898701] bcm2835-codec bcm2835-codec: Removing bcm2835-codec
Mar 6 20:50:15 e2 kernel: [ 172.690461] Unable to handle kernel paging request at virtual address 00c07170
Mar 6 20:50:15 e2 kernel: [ 172.696889] pgd = bc7448ad
Mar 6 20:50:15 e2 kernel: [ 172.702382] [00c07170] *pgd=00000000
Mar 6 20:50:15 e2 kernel: [ 172.708151] Internal error: Oops: 5 [#1] ARM
Mar 6 20:50:15 e2 kernel: [ 172.712794] Modules linked in: bnep hci_uart btbcm serdev bluetooth ecdh_generic joydev evdev brcmfmac brcmutil sha256_generic raspberrypi_hwmon snd_bcm2835(C) hwmon snd_pcm snd_timer cfg80211 snd rfkill v4l2_mem2mem videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common videodev media vc_sm_cma(C) uio_pdrv_genirq fixed uio cuse fuse ip_tables x_tables ipv6 [last unloaded: bcm2835_mmal_vchiq]
Mar 6 20:50:15 e2 kernel: [ 172.731954] Unable to handle kernel paging request at virtual address dff58040
Mar 6 20:50:15 e2 kernel: [ 172.731961] pgd = bc7448ad
Mar 6 20:50:15 e2 kernel: [ 172.731967] [dff58040] *pgd=00000000
Mar 6 20:50:15 e2 kernel: [ 172.731973] Internal error: Oops: 5 [#2] ARM
Mar 6 20:50:15 e2 kernel: [ 172.731975] Modules linked in: bnep hci_uart btbcm serdev bluetooth ecdh_generic joydev evdev brcmfmac brcmutil sha256_generic raspberrypi_hwmon snd_bcm2835(C) hwmon snd_pcm snd_timer cfg80211 snd rfkill v4l2_mem2mem videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common videodev media vc_sm_cma(C) uio_pdrv_genirq fixed uio cuse fuse ip_tables x_tables ipv6 [last unloaded: bcm2835_mmal_vchiq]
Mar 6 20:50:15 e2 kernel: [ 172.732039] CPU: 0 PID: 637 Comm: v4l2rtspserver Tainted: G C 4.19.25+ #1205
Mar 6 20:50:15 e2 kernel: [ 172.732041] Hardware name: BCM2835
Mar 6 20:50:15 e2 kernel: [ 172.732044] PC is at cfb_imageblit+0x42c/0x980
Mar 6 20:50:15 e2 kernel: [ 172.732047] LR is at 0xda800000
Mar 6 20:50:15 e2 kernel: [ 172.732051] pc : [<c03f3be8>] lr : [<da800000>] psr: 20000193
Mar 6 20:50:15 e2 kernel: [ 172.732053] sp : d393b880 ip : 00000000 fp : d393b8d4
Mar 6 20:50:15 e2 kernel: [ 172.732056] r10: 00000008 r9 : 000000ff r8 : 00000000
Mar 6 20:50:15 e2 kernel: [ 172.732060] r7 : 00000020 r6 : 00000020 r5 : ff000000 r4 : ffaaaaaa
Mar 6 20:50:15 e2 kernel: [ 172.732063] r3 : dff58040 r2 : d703e100 r1 : 00001580 r0 : d7017800
Mar 6 20:50:15 e2 kernel: [ 172.732068] Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Mar 6 20:50:15 e2 kernel: [ 172.732071] Control: 00c5387d Table: 13800008 DAC: 00000055
Mar 6 20:50:15 e2 kernel: [ 172.732075] Process v4l2rtspserver (pid: 637, stack limit = 0xf78ab353)
Mar 6 20:50:15 e2 kernel: [ 172.732078] Stack: (0xd393b880 to 0xd393c000)
Mar 6 20:50:15 e2 kernel: [ 172.732082] b880: 00000004 00005600 00001580 dff58040 00000007 00000038 0000000f 00000000
Mar 6 20:50:15 e2 kernel: [ 172.732086] b8a0: d703e100 dff58040 f0980540 00000000 d703e107 d72af704 00000000 00000001
Mar 6 20:50:15 e2 kernel: [ 172.732092] b8c0: 000000ff d702a400 d393b8e4 d393b8d8 c03f4634 c03f37c8 d393b9b4 d393b8e8
Mar 6 20:50:15 e2 kernel: [ 172.732096] b8e0: c03f2230 c03f4628 00000001 dabf3120 d703f963 d7210960 d393b98c d702a508
Mar 6 20:50:15 e2 kernel: [ 172.732101] b900: ffffffff d70179a8 ffffffff 00000000 00000200 00000000 00000000 d7017800
Mar 6 20:50:15 e2 kernel: [ 172.732105] b920: d72af6f6 00000007 00000007 c03f4628 00000010 00000001 00000000 00000001
Mar 6 20:50:15 e2 kernel: [ 172.732110] b940: 00000000 00000010 d393b98c 00000208 00000280 00000038 00000010 00000007
Mar 6 20:50:15 e2 kernel: [ 172.732115] b960: 00000000 d393b901 d703e100 c03f2400 00000000 00000000 dac06a80 00000002
Mar 6 20:50:15 e2 kernel: [ 172.732119] b980: 00000720 9598c53f d393b9b4 d7017800 d702a400 00000028 d72af6f6 00000007
Mar 6 20:50:15 e2 kernel: [ 172.732123] b9a0: c03f1fb4 00000007 d393b9f4 d393b9b8 c03eba48 c03f1fc0 00000028 00000041
Mar 6 20:50:15 e2 kernel: [ 172.732128] b9c0: 00000007 00000000 00000049 d72af706 00000720 00000008 d72af5b0 00000700
Mar 6 20:50:15 e2 kernel: [ 172.732133] b9e0: 00000041 d72af6f6 d393ba3c d393b9f8 c03edeb8 c03eb960 00000041 d393ba08
Mar 6 20:50:15 e2 kernel: [ 172.732138] ba00: 00000007 d702a400 00000028 d72af7c8 d393ba54 d702a400 00000001 00000000
Mar 6 20:50:15 e2 kernel: [ 172.732142] ba20: c0b0d1c0 c09d5ae8 0000002f 00000005 d393baa4 d393ba40 c03f0ab8 c03edd60
Mar 6 20:50:15 e2 kernel: [ 172.732146] ba40: 000000aa d7017800 d72afefe 00000049 00000000 00000030 00000005 d7017800
Mar 6 20:50:15 e2 kernel: [ 172.732152] ba60: 00000000 c0b0f1b8 d702a400 00000000 00000001 c0b0d208 000002f0 00000005
Mar 6 20:50:15 e2 kernel: [ 172.732157] ba80: 00000000 c0a739de 00000030 d702a400 c09d5028 00000001 d393baec d393baa8
Mar 6 20:50:15 e2 kernel: [ 172.732161] baa0: c042963c c03f0848 00000001 9598c53f d393badc d7017800 d702a400 00000000
Mar 6 20:50:15 e2 kernel: [ 172.732165] bac0: d72afe6c d702a400 c09d5028 c0a739de 00000000 00000049 c09d5028 00000001
Mar 6 20:50:15 e2 kernel: [ 172.732169] bae0: d393bb1c d393baf0 c04296f8 c0429478 00000001 d702a400 0000000a c0a739de
Mar 6 20:50:15 e2 kernel: [ 172.732173] bb00: 00000000 9598c53f d702a400 0000000a d393bb74 d393bb20 c042a2f0 c0429654
Mar 6 20:50:15 e2 kernel: [ 172.732178] bb20: 00000000 0000018e c0a7ab6c 00000000 d72afe6c c0a739de 0000019d d702a400
Mar 6 20:50:15 e2 kernel: [ 172.732182] bb40: 0000005d 9598c53f 0000000a c0a348d4 00000000 0000019e 00004f08 00000000
Mar 6 20:50:15 e2 kernel: [ 172.732187] bb60: c0a75320 c0a73328 d393bbcc d393bb78 c0063e28 c042a0b8 c005f07c c00d21b8
Mar 6 20:50:15 e2 kernel: [ 172.732191] bb80: 00000000 c0065904 60000193 00000000 00000109 00000000 00000108 0000000f
Mar 6 20:50:15 e2 kernel: [ 172.732196] bba0: 00000000 00000000 00000000 00000000 00000000 c09d9e5c 60000193 d38e7000
Mar 6 20:50:15 e2 kernel: [ 172.732200] bbc0: d393bc1c d393bbd0 c0065904 c0063c14 c0820048 d393bc7c 00000107 0000000f
Mar 6 20:50:15 e2 kernel: [ 172.732204] bbe0: 00000020 00000020 00000000 00000000 d393bc0c c09d5028 c09e7260 c09d5028
Mar 6 20:50:15 e2 kernel: [ 172.732208] bc00: c08200b0 c09d9e5c 00000005 d38e7000 d393bc34 d393bc20 c0065cf0 c00657b4
Mar 6 20:50:15 e2 kernel: [ 172.732212] bc20: c0820048 d393bc7c d393bc54 d393bc38 c00667a4 c0065cb0 60000113 d393be48
Mar 6 20:50:15 e2 kernel: [ 172.732218] bc40: c08200b0 c09d5028 d393bc74 d393bc58 c00660c4 c0066760 d393bc7c 9598c53f
Mar 6 20:50:15 e2 kernel: [ 172.732222] bc60: d393bc84 c09e725c d393bcc4 d393bc88 c0091664 c0066098 c0820048 00000000
Mar 6 20:50:15 e2 kernel: [ 172.732226] bc80: 9598c53f 9598c53f 00000000 4300bc98 d3930029 c08200b0 00000005 00000000
Mar 6 20:50:15 e2 kernel: [ 172.732230] bca0: 0000000b 9598c53f d393bcfc c0a70744 60000113 d393be48 d393bcfc d393bcc8
Mar 6 20:50:15 e2 kernel: [ 172.732236] bcc0: c0014f38 c00915fc 00000000 0000000b c0820048 00c07170 00000005 d393be48
Mar 6 20:50:15 e2 kernel: [ 172.732241] bce0: d39c56c0 d39c56c0 d39c56f8 00000014 d393bd14 d393bd00 c001ac9c c0014dac
Mar 6 20:50:15 e2 kernel: [ 172.732245] bd00: d393be48 00c07170 d393bd6c d393bd18 c06c81d0 c001ac44 00000000 00000100
Mar 6 20:50:15 e2 kernel: [ 172.732249] bd20: d393bd4c d393bd30 c00d1f9c c00d309c a0000113 00000000 00000000 00010000
Mar 6 20:50:15 e2 kernel: [ 172.732254] bd40: d393bd64 00000005 00000005 c09d5028 c06c834c 00c07170 d393be48 00000000
Mar 6 20:50:15 e2 kernel: [ 172.732259] bd60: d393bd8c d393bd70 c06c83f8 c06c7fa0 9598c53f c09da180 00000005 c09d5028
Mar 6 20:50:15 e2 kernel: [ 172.732264] bd80: d393be44 d393bd90 c001aabc c06c8358 d393bdbc d393bda0 60000113 c09eb838
Mar 6 20:50:15 e2 kernel: [ 172.732268] bda0: c006fb38 00000000 be846258 c09d5028 be8462d8 be846358 00000004 00000000
Mar 6 20:50:15 e2 kernel: [ 172.732272] bdc0: 00000004 c03a70f8 d393bf34 d393bdd8 c0187f5c c03a70b0 ffffffff d393be0c
Mar 6 20:50:15 e2 kernel: [ 172.732278] bde0: d393bf48 be846358 d393be3c d393be04 d393be08 d393be0c d393be10 d393be14
Mar 6 20:50:15 e2 kernel: [ 172.732283] be00: c058066c c000990c ffffffff d393be7c 00000000 d393a000 d393be44 9598c53f
Mar 6 20:50:15 e2 kernel: [ 172.732287] be20: c058066c 20000013 ffffffff d393be7c 00000000 d393a000 d393bf8c d393be48
Mar 6 20:50:15 e2 kernel: [ 172.732291] be40: c0009914 c001aa78 d4e9ad80 d392ad20 00000001 d392ad21 00c07170 c09d5028
Mar 6 20:50:15 e2 kernel: [ 172.732295] be60: 00000010 be845d64 00000000 d393a000 00000000 d393bf8c c07170c0 d393be98
Mar 6 20:50:15 e2 kernel: [ 172.732301] be80: 00000010 c058066c 20000013 ffffffff c0580660 bf000000 d393bebc c09eb838
Mar 6 20:50:15 e2 kernel: [ 172.732305] bea0: c004ad08 fffffff7 00000001 007ff4d0 000005b0 d393bec0 c00d1f9c 00000001
Mar 6 20:50:15 e2 kernel: [ 172.732310] bec0: 00000000 000005b0 d393beac 00000001 d393bf14 c0188064 c004ad08 c00d1f64
Mar 6 20:50:15 e2 kernel: [ 172.732314] bee0: d38e7000 c09de0c0 1db7245b c09d5028 d393bf14 d38e7000 d6907000 c00781dc
Mar 6 20:50:15 e2 kernel: [ 172.732320] bf00: d393bf44 d393bf10 c00781dc c0548efc 5f2131bb 00008b8e be846258 d393bf7c
Mar 6 20:50:15 e2 kernel: [ 172.732325] bf20: c09d5028 c09d5028 0000004e c00091a4 d393a000 00000000 d393bf74 d393bf48
Mar 6 20:50:15 e2 kernel: [ 172.732329] bf40: c007830c be8460a4 c09d5028 0000004e c00091a4 be8460a4 00000008 00000000
Mar 6 20:50:15 e2 kernel: [ 172.732333] bf60: 00000000 9598c53f d393bfa4 be845d64 00000010 bf08020a 00000122 c00091a4
Mar 6 20:50:15 e2 kernel: [ 172.732337] bf80: d393bfa4 d393bf90 c058073c c0580608 be845d64 00000010 00000000 d393bfa8
Mar 6 20:50:15 e2 kernel: [ 172.732343] bfa0: c0009000 c058071c be845d64 00000010 00000010 007ff4d0 000005b0 00000000
Mar 6 20:50:15 e2 kernel: [ 172.732348] bfc0: be845d64 00000010 bf08020a 00000122 00000000 00000000 b6fc1000 be845d7c
Mar 6 20:50:15 e2 kernel: [ 172.732352] bfe0: 00000000 be845cd8 00000000 b6f42a9c 80000010 00000010 00000000 00000000
Mar 6 20:50:15 e2 kernel: [ 172.732357] [<c03f3be8>] (cfb_imageblit) from [<c03f4634>] (bcm2708_fb_imageblit+0x18/0x1c)
Mar 6 20:50:15 e2 kernel: [ 172.732362] [<c03f4634>] (bcm2708_fb_imageblit) from [<c03f2230>] (bit_putcs+0x27c/0x440)
Mar 6 20:50:15 e2 kernel: [ 172.732366] [<c03f2230>] (bit_putcs) from [<c03eba48>] (fbcon_putcs+0xf4/0x12c)
Mar 6 20:50:15 e2 kernel: [ 172.732369] [<c03eba48>] (fbcon_putcs) from [<c03edeb8>] (fbcon_redraw+0x164/0x1c4)
Mar 6 20:50:15 e2 kernel: [ 172.732373] [<c03edeb8>] (fbcon_redraw) from [<c03f0ab8>] (fbcon_scroll+0x27c/0xdd0)
Mar 6 20:50:15 e2 kernel: [ 172.732377] [<c03f0ab8>] (fbcon_scroll) from [<c042963c>] (con_scroll+0x1d0/0x1dc)
Mar 6 20:50:15 e2 kernel: [ 172.732381] [<c042963c>] (con_scroll) from [<c04296f8>] (lf+0xb0/0xc0)
Mar 6 20:50:15 e2 kernel: [ 172.732386] [<c04296f8>] (lf) from [<c042a2f0>] (vt_console_print+0x244/0x370)
Mar 6 20:50:15 e2 kernel: [ 172.732390] [<c042a2f0>] (vt_console_print) from [<c0063e28>] (console_unlock+0x220/0x504)
Mar 6 20:50:15 e2 kernel: [ 172.732395] [<c0063e28>] (console_unlock) from [<c0065904>] (vprintk_emit+0x15c/0x360)
Mar 6 20:50:15 e2 kernel: [ 172.732399] [<c0065904>] (vprintk_emit) from [<c0065cf0>] (vprintk_default+0x4c/0x7c)
Mar 6 20:50:15 e2 kernel: [ 172.732403] [<c0065cf0>] (vprintk_default) from [<c00667a4>] (vprintk_func+0x50/0xb8)
Mar 6 20:50:15 e2 kernel: [ 172.732408] [<c00667a4>] (vprintk_func) from [<c00660c4>] (printk+0x3c/0x5c)
Mar 6 20:50:15 e2 kernel: [ 172.732412] [<c00660c4>] (printk) from [<c0091664>] (print_modules+0x74/0xdc)
Mar 6 20:50:15 e2 kernel: [ 172.732416] [<c0091664>] (print_modules) from [<c0014f38>] (die+0x198/0x2a4)
Mar 6 20:50:15 e2 kernel: [ 172.732420] [<c0014f38>] (die) from [<c001ac9c>] (__do_kernel_fault.part.0+0x64/0x84)
Mar 6 20:50:15 e2 kernel: [ 172.732425] [<c001ac9c>] (__do_kernel_fault.part.0) from [<c06c81d0>] (do_page_fault+0x23c/0x3b8)
Mar 6 20:50:15 e2 kernel: [ 172.732429] [<c06c81d0>] (do_page_fault) from [<c06c83f8>] (do_translation_fault+0xac/0xb4)
Mar 6 20:50:15 e2 kernel: [ 172.732433] [<c06c83f8>] (do_translation_fault) from [<c001aabc>] (do_DataAbort+0x50/0xf4)
Mar 6 20:50:15 e2 kernel: [ 172.732437] [<c001aabc>] (do_DataAbort) from [<c0009914>] (__dabt_svc+0x54/0x80)
Mar 6 20:50:15 e2 kernel: [ 172.732442] Exception stack(0xd393be48 to 0xd393be90)
Mar 6 20:50:15 e2 kernel: [ 172.732448] be40: d4e9ad80 d392ad20 00000001 d392ad21 00c07170 c09d5028
Mar 6 20:50:15 e2 kernel: [ 172.732452] be60: 00000010 be845d64 00000000 d393a000 00000000 d393bf8c c07170c0 d393be98
Mar 6 20:50:15 e2 kernel: [ 172.732455] be80: 00000010 c058066c 20000013 ffffffff
Mar 6 20:50:15 e2 kernel: [ 172.732459] [<c0009914>] (__dabt_svc) from [<c058066c>] (__sys_sendto+0x70/0x114)
Mar 6 20:50:15 e2 kernel: [ 172.732463] [<c058066c>] (__sys_sendto) from [<c058073c>] (sys_sendto+0x2c/0x34)
Mar 6 20:50:15 e2 kernel: [ 172.732467] [<c058073c>] (sys_sendto) from [<c0009000>] (ret_fast_syscall+0x0/0x28)
Mar 6 20:50:15 e2 kernel: [ 172.732470] Exception stack(0xd393bfa8 to 0xd393bff0)
Mar 6 20:50:15 e2 kernel: [ 172.732474] bfa0: be845d64 00000010 00000010 007ff4d0 000005b0 00000000
Mar 6 20:50:15 e2 kernel: [ 172.732478] bfc0: be845d64 00000010 bf08020a 00000122 00000000 00000000 b6fc1000 be845d7c
Mar 6 20:50:15 e2 kernel: [ 172.732480] bfe0
Mar 6 20:50:15 e2 kernel: [ 172.732489] Lost 2 message(s)!

manoloromero commented 5 years ago

I tested with an RPi Zero and with a new RPi Zero W, and with both it is OK. So I think it is a hardware issue. Now the system is working perfectly.

vascojdb commented 5 years ago

Hi guys, any update on this? I was trying to use raspivid on a Zero W with a TCP stream, and after a few minutes my Wi-Fi is gone. I guess it's the same issue.

lategoodbye commented 5 years ago

@vascojdb Do you see a kernel oops in dmesg?
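A quick way to answer that is to search the kernel log for the markers that open the reports quoted in this thread: `Internal error: Oops`, `kernel BUG at` and `Unable to handle kernel paging request`. A minimal Python sketch, assuming the input is the text of `dmesg` or of a saved kernel log (after a hard freeze the messages may never reach the on-disk log); the function name and marker list are illustrative rather than exhaustive.

```python
# Substrings that start the oops/BUG reports seen in this thread
# (illustrative list, not an exhaustive set of kernel error formats).
OOPS_MARKERS = (
    "Internal error: Oops",
    "kernel BUG at",
    "Unable to handle kernel paging request",
)


def find_oopses(log_text):
    """Return the kernel log lines that contain one of the oops markers."""
    return [line for line in log_text.splitlines()
            if any(marker in line for marker in OOPS_MARKERS)]
```

For example, feed it the output of `dmesg` captured with `subprocess.run(["dmesg"], capture_output=True, text=True).stdout` (reading the kernel log may require root on some systems).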

vascojdb commented 5 years ago

@lategoodbye I don't think I have seen an oops; I saw something related to mail(something), but I'm away from home for a week so I can't remember exactly. My Wi-Fi drops completely and only a reboot solves it. I'm still trying to figure out whether this is the same problem or the other open issue (opened 3 years ago) about Wi-Fi dropping on the Pi Zero W while streaming video. That is exactly my case and when it happens

pabclsn commented 5 years ago

Same here: brand new RPi Zero W, brand new SD card and power supply; I also tested with the power supply from my Pi 3 and another SD card, with Camera Module 2.1. I have the H dot near the camera connector too. I'm getting a kernel panic within 5 minutes.

pabclsn commented 5 years ago

I downclocked the CPU to 950 MHz and have had no more problems. Not losing Wi-Fi anymore :)

IgnacioHR commented 5 years ago

I'm having the same issue here. My Pi Zero W also has the H* mark, and the Pi hangs after less than a minute of "video" reproduction. BUT I realised that the command that freezes the Pi is

raspivid -n -w 1280 -h 720 --bitrate 2000000 --framerate 30 -t 0 -ih --profile baseline --intra 30 --flush -o -| gst-launch-1.0 -v fdsrc do-timestamp=true ! h264parse ! rtph264pay config-interval=1 pt=96 ! udpsink host=127.0.0.1 port=8004

but the left part of the pipe, which reads the camera (raspivid -n -w 1280 -h 720 --bitrate 2000000 --framerate 30 -t 0 -ih --profile baseline --intra 30 --flush -o - | cat > /dev/null), seems to keep working for longer. It is only when I pipe into gst-launch-1.0 that the Pi crashes after some time.

IgnacioHR commented 5 years ago

After reading this thread, I'm having good results after underclocking the Raspberry Pi as follows:

/boot/config.txt

arm_freq=600
arm_freq_max=700
arm_freq_min=500

The CPU is running at 500 MHz (500000 kHz, half the nominal speed), and I'm going to report back to the seller of the board because it is not working as indicated in the specifications.
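The kernel's cpufreq interface reports frequencies in kHz, so a reading of 500000 corresponds to 500 MHz. A minimal Python sketch that reads the current clock of cpu0, assuming the standard sysfs file `/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq` exists (it is only present when a cpufreq driver is active); the helper names are hypothetical.

```python
from pathlib import Path

# Standard Linux cpufreq sysfs entry; the value it holds is in kHz.
CUR_FREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")


def khz_to_mhz(khz):
    """Convert a cpufreq reading in kHz to MHz."""
    return khz / 1000


def current_cpu_mhz(path=CUR_FREQ):
    """Read the current frequency of cpu0 and return it in MHz."""
    return khz_to_mhz(int(path.read_text().strip()))
```

On the Pi, `print(current_cpu_mhz())` should show a value between the configured minimum and maximum; `vcgencmd measure_clock arm` is an alternative that reports the ARM clock in Hz.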

eldad-a commented 5 years ago

This issue affected me as well, on 3 out of 6 Pi Zero Ws running RPi-Cam-Web-Interface (not sure whether that has anything to do with the issue).

Thanks @IgnacioHR, your suggestion seems to resolve the matter for me (based on a 4-day run so far).

avanc commented 4 years ago

I can confirm the very same problem with my Raspberry Pi Zero W (the revision with H*) and solved it with arm_freq=950.

It would be interesting to get to know what the differences between the two HW revisions are!

avanc commented 4 years ago

I spoke too early: 950 MHz reduces the freezes, but they still occur.

@eldad-a Did all six Raspi Zeros have the H*?

eldad-a commented 4 years ago

@avanc Sorry this is not working for you :-/ The machines are currently isolated in terms of light, so I cannot check for the H*; I should be able to do so in the coming weeks. Is there other information I can provide? Anything that's accessible through SSH should work.

NB: I have it set at arm_freq=600 (not 950)

eldad-a commented 4 years ago

@avanc Hi again, just wanted to let you know that I have 5 Pi Zero Ws with the H* mark; I have 4 more that are currently visually inaccessible and will check them in a couple of weeks when the experiment is completed. All 9 were purchased from the same supplier (Adafruit) in the past six months.

Hope this is of some help.

avanc commented 4 years ago

@eldad-a Thanks for the update.

I also got freezes at 600 MHz, so it seems the device is really broken. As a last try, I will install Raspbian and do some tests.

JamesH65 commented 4 years ago

Hi all, sorry about the delay in replying. Can anyone seeing this issue who hasn't tried it, try a core voltage increase to see if this changes anything?

In config.txt, add the line

over_voltage=1

If that doesn't help, slightly larger numbers might.

avanc commented 4 years ago

@JamesH65 No luck. I tried over_voltage=1, but my Raspberry Pi still hangs after some time (approx. 30 min). So I guess it's really a HW issue.

JamesH65 commented 4 years ago

Did you try larger numbers for over_voltage?

avanc commented 4 years ago

After reading a little more about over_voltage, I'm confused: According to https://www.raspberrypi.org/documentation/configuration/config-txt/overclocking.md, the default value for the Raspberry Pi Zero W is already over_voltage=6.

Shall I set it to 7? In that case I would have to set force_turbo and lose the warranty.