raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

4.14.69 oops/panic on shutdown #2681

Closed mutability closed 6 years ago

mutability commented 6 years ago

On a 3B+ running 4.14.69 (from raspberrypi-kernel 1.20180910-1) I get intermittent oops or panic during shutdown.

I don't get this when running 4.14.50 from raspberrypi-kernel 1.20180619-1 (shipped with 2018-06-27-raspbian-stretch-lite)

The system is a vanilla raspbian-stretch-lite with the only changes being to enable ssh & serial console (but I've also seen the panic without a serial console enabled). To reproduce:

The exact oops/panic varies; I've attached some examples:

oops-1a.txt (file_free_rcu in softirq handler)
oops-1b.txt (file_free_rcu in softirq handler again)
oops-2.txt (sys_open -> handle_mm_fault -> double fault)

Kernel/hardware is:

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.14.69-v7+ (dc4@dc4-XPS13-9333) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #1141 SMP Mon Sep 10 15:26:29 BST 2018
[    0.000000] CPU: ARMv7 Processor [410fd034] revision 4 (ARMv7), cr=10c5383d
[    0.000000] CPU: div instructions available: patching division code
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[    0.000000] OF: fdt: Machine model: Raspberry Pi 3 Model B Plus Rev 1.3
[    0.000000] Memory policy: Data cache writealloc
[    0.000000] cma: Reserved 8 MiB at 0x3ac00000
[    0.000000] percpu: Embedded 17 pages/cpu @ba348000 s38720 r8192 d22720 u69632
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 240555
[    0.000000] Kernel command line: 8250.nr_uarts=1 bcm2708_fb.fbwidth=800 bcm2708_fb.fbheight=600 bcm2708_fb.fbswap=1 vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000  dwc_otg.lpm_enable=0 console=ttyS0,115200 console=tty1 root=PARTUUID=f4886cd5-02 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait

mutability commented 6 years ago

To narrow it down further, 4.14.62-v7+ from raspberrypi-kernel 1.20180817-1 appears to be OK

popcornmix commented 6 years ago

Can you identify the exact update which caused this? See: https://github.com/Hexxeh/rpi-firmware/commits/master

If you click on each commit, the end of the URL contains a git hash. Run sudo rpi-update <hash> to revert to that version. Report the first version with this error.
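
The procedure described above amounts to a linear search for the first failing build. A minimal sketch of the bookkeeping, assuming verdicts are recorded by hand as `<hash> <good|bad>` lines ordered oldest to newest (the `first_bad` helper and that record format are illustrative, not part of rpi-update):

```shell
# first_bad: read "<hash> <good|bad>" lines from stdin, ordered oldest to
# newest, and print the hash of the first line marked bad.
# Exits non-zero if no line is marked bad.
first_bad() {
    awk '$2 == "bad" { print $1; found = 1; exit }
         END { if (!found) exit 1 }'
}
```

For example, `first_bad < results.txt`, where `results.txt` is a hypothetical file holding one line per tested hash.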

mutability commented 6 years ago

4.14.68 (a5b781c7a761664226ff9654416776d372f8bbf0) is bad (file_free_rcu oops/panic)
4.14.67 (45782b55788c58b10b6376487fd86ca9c13296e1) is bad (file_free_rcu oops/panic)
4.14.66 (66bb2b42f07e2495b54804385eaed593ee851cd1) is bad (file_free_rcu oops/panic)
4.14.62 (911147a3276beee09afc4237e1b7b964e61fb88a) is good, no problems over a couple of hours of reboots

The file_free_rcu oops is the main failure mode here; I didn't see the double-fault failure in this round of testing. On the bad versions listed, I also saw a couple of cases where the shutdown process apparently hung entirely with no kernel messages.
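
When many reboots are captured from the serial console, these failure modes can be told apart by the function named on the `PC is at` line that an ARM oops report prints. A rough sketch, assuming each capture is saved to its own text file (the `classify_oops` name is made up here):

```shell
# classify_oops: read a captured console log on stdin and print the function
# named on the first "PC is at <function>+0x..." line, or "none" if the log
# contains no such line (for example a silent hang).
classify_oops() {
    sed -n 's/.*PC is at \([A-Za-z0-9_.]*\)+0x.*/\1/p' | head -n 1 | grep . || echo none
}
```

Running `classify_oops < capture.txt` over each saved log (the file name is hypothetical) gives a quick tally of which failure occurred on which reboot.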

popcornmix commented 6 years ago

Possibly related to https://github.com/raspberrypi/linux/issues/2680, which started with the same kernel update you've identified. We're waiting for testing results from a possible fix - I'll let you know when it's ready.

popcornmix commented 6 years ago

Any change with latest rpi-update kernel?

mutability commented 6 years ago

No joy, same oops in file_free_rcu:

[    0.000000] Linux version 4.14.69-v7+ (dc4@dc4-XPS13-9333) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #1142 SMP Fri Sep 14 20:48:53 BST 2018
[...]
[   46.191480] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[   46.200476] pgd = b7368000
[   46.200483] [00000000] *pgd=00000000
[   46.200500] Internal error: Oops: 5 [#1] SMP ARM
[   46.200505] Modules linked in: cmac bnep hci_uart btbcm serdev bluetooth ecdh_generic evdev ir_lirc_codec lirc_dev r820t rtl2832 i2c_mux brcmfmac dvb_usb_rtl28xxu dvb_usb_v2 brcmutil dvb_core cfg80211 rfkill snd_bcm2835(C) snd_pcm snd_timer snd fixed uio_pdrv_genirq uio ip_tables x_tables ipv6
[   46.200610] CPU: 0 PID: 656 Comm: systemd-cgroups Tainted: G        WC      4.14.69-v7+ #1142
[   46.200613] Hardware name: BCM2835
[   46.200619] task: b7008000 task.stack: b67f0000
[   46.200635] PC is at file_free_rcu+0x24/0x60
[   46.200648] LR is at rcu_process_callbacks+0x28c/0x66c

Handyone commented 6 years ago

Same as everyone: stuck on reboot/shutdown. The only fix is to roll back to 4.14.62:

sudo rpi-update 911147a3276beee09afc4237e1b7b964e61fb88a

This is the same issue as https://github.com/Hexxeh/rpi-firmware/issues/186, which has lots of screenshots posted.

lategoodbye commented 6 years ago

I'm not able to reproduce this issue by placing a reboot into rc.local (using the same kernel, firmware and same RPi model).

Could you please provide more information: Are you mounting network shares? How do you trigger the reboot (via USB keyboard, SSH over LAN/WiFi)? Are the Ethernet cable and HDMI connected during reboot? What is connected to USB?

mutability commented 6 years ago

No network mounts. No USB devices. HDMI and Ethernet connected. WiFi not in use. Reboots are triggered over SSH after the boot process is complete:

while :; do ssh pi@raspberrypi.local "sleep 30 && echo REBOOT && sudo reboot"; sleep 5; done
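
The loop above runs until interrupted. A bounded variant is sketched below, assuming the same `pi@` login and 30-second delay; `SSH` and `PAUSE` are hypothetical override variables added only so the loop can be dry-run without a Pi:

```shell
# reboot_loop HOST COUNT: trigger COUNT reboots of HOST over SSH.
# SSH (default: ssh) and PAUSE (default: 5 seconds) are illustrative overrides.
reboot_loop() (
    host=$1
    wanted=$2
    triggered=0
    while [ "$triggered" -lt "$wanted" ]; do
        # ssh usually exits non-zero when the reboot drops the connection, so
        # a cycle is counted by the REBOOT marker in its output instead.
        # Attempts made while the Pi is still booting print nothing and are retried.
        if ${SSH:-ssh} "pi@$host" "sleep 30 && echo REBOOT && sudo reboot" 2>/dev/null | grep -q REBOOT; then
            triggered=$((triggered + 1))
            echo "reboot $triggered of $wanted triggered"
        fi
        sleep "${PAUSE:-5}"
    done
)
```

For example, `reboot_loop raspberrypi.local 100` would stop after 100 triggered reboots, which makes runs on different kernels easier to compare.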

lategoodbye commented 6 years ago

Okay, I could reproduce it and can confirm your observations. It seems to be necessary to use Raspbian Lite 2018-06-27, but it's not necessary to trigger the reboot via SSH.

tommeh1337 commented 6 years ago

I have this exact same problem on all three of my RPi 3B's. I was about to return the most recent Raspberry Pi 3B+ I bought because I thought the hardware was faulty. However, after discovering that my other RPis have the exact same problem, I concluded that it is software related.

They all use the original Raspberry Pi 2.5A power supply.

1) has a SanDisk SDHC card
2) has an OEM brand SDHC card
3) has an official RPi NOOBS SDHC card

All are connected to an Ethernet cable, with no USB devices and no Wi-Fi. I ran some tests with an HDMI cable connected to a monitor and some without, and the outcome was the same.

Test case / steps to reproduce:
- Install Raspbian Stretch Lite, version June 2018 (release date 2018-06-27), on the SD card using Etcher on macOS (I suppose this last part is irrelevant)
- Place SSH file in the boot partition to enable the SSH server on first boot
- Power on the RPi
- SSH to the RPi and run sudo reboot now. I repeated this 10 times without any problems.
- sudo apt-get update && sudo apt-get upgrade
- sudo reboot now: the RPi restarts without any problems
- On further reboots, approximately 1 in 3 ends on a screen whose last line is a kernel panic, see: https://imgur.com/a/6reDp3e

If I can do anything to help you guys with the debugging please let me know!

APT log files history.log term.log

Bootpanic commented 6 years ago

I'm experiencing exactly the same problem with 2 RPi 3B+'s: oops/panic on shutdown/reboot. The steps to reproduce the issue are exactly the same as mentioned by mutability and tommeh1337. I'm going to try the rollback to 4.14.62 as mentioned by Handyone.

Bootpanic commented 6 years ago

Rolling back to version 4.14.62 seems to have fixed the issue. 100+ reboots without any problem.

pelwell commented 6 years ago

I can reproduce this now - either disabling BT or the loglevel I had set was masking the issue.

tommeh1337 commented 6 years ago

I did a downgrade to kernel 4.14.62 via this command: sudo rpi-update 911147a3276beee09afc4237e1b7b964e61fb88a

Did a lot of reboots without any problems. So this kernel seems stable.

How can I switch to the newest 'stock' kernel (which is maintained via apt) when this bug is fixed?

Bootpanic commented 6 years ago

How can I switch to the newest 'stock' kernel (which is maintained via apt) when this bug is fixed?

sudo apt-get install --reinstall raspberrypi-bootloader raspberrypi-kernel

tommeh1337 commented 6 years ago

sudo apt-get install --reinstall raspberrypi-bootloader raspberrypi-kernel

Thanks!

edit: will my downgraded kernel be safe when I run apt-get update && apt-get upgrade? If I do apt-get update it doesn't show any packages ready for upgrading. So this seems to be the case.

Bootpanic commented 6 years ago

edit: will my downgraded kernel be safe when I run apt-get update && apt-get upgrade? If I do apt-get update it doesn't show any packages ready for upgrading. So this seems to be the case.

I'd guess apt thinks it has the latest version installed as long as there is no newer version released.

The following command will place the packages on 'hold', and they will not be upgraded until you 'unhold' them:

sudo apt-mark hold raspberrypi-bootloader raspberrypi-kernel

Afterwards you can use the following to 'unhold' the packages when a fixed version is released in the repositories:

sudo apt-mark unhold raspberrypi-bootloader raspberrypi-kernel
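
To confirm that a hold took effect, the output of `apt-mark showhold` (which lists held packages one per line) can be checked. A small sketch; the `missing_holds` helper is an illustration, not an apt feature:

```shell
# missing_holds PKG...: read `apt-mark showhold` output on stdin and print
# each named package that is NOT currently held. Prints nothing if all are held.
missing_holds() (
    held=$(cat)
    for pkg in "$@"; do
        printf '%s\n' "$held" | grep -qxF -- "$pkg" || echo "$pkg"
    done
)
```

A typical call would be `apt-mark showhold | missing_holds raspberrypi-bootloader raspberrypi-kernel`; empty output means both packages are held.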

pelwell commented 6 years ago

I may have found the problematic commit by a process of trial and error - I don't have an explanation yet, and it could be that the commit itself is correct but is in some way provoking an error elsewhere.

The failures are too infrequent for any great degree of confidence after only half an hour's testing, so I'd like to crowd-source some more extensive tests. For people comfortable with building their own kernels, try reverting https://github.com/raspberrypi/linux/commit/f6ec33f6bd3723a8146768106434ef6ab3d9d990, rebuilding and starting your reboot loops.

catschulze commented 6 years ago

@pelwell: I just found something similar. Maybe a further commit (that has already hit 4.18 mainline, but not 4.14 yet) is missing for f6ec33f:

(1) The following commit appeared in the Raspberry Pi kernel just when the issues started (mid August): https://github.com/raspberrypi/linux/commit/4a53c4e84ace1bc75157a7281af3fe8f5b19d08c (upstream: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/bluetooth/hci_serdev.c?h=v4.14.70&id=4a53c4e84ace1bc75157a7281af3fe8f5b19d08c)

(2) The following hit 4.18 upstream just a week ago, but has not yet appeared as a backport for 4.14: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/bluetooth/hci_ldisc.c?id=e6a57d22f787e73635ce0d29eef0abb77928b3e9
It seems to be the required "symmetric counterpart" to (1) (init vs. teardown).

Maybe it would make sense to try whether just picking (2) also fixes the problem, instead of reverting f6ec33f as mentioned above. Please note that just reverting f6ec33f might not be sufficient or consistent, as the patch to hci_serdev.c from (1) that came with f6ec33f might then also need to be reverted. I would therefore prefer merging (2) as a cleaner solution, in case it helps.

pelwell commented 6 years ago

Thanks. Option 2 is clearly preferable provided it works, and so far the signs are good.

It's unfortunate that the patch doesn't include a Fixes tag, otherwise it may already have been back-ported.

pelwell commented 6 years ago

rpi-4.14.y now includes a back-port of the missing upstream commit.

popcornmix commented 6 years ago

Latest rpi-update kernel contains the missing upstream commit. Please test and report if it solves the issue.

tommeh1337 commented 6 years ago

I installed the latest 4.14.70-v7+ kernel via rpi-update on 1 of my RPi 3B+'s and just did 10 reboots without any problems. Problem seems to be fixed in this kernel!

edit: same for my second RPi 3B+
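
The version reports in this thread can be summarised as a lookup on the `uname -r` string. This is only a sketch of what commenters reported: the `kernel_status` name and its labels are invented here, versions nobody tested come back as `unreported`, and a version number alone does not identify a particular build.

```shell
# kernel_status RELEASE: classify a `uname -r` string using only the
# results reported in this thread.
kernel_status() {
    # Strip the local suffix first, e.g. "4.14.69-v7+" -> "4.14.69".
    case ${1%%-*} in
        4.14.50|4.14.62) echo good ;;        # reported stable
        4.14.6[6-9])     echo bad ;;         # reported oops/panic on shutdown
        4.14.70)         echo fixed ;;       # reported OK after the back-port
        *)               echo unreported ;;  # nobody in the thread tested it
    esac
}
```

On a running system this would be called as `kernel_status "$(uname -r)"`.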

sfx2000 commented 6 years ago

Looks good here on Pi3B+ and Pi Zero W

@pelwell - Phil, thanks for the debug effort - bugs like this can be gnarly as heck to troubleshoot.

pelwell commented 6 years ago

Don't forget @catschultze who helped us arrive at the correct solution much quicker.

pelwell commented 6 years ago

Don't forget @catschulze who helped us arrive at the correct solution much quicker.

Bootpanic commented 6 years ago

No problems anymore on my 3 RPi 3B+'s. Thanks for the great work, @catschulze and @pelwell!