raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

mcp251xfd and kernel panic #6406

Open pompushko opened 4 weeks ago

pompushko commented 4 weeks ago

Describe the bug

Hello

I have a pretty simple config with SPI-connected mcp2518fd CAN-bus controllers. With a fresh build of Raspbian, I started getting a kernel panic during boot. Only disconnecting the CAN-bus HAT helps.

Steps to reproduce the behaviour

Just set up and use any CAN-bus HAT with an mcp2518fd.

Device(s)

Raspberry Pi Zero 2 W

System

Raspberry Pi reference 2024-10-08 Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, dbadbe2868e2fe5548024f145ea2726e9c58d71e, stage3

Aug 30 2024 19:19:11 Copyright (c) 2012 Broadcom version 2808975b80149bbfe86844655fe45c7de66fc078 (clean) (release) (start)

Linux canbus 6.6.51+rpt-rpi-v7 #1 SMP Raspbian 1:6.6.51-1+rpt2 (2024-10-01) armv7l GNU/Linux

Logs

Oct 08 21:16:31 canbus kernel: 8<--- cut here ---
Oct 08 21:16:31 canbus kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000 when read
Oct 08 21:16:31 canbus kernel: [00000000] pgd=03ac1835, pte=00000000, *ppte=00000000
Oct 08 21:16:31 canbus kernel: Internal error: Oops: 17 [#1] SMP ARM
Oct 08 21:16:31 canbus kernel: Modules linked in: brcmfmac_wcc brcmfmac vc4 snd_soc_hdmi_codec drm_display_helper brcmutil cec hci_uart drm_dma_helper btbcm drm_kms_helper bluetooth cfg80211 snd_soc_core snd_compress raspberrypi_hwmon snd_pcm_dmaengine binfmt_misc bcm2835_codec(C) mcp251xfd bcm2835_v4l2(C) ecdh_generic ecc can_dev bcm2835_isp(C) v4l2_mem2mem bcm2835_mmal_vchiq(C) videobuf2_vmalloc rfkill videobuf2_dma_contig snd_bcm2835(C) videobuf2_memops videobuf2_v4l2 snd_pcm videodev snd_timer raspberrypi_gpiomem snd videobuf2_common vc_sm_cma(C) mc uio_pdrv_genirq uio can_gw can drm fuse drm_panel_orientation_quirks dm_mod backlight ip_tables x_tables ipv6 spidev i2c_bcm2835 spi_bcm2835aux spi_bcm2835 fixed
Oct 08 21:16:31 canbus kernel: CPU: 0 PID: 479 Comm: ip Tainted: G C 6.6.51+rpt-rpi-v7 #1 Raspbian 1:6.6.51-1+rpt2
Oct 08 21:16:31 canbus kernel: Hardware name: BCM2835
Oct 08 21:16:31 canbus kernel: PC is at timecounter_read+0x14/0xac
Oct 08 21:16:31 canbus kernel: LR is at mcp251xfd_ring_init+0x1f0/0x738 [mcp251xfd]
Oct 08 21:16:31 canbus kernel: pc : [<801c1418>] lr : [<7f2be43c>] psr: 20000013
Oct 08 21:16:31 canbus kernel: sp : 9cbe9998 ip : ffffc000 fp : 00000000
Oct 08 21:16:31 canbus kernel: r10: 00000001 r9 : 00000460 r8 : 00000460
Oct 08 21:16:31 canbus kernel: r7 : 00000000 r6 : 8377a600 r5 : 00000000 r4 : 8377a970
Oct 08 21:16:31 canbus kernel: r3 : 83779000 r2 : 8377a970 r1 : 00000004 r0 : 00000000
Oct 08 21:16:31 canbus kernel: Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Oct 08 21:16:31 canbus kernel: Control: 10c5383d Table: 1ba8406a DAC: 00000055
Oct 08 21:16:31 canbus kernel: Register r0 information: NULL pointer
Oct 08 21:16:31 canbus kernel: Register r1 information: non-paged memory
Oct 08 21:16:31 canbus kernel: Register r2 information: non-slab/vmalloc memory
Oct 08 21:16:31 canbus kernel: Register r3 information: non-slab/vmalloc memory
Oct 08 21:16:31 canbus kernel: Register r4 information: non-slab/vmalloc memory
Oct 08 21:16:31 canbus kernel: Register r5 information: NULL pointer
Oct 08 21:16:31 canbus kernel: Register r6 information: non-slab/vmalloc memory
Oct 08 21:16:31 canbus kernel: Register r7 information: NULL pointer
Oct 08 21:16:31 canbus kernel: Register r8 information: non-paged memory
Oct 08 21:16:31 canbus kernel: Register r9 information: non-paged memory
Oct 08 21:16:31 canbus kernel: Register r10 information: non-paged memory
Oct 08 21:16:32 canbus kernel: Register r11 information: NULL pointer
Oct 08 21:16:32 canbus kernel: Register r12 information: non-paged memory
Oct 08 21:16:32 canbus kernel: Process ip (pid: 479, stack limit = 0xefae9b57)
Oct 08 21:16:32 canbus kernel: Stack: (0x9cbe9998 to 0x9cbea000)
Oct 08 21:16:32 canbus kernel: 9980: 8160d000 00000000
Oct 08 21:16:32 canbus kernel: 99a0: 8377a600 7f2be43c 83779000 7f2bbfd8 83778600 81555800 8377a970 ffffc000
Oct 08 21:16:32 canbus kernel: 99c0: 00000000 00000000 83778600 81555800 816f3c00 8377a000 8377818c 00000001
Oct 08 21:16:32 canbus kernel: 99e0: 00000000 7f2bbfe8 00000000 00000000 00000000 816f3c00 00000000 83778600
Oct 08 21:16:32 canbus kernel: 9a00: 83778000 7f2bc13c 7f2fa0b4 00040080 83778000 9cbe9cbc 8244d8c0 7f2fa0b4
Oct 08 21:16:32 canbus kernel: 9a20: 83778024 8377818c 00000001 8096b584 834fd100 834fd11c 9cbe9b84 83778000
Oct 08 21:16:32 canbus kernel: 9a40: 9cbe9cbc fd4725ac 83778000 00040081 8244d8c0 9cbe9cbc 00040080 8096ba2c
Oct 08 21:16:32 canbus kernel: 9a60: 01ac7000 01ae7fff 9b95940c 76ec4000 ffffffff fd4725ac 83778000 83d17c00
Oct 08 21:16:32 canbus kernel: 9a80: 00000000 00040080 7f2fa0b4 8210aad0 81ecfe40 8096bac4 00000000 83fee6c0
Oct 08 21:16:32 canbus kernel: 9aa0: 01ac7000 83778000 83d17c00 00000000 9cbe9cbc 8097c6e4 00000002 83fee6c0
Oct 08 21:16:32 canbus kernel: 9ac0: 01ac7000 01ae7fff 9b95940c 76ec4000 ffffffff 9b959500 00000002 00000000
Oct 08 21:16:32 canbus kernel: 9ae0: 00000000 00000000 00000009 00000000 00000000 80c7f6ac 9b959b04 9b959b0c
Oct 08 21:16:32 canbus kernel: 9b00: 9cbe9cbc 80732378 9b959000 80b3620c 8244d8c0 80b3620c 81401380 00000cc0
Oct 08 21:16:32 canbus kernel: 9b20: 816cb240 80363208 00000000 00000000 00000000 00000000 a0000013 ffffffff
Oct 08 21:16:32 canbus kernel: 9b40: 00000000 81401380 00000000 00000284 80982410 00000cc0 9cbe9cbc fd4725ac
Oct 08 21:16:32 canbus kernel: 9b60: 816cb240 00000000 00000000 00000000 00000000 fd4725ac 9cbe9cbc 00000000
Oct 08 21:16:32 canbus kernel: 9b80: 81235860 83d17c00 83778000 8210aac0 9cbe9cbc 00000000 00000000 809828f4
Oct 08 21:16:32 canbus kernel: 9ba0: 83d17c00 00000000 9cbe9cbc 01ae7fff 00000001 81303640 00000000 00000000
Oct 08 21:16:32 canbus kernel: 9bc0: 8210aad0 00000001 00000000 81ecfe40 00000001 00000000 00000000 00000000
Oct 08 21:16:32 canbus kernel: 9be0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Oct 08 21:16:32 canbus kernel: 9c00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Oct 08 21:16:32 canbus kernel: 9c20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 fd4725ac
Oct 08 21:16:32 canbus kernel: 9c40: 80ec3458 8210aac0 00000000 81304300 81304300 81ecfe40 00000000 00000000
Oct 08 21:16:32 canbus kernel: 9c60: 816cb240 8097b13c 00000000 9cbe9cbc 00000000 81555bf8 01ac7190 8068d938
Oct 08 21:16:32 canbus kernel: 9c80: 83fee680 8068d938 00000000 fd4725ac 8b3f2b2c 81ecfe40 8097aef0 8210aac0
Oct 08 21:16:32 canbus kernel: 9ca0: 00000020 00000000 83d1757c 00000000 00000000 809d8e34 000003f8 00000000
Oct 08 21:16:32 canbus kernel: 9cc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Oct 08 21:16:32 canbus kernel: 9ce0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Oct 08 21:16:32 canbus kernel: 9d00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Oct 08 21:16:32 canbus kernel: 9d20: 00000000 00000000 00000000 00000000 00000000 fd4725ac 81ecfe40 81513800
Oct 08 21:16:32 canbus kernel: 9d40: 00000020 83d17400 81ecfe40 809d8334 7fffffff fd4725ac 81ecfe40 9cbe9f38
Oct 08 21:16:32 canbus kernel: 9d60: 81ecfe40 00000020 00000000 00000020 83d17400 809d8608 000003f8 00000000
Oct 08 21:16:32 canbus kernel: 9d80: 00000081 00140cca 00140cca 00000008 83238ac0 00000000 000001df 00000000
Oct 08 21:16:32 canbus kernel: 9da0: 00000000 00000000 00000000 fd4725ac 9cbe9e40 00000000 9cbe9f38 8417f980
Oct 08 21:16:32 canbus kernel: 9dc0: 00000000 9cbe9dec 9cbe9dec 00000000 00000000 8093b3a8 9cbe9f38 8417f980
Oct 08 21:16:32 canbus kernel: 9de0: 00000000 8093be84 9cbe9e40 00000000 00000000 00000000 00000000 00000000
Oct 08 21:16:32 canbus kernel: 9e00: 00000000 00000000 00000000 fd4725ac 7eda66c4 00000000 9cbe9f38 8417f980
Oct 08 21:16:32 canbus kernel: 9e20: 00000000 00000000 9cbe9e44 9cbe9e84 00000000 8093ddfc 00000000 ffffffff
Oct 08 21:16:32 canbus kernel: 9e40: 00000000 7eda66e4 00000020 00000000 00000000 00000000 00000000 00000000
Oct 08 21:16:32 canbus kernel: 9e60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Oct 08 21:16:32 canbus kernel: 9e80: 00000000 00000010 00000000 00000000 00000000 00000000 00000000 00000000
Oct 08 21:16:32 canbus kernel: 9ea0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Oct 08 21:16:32 canbus kernel: 9ec0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Oct 08 21:16:32 canbus kernel: 9ee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Oct 08 21:16:32 canbus kernel: 9f00: 00000000 fd4725ac 00000000 8417f980 7eda6670 00000000 80100298 8244d8c0
Oct 08 21:16:32 canbus kernel: 9f20: 00000128 8093e300 00000000 00000000 00000000 fffffff7 9cbe9e84 0000000c
Oct 08 21:16:32 canbus kernel: 9f40: 00000000 00000000 01000005 00000001 00000020 7eda66e4 00000000 00000000
Oct 08 21:16:32 canbus kernel: 9f60: 00000001 00000000 00000000 00000001 00000000 00000000 00000000 00000000
Oct 08 21:16:32 canbus kernel: 9f80: 00000000 00000000 8244d8c0 fd4725ac 7eda6e3f 0009cca0 00000000 00000002
Oct 08 21:16:32 canbus kernel: 9fa0: 00000128 80100288 0009cca0 00000000 00000003 7eda6670 00000000 00000001
Oct 08 21:16:32 canbus kernel: 9fc0: 0009cca0 00000000 00000002 00000128 67059320 00083954 0009cca0 00000000
Oct 08 21:16:32 canbus kernel: 9fe0: 0009cd68 7eda6618 0007f4dc 76e0412c 20000010 00000003 00000000 00000000
Oct 08 21:16:32 canbus kernel: timecounter_read from mcp251xfd_ring_init+0x1f0/0x738 [mcp251xfd]
Oct 08 21:16:32 canbus kernel: mcp251xfd_ring_init [mcp251xfd] from mcp251xfd_chip_start+0x244/0x2a8 [mcp251xfd]
Oct 08 21:16:32 canbus kernel: mcp251xfd_chip_start [mcp251xfd] from mcp251xfd_open+0x80/0x210 [mcp251xfd]
Oct 08 21:16:32 canbus kernel: mcp251xfd_open [mcp251xfd] from dev_open+0x114/0x1c8
Oct 08 21:16:32 canbus kernel: __dev_open from dev_change_flags+0x194/0x20c
Oct 08 21:16:32 canbus kernel: dev_change_flags from dev_change_flags+0x20/0x5c
Oct 08 21:16:32 canbus kernel: dev_change_flags from do_setlink+0x35c/0x1020
Oct 08 21:16:32 canbus kernel: do_setlink from rtnl_newlink+0x530/0x950
Oct 08 21:16:32 canbus kernel: rtnl_newlink from rtnetlink_rcv_msg+0x24c/0x2f8
Oct 08 21:16:32 canbus kernel: rtnetlink_rcv_msg from netlink_rcv_skb+0xc0/0x120
Oct 08 21:16:32 canbus kernel: netlink_rcv_skb from netlink_unicast+0x194/0x270
Oct 08 21:16:32 canbus kernel: netlink_unicast from netlink_sendmsg+0x1f8/0x480
Oct 08 21:16:32 canbus kernel: netlink_sendmsg from sock_sendmsg+0x44/0x78
Oct 08 21:16:32 canbus kernel: sock_sendmsg from __sys_sendmsg+0x1f4/0x21c
Oct 08 21:16:32 canbus kernel: sys_sendmsg from _sys_sendmsg+0x9c/0xd0
Oct 08 21:16:32 canbus kernel: ___sys_sendmsg from sys_sendmsg+0x78/0xbc
Oct 08 21:16:32 canbus kernel: sys_sendmsg from sys_trace_return+0x0/0x10
Oct 08 21:16:32 canbus kernel: Exception stack(0x9cbe9fa8 to 0x9cbe9ff0)
Oct 08 21:16:32 canbus kernel: 9fa0: 0009cca0 00000000 00000003 7eda6670 00000000 00000001
Oct 08 21:16:32 canbus kernel: 9fc0: 0009cca0 00000000 00000002 00000128 67059320 00083954 0009cca0 00000000
Oct 08 21:16:32 canbus kernel: 9fe0: 0009cd68 7eda6618 0007f4dc 76e0412c
Oct 08 21:16:32 canbus kernel: Code: e52de004 e28dd004 e1a04000 e5900000 (e5903000)
Oct 08 21:16:32 canbus kernel: ---[ end trace 0000000000000000 ]---

Additional context

All fine on 6.6.31+rpt-rpi-v7

pelwell commented 4 weeks ago

From the crash dump I'd say the timecounter priv->tc's cc field was NULL at the time when timecounter_read was called from mcp251xfd_ring_init. It looks as though it should already have been initialised by the call to timecounter_init from mcp251xfd_timestamp_start, but that may not be the case.
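The reading above rests mainly on the "PC is at" and "LR is at" lines of the dump, which name the faulting function and its caller. A minimal sketch for pulling those lines out of a kernel log; `oops_summary` is a hypothetical helper name, and the `journalctl` usage in the comment assumes a persistent systemd journal so that the crashed boot is still available:

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines that identify the fault type,
# the faulting function (PC) and its caller (LR). Reads a log on stdin.
oops_summary() {
    grep -E 'Unable to handle kernel|PC is at|LR is at'
}

# On a live system (assumes the previous boot is still in the journal):
#   journalctl -k -b -1 --no-pager | oops_summary

# Demonstration with three lines taken from the log in this issue:
printf '%s\n' \
    'kernel: Hardware name: BCM2835' \
    'kernel: PC is at timecounter_read+0x14/0xac' \
    'kernel: LR is at mcp251xfd_ring_init+0x1f0/0x738 [mcp251xfd]' |
    oops_summary
```

The demonstration drops the "Hardware name" line and keeps the PC and LR lines.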

@marckleinebudde It looks like a group of patches from you were merged into this time interval (6.6.31 to 6.6.51) - do you have any ideas what may be going wrong?

pompushko commented 4 weeks ago

https://nvd.nist.gov/vuln/detail/CVE-2024-41088

Could that be the issue?

pelwell commented 4 weeks ago

To me, the CVE doesn't seem to fit the symptoms you are seeing, but I'm not going to rule out the possibility.

marckleinebudde commented 3 weeks ago

> @marckleinebudde It looks like a group of patches from you were merged into this time interval (6.6.31 to 6.6.51) - do you have any ideas what may be going wrong?

Known problem, see https://lore.kernel.org/all/20240924-truthful-authentic-basilisk-aaab90-mkl@pengutronix.de/

Please upgrade to v6.6.53 or cherry-pick:

51b2a7216122 ("can: mcp251xfd: properly indent labels")
a7801540f325 ("can: mcp251xfd: move mcp251xfd_timestamp_start()/stop() into mcp251xfd_chip_start/stop()")

pelwell commented 3 weeks ago

Thanks, Marc. The timeline on 6.6 is surprising, but I guess the lack of a Fixes: tag meant it slipped through the back-ports net.

@pompushko The fixes are already in the latest rpi-update build - sudo rpi-update. Note that this is beta software and should be treated with caution - back up valuable data just in case.

pompushko commented 3 weeks ago

> Thanks, Marc. The timeline on 6.6 is surprising, but I guess the lack of a Fixes: tag meant it slipped through the back-ports net.
>
> @pompushko The fixes are already in the latest rpi-update build - sudo rpi-update. Note that this is beta software and should be treated with caution - back up valuable data just in case.

Thank you for fast bugfix!

Fix in fresh version of kernel? Or how it looks like?

marckleinebudde commented 3 weeks ago

I mainlined the series https://lore.kernel.org/all/20240628-mcp251xfd-workaround-erratum-6-v4-0-53586f168524@pengutronix.de via can-next and didn't include Fixes Tags nor put stable on Cc, as the changes were rather intrusive and the bug triggers very rarely in normal use cases.

The stable team (in person or an algorithm) picked some patches from that series but not all. Cherry picking the above mentioned patches fixes the problem.

pelwell commented 3 weeks ago

> Thank you for fast bugfix!

The fix was already in the latest build before we were aware of the problem.

> Fix in fresh version of kernel? Or how it looks like?

I'm not sure I understand the question, but the change affects the mcp251xfd module, and is part of a complete 6.6.54 kernel.

pompushko commented 3 weeks ago

I mean, how will sudo rpi-update fix the kernel? :D

pelwell commented 3 weeks ago

rpi-update installs complete kernel (and firmware) builds. The latest kernel build is 6.6.54, which includes the fix.
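Whether a given kernel is new enough can be checked by comparing the upstream part of `uname -r` with 6.6.53, the release named earlier in this thread. A minimal sketch, assuming GNU `sort` (for `-V`) and that the vendor suffix starts at the first `+` or `-`; the function names are hypothetical, and a kernel below 6.6.53 with the two patches cherry-picked would be reported as not fixed:

```shell
#!/bin/sh
# Release named earlier in this thread as containing the fix.
FIXED_IN="6.6.53"

# Strip the vendor suffix: "6.6.51+rpt-rpi-v7" -> "6.6.51".
upstream_version() {
    printf '%s\n' "${1%%[+-]*}"
}

# Succeeds when version $1 >= version $2 (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether the kernel release string in $1 is expected to carry the fix.
check_kernel() {
    ver="$(upstream_version "$1")"
    if version_ge "$ver" "$FIXED_IN"; then
        echo "$ver: at or above $FIXED_IN, fix expected"
    else
        echo "$ver: below $FIXED_IN, fix not expected"
    fi
}

check_kernel "$(uname -r)"
```

Run on the kernel from the original report (6.6.51+rpt-rpi-v7), this reports that the fix is not expected.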

pompushko commented 3 weeks ago

Oh, now I get it :D I thought that new fixes would come with an incremented kernel version...

pelwell commented 3 weeks ago

At some point in the future the standard kernel package in "apt" will also pick up the fix, but that will take longer.

pompushko commented 3 weeks ago

> At some point in the future the standard kernel package in "apt" will also pick up the fix, but that will take longer.

Is there any way to check whether apt has picked up this bugfix?

pelwell commented 3 weeks ago

Actually, I think it should be there already. Both the most recent releases to the "stable" branch are 6.6.53, and so will include the fix. Try:

sudo apt update
sudo apt upgrade

popcornmix commented 3 weeks ago

> Actually, I think it should be there already. Both the most recent releases to the "stable" branch are 6.6.53, and so will include the fix. Try:

Latest "stable" rpi-update kernel is:

Linux version 6.6.51+ (dom@buildbot) (arm-linux-gnueabihf-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #1802 Tue Oct  8 17:23:04 BST 2024

Current apt kernel is marginally older:

Linux version 6.6.51+rpt-rpi-v8 (serge@raspberrypi.com) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT Debian 1:6.6.51-1+rpt2 (2024-10-01)

and will be bumped to stable version imminently. However both are 6.6.51.

Latest master rpi-update is:

Linux version 6.6.54+ (dom@buildbot) (arm-linux-gnueabihf-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #1801 Fri Oct  4 16:49:31 BST 2024

so should have the version required.

pompushko commented 3 weeks ago

> Actually, I think it should be there already. Both the most recent releases to the "stable" branch are 6.6.53, and so will include the fix. Try:
>
> sudo apt update
> sudo apt upgrade

Unfortunately, there are still no updates.

pelwell commented 3 weeks ago

> Unfortunately, there are still no updates.

Yes - @popcornmix was explaining that I'd made a mistake.

marckleinebudde commented 3 weeks ago

FYI: There is one unsolved problem in one of the patches. In the error case the driver will bail out with the message "IRQ handler mcp251xfd_handle_tefif() returned -22." See the following discussions for more details:

https://lore.kernel.org/all/FR3P281MB155216711EFF900AD9791B7ED9692@FR3P281MB1552.DEUP281.PROD.OUTLOOK.COM/
https://lore.kernel.org/all/20241001-mcp251xfd-fix-length-calculation-v1-1-598b46508d61@pengutronix.de/

pelwell commented 3 weeks ago

And then there's #6407.

pompushko commented 3 weeks ago

An hour ago these packages arrived:

The following packages will be upgraded:
  linux-headers-6.6.51+rpt-common-rpi linux-headers-6.6.51+rpt-rpi-v7
  linux-headers-6.6.51+rpt-rpi-v7l linux-headers-rpi-v7 linux-headers-rpi-v7l
  linux-image-6.6.51+rpt-rpi-v7 linux-image-6.6.51+rpt-rpi-v7l linux-image-rpi-v7
  linux-image-rpi-v7l linux-kbuild-6.6.51+rpt linux-libc-dev

After the update, I get: Linux canbus 6.6.51+rpt-rpi-v7 #1 SMP Raspbian 1:6.6.51-1+rpt3 (2024-10-08) armv7l GNU/Linux

But I still have the issue :(

pelwell commented 3 weeks ago

6.6.54 has missed the cut-off for the next apt kernel release, but we'll try to get it out soon.

pompushko commented 2 weeks ago

> 6.6.54 has missed the cut-off for the next apt kernel release, but we'll try to get it out soon.

When might that be? Thank you.

ragazenta commented 1 week ago

> > 6.6.54 has missed the cut-off for the next apt kernel release, but we'll try to get it out soon.
>
> When might that be? Thank you.

Uhm, how about downgrading to 6.6.28 if you prefer apt over rpi-update? I got a segmentation fault in 6.6.51 on a Pi 5. The package name for the Pi Zero is a bit different, though.

sudo apt install linux-image-6.6.28+rpt-rpi-2712 --no-install-recommends
sudo cp /boot/vmlinuz-6.6.28+rpt-rpi-2712 /boot/firmware/kernel_2712.img