tobetter / linux

Linux kernel source tree

odroid-m1: kernel panic: cpu serror #36

Open paralin opened 2 years ago

paralin commented 2 years ago

I saw this kernel panic once and have not been able to reproduce it reliably:

[ 1569.468061] systemd-journald[30]: Received client request to flush runtime journal.
[ 1569.709490] rockchip-pm-domain fdd90000.power-management:power-controller: failed to get ack on domain 'gpu', val=0x9fe
[ 1569.710500] SError Interrupt on CPU3, code 0xbe000011 -- SError
[ 1569.710513] CPU: 3 PID: 2930 Comm: Xorg Not tainted 5.18.0-rc7 #1
[ 1569.710518] Hardware name: Hardkernel ODROID-M1 (DT)
[ 1569.710521] pstate: 204000c9 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1569.710526] pc : _regmap_bus_reg_write+0x20/0x30
[ 1569.710540] lr : _regmap_write+0x5c/0xb0
[ 1569.710544] sp : ffff80000aea3900
[ 1569.710545] x29: ffff80000aea3900 x28: ffff00009ac7d700 x27: 0000000000000000
[ 1569.710554] x26: ffff0001008a9938 x25: ffff0001001dc080 x24: ffff0001000f2298
[ 1569.710559] x23: 0000000000000001 x22: ffff0001008a3000 x21: 0000000080000000
[ 1569.710565] x20: 0000000000000008 x19: ffff0001008a3000 x18: ffffffffffffffff
[ 1569.710570] x17: 66203a72656c6c6f x16: 72746e6f632d7265 x15: 776f703a746e656d
[ 1569.710575] x14: 6567616e616d2d72 x13: 65663978303d6c61 x12: 76202c2775706727
[ 1569.710580] x11: ffff800009eb3388 x10: ffff800009eb3388 x9 : 00000000ffffefff
[ 1569.710585] x8 : ffff800009f0b388 x7 : 0000000000017fe8 x6 : 00000000fffff000
[ 1569.710590] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff800008873cc0
[ 1569.710595] x2 : 0000000080000000 x1 : ffff80000a293008 x0 : 0000000000000000
[ 1569.710601] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 1569.710604] CPU: 3 PID: 2930 Comm: Xorg Not tainted 5.18.0-rc7 #1
[ 1569.710607] Hardware name: Hardkernel ODROID-M1 (DT)
[ 1569.710609] Call trace:
[ 1569.710612]  dump_backtrace.part.0+0xc8/0xe0
[ 1569.710621]  show_stack+0x18/0x70
[ 1569.710625]  dump_stack_lvl+0x68/0x84
[ 1569.710632]  dump_stack+0x18/0x34
[ 1569.710636]  panic+0x168/0x328
[ 1569.710639]  nmi_panic+0x88/0x90
[ 1569.710643]  arm64_serror_panic+0x6c/0x80
[ 1569.710647]  arm64_is_fatal_ras_serror+0x84/0x90
[ 1569.710650]  do_serror+0x34/0x60
[ 1569.710653]  el1h_64_error_handler+0x30/0x50
[ 1569.710659]  el1h_64_error+0x64/0x68
[ 1569.710662]  _regmap_bus_reg_write+0x20/0x30
[ 1569.710667]  regmap_write+0x4c/0x80
[ 1569.710671]  rockchip_pd_power+0x220/0x2d0
[ 1569.710677]  rockchip_pd_power_on+0x14/0x20
[ 1569.710681]  _genpd_power_on+0xc0/0x140
[ 1569.710685]  genpd_power_on.part.0+0xa4/0x1f0
[ 1569.710689]  genpd_runtime_resume+0xe4/0x280
[ 1569.710693]  __rpm_callback+0x48/0x170
[ 1569.710698]  rpm_callback+0x6c/0x80
[ 1569.710702]  rpm_resume+0x364/0x5e0
[ 1569.710706]  __pm_runtime_resume+0x4c/0x80
[ 1569.710710]  panfrost_perfcnt_close+0x34/0xa0 [panfrost]
[ 1569.710730]  panfrost_postclose+0x1c/0x50 [panfrost]
[ 1569.710739]  drm_file_free.part.0+0x1a4/0x290 [drm]
[ 1569.710853]  drm_close_helper.isra.0+0x5c/0x70 [drm]
[ 1569.710949]  drm_release+0x68/0x110 [drm]
[ 1569.711044]  __fput+0x70/0x230
[ 1569.711050]  ____fput+0x10/0x20
[ 1569.711053]  task_work_run+0x80/0x180
[ 1569.711059]  do_notify_resume+0x1ec/0x1120
[ 1569.711066]  el0_svc+0x9c/0xb0
[ 1569.711073]  el0t_64_sync_handler+0xa4/0x130
[ 1569.711077]  el0t_64_sync+0x18c/0x190
[ 1569.711084] SMP: stopping secondary CPUs
[ 1569.711097] Kernel Offset: disabled
[ 1569.711099] CPU features: 0x100,0000100d,19801c86
[ 1569.711103] Memory Limit: 4096 MB
[ 1569.735186] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---

Figured this might be worth reporting.

Linux m1 5.18.0-rc7 #1 SMP PREEMPT Thu May 19 23:15:47 PDT 2022 aarch64 GNU/Linux

Kernel odroid-5.18.y commit 035eaa679f2df40eb30b8ebd02b781d69f10bb7f
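
For reference, the `code 0xbe000011` value on the SError line appears to be the raw ESR_EL1 syndrome. Below is a minimal sketch that splits it into fields, assuming the ESR_ELx bit layout from the Arm architecture manual. The helper name `decode_esr` is hypothetical and is not kernel code.

```python
# Sketch only: decode_esr is a hypothetical helper, not a kernel API.
# Assumes the ESR_ELx layout from the Arm architecture manual:
#   EC  = bits [31:26], IL = bit 25, ISS = bits [24:0]
# and, for an SError ISS: IDS = bit 24, AET = bits [12:10],
# EA = bit 9, DFSC = bits [5:0].
def decode_esr(esr):
    iss = esr & 0x1FFFFFF
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class
        "il": (esr >> 25) & 0x1,    # instruction length bit
        "ids": (iss >> 24) & 0x1,   # implementation-defined syndrome flag
        "aet": (iss >> 10) & 0x7,   # asynchronous error type (RAS)
        "ea": (iss >> 9) & 0x1,     # external abort type
        "dfsc": iss & 0x3F,         # data fault status code
    }


fields = decode_esr(0xBE000011)  # value taken from the log above
for name, value in fields.items():
    print(f"{name:5s} = {value:#x}")
```

If that reading is right, EC 0x2f is the SError class, DFSC 0x11 is an asynchronous SError, and AET 0 is the uncontainable severity, which would be consistent with the trace going through `arm64_is_fatal_ras_serror` into `arm64_serror_panic`.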

Vebryn commented 2 years ago

Same here: the error appears at boot with kernel 5.18.0-202205181718~jammy.

I rebuilt the boot partition using flash-kernel and the system now boots normally. Strange. Was the boot partition corrupted?

paralin commented 2 years ago

This is happening again with a brand-new SD card:

[   73.812101] rockchip-pm-domain fdd90000.power-management:power-controller: failed to get ack on domain 'gpu', val=0x1fe
[   73.813091] SError Interrupt on CPU1, code 0xbe000011 -- SError
[   73.813104] CPU: 1 PID: 492 Comm: systemd-udevd Not tainted 5.18.12 #1
[   73.813110] Hardware name: Hardkernel ODROID-M1 (DT)
[   73.813113] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   73.813118] pc : _raw_spin_unlock_irqrestore+0x18/0x50
[   73.813130] lr : regmap_unlock_spinlock+0x14/0x20
[   73.813138] sp : ffff800009ce3720
[   73.813139] x29: ffff800009ce3720 x28: 0000000000000013 x27: 0000000000000100
[   73.813148] x26: ffff800000e04440 x25: ffff0001001ed080 x24: ffff0001000f1898
[   73.813154] x23: 0000000000000001 x22: 0000000000000000 x21: 0000000080000000
[   73.813159] x20: 0000000000000000 x19: ffff0001008a2c00 x18: ffffffffffffffff
[   73.813164] x17: 66203a72656c6c6f x16: 72746e6f632d7265 x15: 0720072007200720
[   73.813170] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720
[   73.813175] x11: ffff8000094d8250 x10: ffff8000094d8250 x9 : 00000000ffffefff
[   73.813181] x8 : ffff800009530250 x7 : 0000000000017fe8 x6 : 00000000fffff000
[   73.813186] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff800008780810
[   73.813191] x2 : 0000000000000000 x1 : ffff000010928ec0 x0 : 0000000100000001
[   73.813198] Kernel panic - not syncing: Asynchronous SError Interrupt
[   73.813201] CPU: 1 PID: 492 Comm: systemd-udevd Not tainted 5.18.12 #1
[   73.813205] Hardware name: Hardkernel ODROID-M1 (DT)
[   73.813207] Call trace:
[   73.813208]  dump_backtrace+0xb0/0x120
[   73.813216]  show_stack+0x18/0x70
[   73.813221]  dump_stack_lvl+0x68/0x84
[   73.813226]  dump_stack+0x18/0x34
[   73.813230]  panic+0x168/0x328
[   73.813233]  nmi_panic+0x88/0x90
[   73.813237]  arm64_serror_panic+0x6c/0x80
[   73.813241]  arm64_is_fatal_ras_serror+0x84/0x90
[   73.813245]  do_serror+0x34/0x60
[   73.813248]  el1h_64_error_handler+0x30/0x50
[   73.813253]  el1h_64_error+0x64/0x68
[   73.813256]  _raw_spin_unlock_irqrestore+0x18/0x50
[   73.813259]  regmap_write+0x58/0x80
[   73.813263]  rockchip_pd_power+0x220/0x2d0
[   73.813270]  rockchip_pd_power_on+0x14/0x20
[   73.813274]  _genpd_power_on+0xc0/0x170
[   73.813278]  genpd_power_on.part.0+0xa4/0x1f0
[   73.813283]  __genpd_dev_pm_attach+0x100/0x2b0
[   73.813287]  genpd_dev_pm_attach+0x60/0x70
[   73.813291]  dev_pm_domain_attach+0x24/0x40
[   73.813297]  platform_probe+0x50/0xe0
[   73.813302]  really_probe+0x17c/0x3d0
[   73.813308]  __driver_probe_device+0x114/0x190
[   73.813312]  driver_probe_device+0x3c/0xf0
[   73.813316]  __driver_attach+0xcc/0x1e0
[   73.813321]  bus_for_each_dev+0x70/0xd0
[   73.813325]  driver_attach+0x24/0x30
[   73.813329]  bus_add_driver+0x144/0x230
[   73.813333]  driver_register+0x78/0x130
[   73.813337]  __platform_driver_register+0x28/0x40
[   73.813340]  panfrost_driver_init+0x20/0x1000 [panfrost]
[   73.813369]  do_one_initcall+0x50/0x1c0
[   73.813373]  do_init_module+0x44/0x240
[   73.813380]  load_module+0x2078/0x2930
[   73.813384]  __do_sys_finit_module+0xac/0x130
[   73.813388]  __arm64_sys_finit_module+0x24/0x30
[   73.813392]  invoke_syscall+0x48/0x120
[   73.813396]  el0_svc_common.constprop.0+0xd4/0x100
[   73.813400]  do_el0_svc+0x28/0x90
[   73.813404]  el0_svc+0x34/0xb0
[   73.813408]  el0t_64_sync_handler+0xa4/0x130
[   73.813412]  el0t_64_sync+0x18c/0x190
[   73.813418] SMP: stopping secondary CPUs
[   73.813431] Kernel Offset: disabled
[   73.813433] CPU features: 0x100,0000100d,19801c86
[   73.813436] Memory Limit: 4096 MB
[   73.840551] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
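
Both panics follow a `failed to get ack on domain 'gpu'` message and reach `rockchip_pd_power`, but from different callers. Below is a rough sketch for listing the frames two traces share. It uses short excerpts copied from the logs above, and the `frames` helper and regex are illustrative assumptions, not an existing tool.

```python
import re

# Matches dmesg call-trace lines such as
#   "[   73.813263]  rockchip_pd_power+0x220/0x2d0"
# and captures the symbol name. Register lines ("pc : ...") do not match.
FRAME = re.compile(r"\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frames(trace):
    """Return the symbol names found in a dmesg call trace, in order."""
    found = []
    for line in trace.splitlines():
        match = FRAME.search(line)
        if match:
            found.append(match.group(1))
    return found


# Excerpts copied from the two traces in this issue.
first = """
[ 1569.710662]  _regmap_bus_reg_write+0x20/0x30
[ 1569.710667]  regmap_write+0x4c/0x80
[ 1569.710671]  rockchip_pd_power+0x220/0x2d0
[ 1569.710677]  rockchip_pd_power_on+0x14/0x20
[ 1569.710681]  _genpd_power_on+0xc0/0x140
[ 1569.710685]  genpd_power_on.part.0+0xa4/0x1f0
[ 1569.710689]  genpd_runtime_resume+0xe4/0x280
"""
second = """
[   73.813256]  _raw_spin_unlock_irqrestore+0x18/0x50
[   73.813259]  regmap_write+0x58/0x80
[   73.813263]  rockchip_pd_power+0x220/0x2d0
[   73.813270]  rockchip_pd_power_on+0x14/0x20
[   73.813274]  _genpd_power_on+0xc0/0x170
[   73.813278]  genpd_power_on.part.0+0xa4/0x1f0
[   73.813283]  __genpd_dev_pm_attach+0x100/0x2b0
"""

seen = set(frames(second))
common = [name for name in frames(first) if name in seen]
print(common)
```

In these excerpts the shared frames run from `regmap_write` through `genpd_power_on.part.0`. The first trace enters via runtime resume when the DRM file is closed, and the second via the power-domain attach during the panfrost probe.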