open-power-host-os / linux

Linux kernel source tree

Latest devel build update +reboot crashed host #18

Closed sathnaga closed 6 years ago

sathnaga commented 7 years ago
Mirrored with LTC bug https://bugzilla.linux.ibm.com/show_bug.cgi?id=160569

Action: `yum update + reboot`

https://ltc-jenkins.aus.stglabs.ibm.com/job/HostOS_CI/842/consoleText

```
Stopping Replay Read-Ahead Data...
[ OK ] Reached target Shutdown.
[119099.239708] Unable to handle kernel paging request for data at address 0x00000010
[119099.239794] Faulting instruction address: 0xd00000000730064c
cpu 0x0: Vector: 300 (Data Access) at [c0000007f86077d0]
    pc: d00000000730064c: bm_evict_inode+0x2c/0x80 [binfmt_misc]
    lr: c00000000039003c: evict+0xfc/0x260
    sp: c0000007f8607a50
   msr: 900000010280b033
   dar: 10
 dsisr: 40000000
  current = 0xc0000007f8580080
  paca    = 0xc00000000fd60000   softe: 0   irq_happened: 0x01
    pid   = 1, comm = systemd
Linux version 4.14.0-1.rc4.dev.gitb27fc5c.el7.centos.ppc64le (mockbuild@host-os-jenkins-slave03.aus.stglabs.ibm.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-17) (GCC)) #1 SMP Fri Oct 20 22:55:44 -02 2017
enter ? for help
[c0000007f8607a80] c00000000039003c evict+0xfc/0x260
[c0000007f8607ac0] c000000000389258 dentry_unlink_inode+0x148/0x1c0
[c0000007f8607af0] c00000000038ad58 __dentry_kill+0xe8/0x2a0
[c0000007f8607b30] c00000000038b634 shrink_dentry_list+0x1e4/0x4e0
[c0000007f8607ba0] c00000000038bb84 shrink_dcache_parent+0x54/0xb0
[c0000007f8607c00] c00000000038bc08 do_one_tree+0x28/0x60
[c0000007f8607c30] c00000000038ce4c shrink_dcache_for_umount+0x4c/0xc0
[c0000007f8607ca0] c00000000036a92c generic_shutdown_super+0x3c/0x190
[c0000007f8607d10] c00000000036af08 kill_litter_super+0x48/0x70
[c0000007f8607d40] c00000000036b45c deactivate_locked_super+0xac/0xf0
[c0000007f8607d70] c000000000397f94 cleanup_mnt+0x64/0xb0
[c0000007f8607da0] c0000000001287c0 task_work_run+0x140/0x1a0
[c0000007f8607e00] c00000000001ca70 do_notify_resume+0xf0/0x100
[c0000007f8607e30] c00000000000bec4 ret_from_except_lite+0x70/0x74
--- Exception: c00 (System Call) at 00007fff8c6a50a8
SP (7fffee70e770) is in userspace
```
sathnaga commented 6 years ago
  1. Tried to recreate the crash with multiple reboots, but was unable to hit the issue.
  2. Ran host stress and fs tests and hit a different host crash, reported at https://github.com/open-power-host-os/linux/issues/20
cdeadmin commented 6 years ago

While trying to reproduce with host stress, I hit the host crash below during xfs stress tests:

enter ? for help
[link register   ] c0000000002543f0 irq_work_run+0x30/0x50
[c000000ffff53cc0] c000000ffff53cf0 (unreliable)
[c000000ffff53cf0] c0000000001b7ca0 flush_smp_call_function_queue+0xf0/0x200
[c000000ffff53d70] c0000000000477ec smp_ipi_demux_relaxed+0x9c/0x110
[c000000ffff53db0] c0000000000903d4 icp_native_ipi_action+0x64/0x80
[c000000ffff53dd0] c000000000179420 __handle_irq_event_percpu+0x90/0x2d0
[c000000ffff53e90] c000000000179698 handle_irq_event_percpu+0x38/0x90
[c000000ffff53ed0] c00000000017fcf4 handle_percpu_irq+0x84/0xd0
[c000000ffff53f00] c000000000177b7c generic_handle_irq+0x4c/0x80
[c000000ffff53f20] c0000000000165d4 __do_irq+0x94/0x200
[c000000ffff53f90] c000000000029fa4 call_do_irq+0x14/0x24
[c0000007f87f3a50] c0000000000167dc do_IRQ+0x9c/0x110
[c0000007f87f3aa0] c000000000008c58 hardware_interrupt_common+0x158/0x160
--- Exception: 501 (Hardware Interrupt) at c0000000008eb664 snooze_loop+0xa4/0x190
[c0000007f87f3d90] c0000007f87f3dc0 (unreliable)
[c0000007f87f3dd0] c0000000008e83a4 cpuidle_enter_state+0xc4/0x3d0
[c0000007f87f3e30] c00000000015f73c call_cpuidle+0x4c/0x80
[c0000007f87f3e50] c00000000015fbe0 do_idle+0x2b0/0x350
[c0000007f87f3ec0] c00000000015fe8c cpu_startup_entry+0x3c/0x50
[c0000007f87f3ef0] c000000000048a74 start_secondary+0x4e4/0x530
[c0000007f87f3f90] c00000000000b16c start_secondary_prolog+0x10/0x14
b:mon>
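
To compare a trace like the one above against the trace in the original report, it can help to reduce each xmon backtrace to its list of symbols. The following is a minimal Python sketch, not part of any kernel or xmon tooling: the name `parse_frames` and the regular expression are illustrative assumptions based only on the frame lines shown in this issue, and other xmon output formats may need a different pattern.

```python
import re

# Matches xmon frame lines as they appear in this issue, e.g.
#   [c000000ffff53cf0] c0000000001b7ca0 flush_smp_call_function_queue+0xf0/0x200
#   [link register   ] c0000000002543f0 irq_work_run+0x30/0x50
# Lines without a symbol (such as "(unreliable)" entries) do not match.
FRAME_RE = re.compile(
    r"^\[(?:[0-9a-f]{16}|link register\s*)\]\s+"
    r"(?P<addr>[0-9a-f]{16})\s+"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(text):
    """Return a list of (symbol, offset) tuples for each symbolized frame."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("sym"), int(match.group("off"), 16)))
    return frames


if __name__ == "__main__":
    sample = (
        "[link register   ] c0000000002543f0 irq_work_run+0x30/0x50\n"
        "[c000000ffff53cc0] c000000ffff53cf0 (unreliable)\n"
        "[c000000ffff53cf0] c0000000001b7ca0 "
        "flush_smp_call_function_queue+0xf0/0x200\n"
    )
    for symbol, offset in parse_frames(sample):
        print(f"{symbol}+{offset:#x}")
```

Running the parser over both traces and comparing the resulting symbol lists makes it easy to see that the two crashes go through unrelated code paths (dentry eviction at unmount versus IPI handling in the idle loop).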

jenkins_job_log.txt

It looks like this patch fixes this issue: https://www.spinics.net/lists/linux-fsdevel/msg117031.html