open-power-host-os / linux

Linux kernel source tree

Power8: Hit with "Oops: Kernel access of bad area, sig: 11" on latest nightly #14

Closed: sathnaga closed this issue 7 years ago

sathnaga commented 7 years ago

Kernel version: 4.13.0-3.rc3.dev.gitec0d270.el7.centos.ppc64le

Hit a few minutes after a fresh boot, just as an avocado test run was starting. After the crash, most commands (sosreport, service restart, etc.) hang.

[  909.585268] list_del corruption. prev->next should be c000000f23120760, but was c000000f23121760
[  909.585448] ------------[ cut here ]------------
[  909.585547] WARNING: CPU: 64 PID: 14123 at lib/list_debug.c:53 __list_del_entry_valid+0xd0/0x100
[  909.585705] Modules linked in: vhost_net vhost tap act_police cls_u32 sch_ingress cls_fw sch_sfq sch_htb xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables ses enclosure scsi_transport_sas i2c_opal i2c_core powernv_op_panel ipmi_powernv ipmi_devintf ipmi_msghandler nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc kvm_hv kvm_pr kvm xfs libcrc32c tg3 ptp pps_core
[  909.586812] CPU: 64 PID: 14123 Comm: qemu-system-ppc Not tainted 4.13.0-3.rc3.dev.gitec0d270.el7.centos.ppc64le #1
[  909.586963] task: c000000f0c9cc600 task.stack: c000000f061a8000
[  909.587026] NIP: c0000000005a0770 LR: c0000000005a076c CTR: 00000000300304d0
[  909.587100] REGS: c000000f061ab6c0 TRAP: 0700   Not tainted  (4.13.0-3.rc3.dev.gitec0d270.el7.centos.ppc64le)
[  909.587197] MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>
[  909.587205]   CR: 42024422  XER: 20000000
[  909.587291] CFAR: c00000000016e9c8 SOFTE: 1 
[  909.587291] GPR00: c0000000005a076c c000000f061ab940 c000000001397a00 0000000000000054 
[  909.587291] GPR04: 0000000000000000 c000000000098244 9000000000009033 0000000000000000 
[  909.587291] GPR08: 0000000000000001 0000000000000007 0000000000000006 9000000000001003 
[  909.587291] GPR12: 0000000000004400 c00000000fda8000 0000000000000000 0000000000000000 
[  909.587291] GPR16: 0000000000000000 0000000124cb8058 0000000124cb8038 00000001250ed8b8 
[  909.587291] GPR20: 00000001250ed8b0 00000001250ed8d0 c00000000138d820 c000000000d9c238 
[  909.587291] GPR24: 0000000000000001 5deadbeef0000100 c000000f061abb80 c000000000f24840 
[  909.587291] GPR28: c0000000013cbe50 0000000000000001 c000000f231215e0 c000000f23120750 
[  909.587927] NIP [c0000000005a0770] __list_del_entry_valid+0xd0/0x100
[  909.587990] LR [c0000000005a076c] __list_del_entry_valid+0xcc/0x100
[  909.588052] Call Trace:
[  909.588079] [c000000f061ab940] [c0000000005a076c] __list_del_entry_valid+0xcc/0x100 (unreliable)
[  909.588167] [c000000f061ab9a0] [c000000000988bbc] tcf_chain_destroy+0x2c/0xa0
[  909.588243] [c000000f061ab9d0] [c000000000988c84] tcf_block_put+0x54/0x90
[  909.588308] [c000000f061aba00] [d000000014d3178c] htb_destroy_class.isra.11+0x5c/0x80 [sch_htb]
[  909.588401] [c000000f061aba30] [d000000014d318a8] htb_destroy+0xf8/0x1b0 [sch_htb]
[  909.588476] [c000000f061abab0] [c0000000009818a4] qdisc_destroy+0xe4/0x170
[  909.588539] [c000000f061abae0] [c00000000098332c] dev_shutdown+0xbc/0x100
[  909.588604] [c000000f061abb20] [c00000000093f248] rollback_registered_many+0x2f8/0x560
[  909.588679] [c000000f061abbf0] [c00000000093f520] rollback_registered+0x70/0xb0
[  909.588755] [c000000f061abc40] [c000000000941908] unregister_netdevice_queue+0x128/0x180
[  909.588832] [c000000f061abcc0] [c00000000077a6cc] __tun_detach+0x22c/0x460
[  909.588895] [c000000f061abd20] [c00000000077a938] tun_chr_close+0x38/0x60
[  909.588959] [c000000f061abd50] [c00000000035abf8] __fput+0xd8/0x280
[  909.589024] [c000000f061abdb0] [c000000000120f20] task_work_run+0x140/0x1a0
[  909.589089] [c000000f061abe00] [c00000000001d810] do_notify_resume+0xf0/0x100
[  909.589164] [c000000f061abe30] [c00000000000bf44] ret_from_except_lite+0x70/0x74
[  909.589238] Instruction dump:
[  909.589295] 4bffffd4 3c62ff9b 3863f6d0 4bbce235 60000000 0fe00000 38600000 4bffffb8 
[  909.589435] 3c62ff9b 3863f690 4bbce219 60000000 <0fe00000> 38600000 4bffff9c 3c62ff9b 
[  909.589577] ---[ end trace c2b424e83e247e4b ]---
[  909.589685] Unable to handle kernel paging request for data at address 0x00000000
[  909.589823] Faulting instruction address: 0xc000000000988b48
[  909.589939] Oops: Kernel access of bad area, sig: 11 [#1]
[  909.590030] SMP NR_CPUS=1024 
[  909.590030] NUMA 
[  909.590101] PowerNV
[  909.590197] Modules linked in: vhost_net vhost tap act_police cls_u32 sch_ingress cls_fw sch_sfq sch_htb xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables ses enclosure scsi_transport_sas i2c_opal i2c_core powernv_op_panel ipmi_powernv ipmi_devintf ipmi_msghandler nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc kvm_hv kvm_pr kvm xfs libcrc32c tg3 ptp pps_core
[  909.591279] CPU: 64 PID: 14123 Comm: qemu-system-ppc Tainted: G        W       4.13.0-3.rc3.dev.gitec0d270.el7.centos.ppc64le #1
[  909.591481] task: c000000f0c9cc600 task.stack: c000000f061a8000
[  909.591596] NIP: c000000000988b48 LR: c000000000988c04 CTR: 00000000300304d0
[  909.591733] REGS: c000000f061ab6f0 TRAP: 0300   Tainted: G        W        (4.13.0-3.rc3.dev.gitec0d270.el7.centos.ppc64le)
[  909.591913] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[  909.591919]   CR: 42024422  XER: 20000000
[  909.592080] CFAR: c0000000000087d8 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1 
[  909.592080] GPR00: c000000000988c04 c000000f061ab970 c000000001397a00 c000000f23120750 
[  909.592080] GPR04: 0000000000000000 c000000000098244 9000000000009033 0000000000000000 
[  909.592080] GPR08: 0000000000000001 0000000000000000 5deadbeef0000100 9000000000001003 
[  909.592080] GPR12: 0000000000004400 c00000000fda8000 0000000000000000 0000000000000000 
[  909.592080] GPR16: 0000000000000000 0000000124cb8058 0000000124cb8038 00000001250ed8b8 
[  909.592080] GPR20: 00000001250ed8b0 00000001250ed8d0 c00000000138d820 c000000000d9c238 
[  909.592080] GPR24: 0000000000000001 5deadbeef0000100 c000000f061abb80 c000000000f24840 
[  909.592080] GPR28: c0000000013cbe50 0000000000000001 c000000f231215e0 c000000f23120750 
[  909.593263] NIP [c000000000988b48] tcf_chain_flush+0x28/0x70
[  909.593377] LR [c000000000988c04] tcf_chain_destroy+0x74/0xa0
[  909.593491] Call Trace:
[  909.593540] [c000000f061ab970] [0000000000000001] 0x1 (unreliable)
[  909.593654] [c000000f061ab9a0] [c000000000988c04] tcf_chain_destroy+0x74/0xa0
[  909.593783] [c000000f061ab9d0] [c000000000988c84] tcf_block_put+0x54/0x90
[  909.593847] [c000000f061aba00] [d000000014d3178c] htb_destroy_class.isra.11+0x5c/0x80 [sch_htb]
[  909.593935] [c000000f061aba30] [d000000014d318a8] htb_destroy+0xf8/0x1b0 [sch_htb]
[  909.594013] [c000000f061abab0] [c0000000009818a4] qdisc_destroy+0xe4/0x170
[  909.594076] [c000000f061abae0] [c00000000098332c] dev_shutdown+0xbc/0x100
[  909.594140] [c000000f061abb20] [c00000000093f248] rollback_registered_many+0x2f8/0x560
[  909.594217] [c000000f061abbf0] [c00000000093f520] rollback_registered+0x70/0xb0
[  909.594292] [c000000f061abc40] [c000000000941908] unregister_netdevice_queue+0x128/0x180
[  909.594369] [c000000f061abcc0] [c00000000077a6cc] __tun_detach+0x22c/0x460
[  909.594433] [c000000f061abd20] [c00000000077a938] tun_chr_close+0x38/0x60
[  909.594496] [c000000f061abd50] [c00000000035abf8] __fput+0xd8/0x280
[  909.594563] [c000000f061abdb0] [c000000000120f20] task_work_run+0x140/0x1a0
[  909.594628] [c000000f061abe00] [c00000000001d810] do_notify_resume+0xf0/0x100
[  909.594704] [c000000f061abe30] [c00000000000bf44] ret_from_except_lite+0x70/0x74
[  909.594778] Instruction dump:
[  909.594816] 7c0803a6 4e800020 3c4c00a1 3842eee0 7c0802a6 60000000 7c0802a6 fbe1fff8 
[  909.594895] f8010010 f821ffd1 7c7f1b78 e9230008 <e9490000> 2faa0000 419e001c 39400000 
[  909.594975] ---[ end trace c2b424e83e247e4c ]---
[  909.601138] 
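
The first line of the log comes from the list debugging check named in the WARNING line (lib/list_debug.c). As a rough illustration only, here is a minimal Python sketch of that kind of consistency check on a circular doubly linked list. `Node`, `append` and `check_del` are hypothetical names made up for this example, not kernel APIs, and the model is much simpler than the kernel's.

```python
from typing import Optional


class Node:
    """Minimal stand-in for a list node (assumption: not the kernel struct)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.prev: "Node" = self   # an empty list points at itself
        self.next: "Node" = self


def append(new: Node, head: Node) -> None:
    """Link `new` just before `head`, i.e. at the tail of the list."""
    last = head.prev
    new.prev, new.next = last, head
    last.next = new
    head.prev = new


def check_del(entry: Node) -> Optional[str]:
    """Return a warning if the neighbours do not point back at `entry`."""
    if entry.prev.next is not entry:
        return (f"list_del corruption. prev->next should be {entry.name}, "
                f"but was {entry.prev.next.name}")
    if entry.next.prev is not entry:
        return (f"list_del corruption. next->prev should be {entry.name}, "
                f"but was {entry.next.prev.name}")
    return None


head, a, b = Node("head"), Node("a"), Node("b")
append(a, head)
append(b, head)
print(check_del(b))   # list is consistent here

a.next = head         # simulate corruption: a no longer points at b
print(check_del(b))   # b's predecessor now points somewhere else
```

Read against the log, "should be c000000f23120760" is the entry being removed and "but was c000000f23121760" is what its predecessor actually pointed at.
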
Mirrored with LTC bug #158177

cdeadmin commented 7 years ago

------- Comment From viparash@in.ibm.com 2017-08-31 09:26:40 EDT------- (In reply to comment #1)

I see two issues here

Issue 1

> Subsequently it crashes further in tcf_chain_flush() after hitting a segmentation fault.
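
To make the two faults easier to read, the words marked with `<...>` in the two "Instruction dump" sections can be decoded by hand. The following is a minimal Python sketch, assuming only the two PowerPC encodings needed here (D-form `twi`, primary opcode 3, and DS-form `ld`, primary opcode 58). `decode_ppc_word` is a hypothetical helper written for this note, not part of any tool, and it does not attempt to be a general disassembler.

```python
def decode_ppc_word(word: int) -> str:
    """Decode a 32-bit PowerPC instruction word if it is `twi` or `ld`.

    Any other encoding is returned as an undecoded `.long` hex word.
    """
    opcode = (word >> 26) & 0x3F      # primary opcode, top 6 bits
    field1 = (word >> 21) & 0x1F      # TO for twi, RT for ld
    ra = (word >> 16) & 0x1F          # base / compared register

    if opcode == 3:                   # twi TO,RA,SI
        si = word & 0xFFFF
        if si & 0x8000:               # sign-extend the 16-bit immediate
            si -= 0x10000
        return f"twi {field1},r{ra},{si}"

    if opcode == 58 and (word & 0x3) == 0:   # ld RT,DS(RA)
        disp = word & 0xFFFC          # DS field already scaled by 4
        if disp & 0x8000:             # sign-extend the displacement
            disp -= 0x10000
        return f"ld r{field1},{disp}(r{ra})"

    return f".long 0x{word:08x}"


# Marked word from the first dump, then the last two words of the second.
for w in (0x0FE00000, 0xE9230008, 0xE9490000):
    print(f"{w:08x}  {decode_ppc_word(w)}")
```

Under this decoding, `0fe00000` is an unconditional trap, which matches TRAP: 0700 on the WARNING. The faulting word `e9490000` is `ld r10,0(r9)`, and GPR09 is 0 in the second register dump, which matches DAR: 0000000000000000. The preceding word `e9230008` loaded r9 from offset 8 of the object in r3 (c000000f23120750), so the pointer stored there appears to have been NULL when it was dereferenced.
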

cdeadmin commented 7 years ago

------- Comment From satheera@in.ibm.com 2017-09-26 06:44:04 EDT------- I am not hitting the issue with the latest nightly devel build, 4.13.0-4.dev.git49564cb.el7.centos.ppc64le.

------- Comment From satheera@in.ibm.com 2017-09-26 06:45:15 EDT------- Closing as per previous comment