liyi-ibm / linux

Linux kernel source tree
Other
0 stars 1 forks source link

kernel oops in 4.14.49-5 kernel #18

Open liyi-ibm opened 5 years ago

liyi-ibm commented 5 years ago

This kernel oops also happens in 4.14.49-5 kernel:

/sys/fs/pstore/dmesg-nvram-1
Oops#1 Part1
<6>[6202252.103571] drop_cache.sh (57792): drop_caches: 3
<6>[6203927.983602] usb 1-4.4.2: reset high-speed USB device number 6 using xhci_hcd
<6>[6204171.080691] drop_cache.sh (57792): drop_caches: 3
<6>[6205981.366687] drop_cache.sh (57792): drop_caches: 3
<6>[6208010.626700] drop_cache.sh (57792): drop_caches: 3
<6>[6209410.954719] drop_cache.sh (57792): drop_caches: 3
<4>[6210226.471153] mpt3sas_cm0: log_info(0x3003011d): originator(IOP), code(0x03), sub_code(0x011d)
<4>[6210226.804932] mpt3sas_cm1: log_info(0x3003011d): originator(IOP), code(0x03), sub_code(0x011d)
<4>[6210228.105479] mpt3sas_cm0: log_info(0x3003011d): originator(IOP), code(0x03), sub_code(0x011d)
<4>[6210228.585137] mpt3sas_cm1: log_info(0x3003011d): originator(IOP), code(0x03), sub_code(0x011d)
<6>[6210297.345129] usb 1-4.4.2: reset high-speed USB device number 6 using xhci_hcd
<6>[6210510.484662] drop_cache.sh (57792): drop_caches: 3
<6>[6211393.270825] usb 1-4.4.3: reset high-speed USB device number 7 using xhci_hcd
<6>[6211458.377645] drop_cache.sh (57792): drop_caches: 3
<1>[6212190.979932] BUG: Bad page state in process java  pfn:1fd265
<0>[6212190.980034] page:c00a000007f49940 count:0 mapcount:0 mapping:          (null) index:0x0
<0>[6212190.980119] flags: 0x3ffff000000100(slab)
<1>[6212190.980150] raw: 003ffff000000100 0000000000000000 0000000000000000 0000000080400040
<1>[6212190.980204] raw: 5deadbeef0000100 5deadbeef0000200 c00000024d450000 c000201c9867d000
<1>[6212190.980267] page dumped because: page still charged to cgroup
<1>[6212190.980349] page->mem_cgroup:c000201c9867d000
<1>[6212190.980384] bad because of flags: 0x100(slab)
<4>[6212190.980419] Modules linked in: dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag i2c_dev joydev at24 ixgbe ptp pps_core mdio ofpart powernv_flash mtd ipmi_powernv ipmi_devintf ipmi_msghandler opal_prd i2c_opal nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc binfmt_misc ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mpt3sas ttm drm raid_class scsi_transport_sas i2c_core usb_storage
<4>[6212190.980463] CPU: 164 PID: 55794 Comm: java Tainted: G        W       4.14.49-3.ppc64le #1
<4>[6212190.980464] Call Trace:
<4>[6212190.980471] [c00020072588f160] [c000000000acb99c] dump_stack+0xb0/0xf4 (unreliable)
<4>[6212190.980475] [c00020072588f1a0] [c00000000027e004] bad_page+0x124/0x1a0
<4>[6212190.980480] [c00020072588f230] [c00000000006ff28] destroy_context+0xe8/0x140
<4>[6212190.980483] [c00020072588f260] [c0000000000f74e4] __mmdrop+0x64/0x1f0
<4>[6212190.980488] [c00020072588f2e0] [c0000000001348d8] finish_task_switch+0x2d8/0x330
<4>[6212190.980491] [c00020072588f390] [c000000000ae52f0] __schedule+0x2b0/0x9f0
<4>[6212190.980493] [c00020072588f460] [c000000000ae5a78] schedule+0x48/0xc0
<4>[6212190.980496] [c00020072588f490] [c00000000013beac] io_schedule+0x2c/0x60
<4>[6212190.980499] [c00020072588f4c0] [c0000000002772c8] wait_on_page_bit_killable+0x198/0x220
<4>[6212190.980501] [c00020072588f580] [c000000000277454] __lock_page_or_retry+0x104/0x170
<4>[6212190.980503] [c00020072588f5c0] [c000000000277b24] filemap_fault+0x664/0x700
<4>[6212190.980508] [c00020072588f640] [c000000000450bbc] ext4_filemap_fault+0x4c/0x80
<4>[6212190.980511] [c00020072588f670] [c0000000002c8b08] __do_fault+0x48/0x1b0
<4>[6212190.980514] [c00020072588f6b0] [c0000000002d2dc4] __handle_mm_fault+0x1584/0x1ee0
<4>[6212190.980517] [c00020072588f790] [c0000000002d3848] handle_mm_fault+0x128/0x200
<4>[6212190.980519] [c00020072588f7d0] [c00000000006498c] __do_page_fault+0x1cc/0x8c0
<4>[6212190.980523] [c00020072588f8a0] [c00000000000aca4] handle_page_fault+0x18/0x38
<4>[6212190.980528] --- interrupt: 301 at iov_iter_fault_in_readable+0x90/0x1a0
<4>[6212190.980528]     LR = generic_perform_write+0xa4/0x260
<4>[6212190.980531] [c00020072588fb90] [c00020072588fbe0] 0xc00020072588fbe0 (unreliable)
<4>[6212190.980534] [c00020072588fc20] [c0000000002793c0] __generic_file_write_iter+0x200/0x240
<4>[6212190.980536] [c00020072588fc80] [c000000000435edc] ext4_file_write_iter+0x17c/0x490
<4>[6212190.980540] [c00020072588fd00] [c00000000035ed80] __vfs_write+0x130/0x1f0
<4>[6212190.980543] [c00020072588fd90] [c00000000035f070] vfs_write+0xd0/0x240
<4>[6212190.980545] [c00020072588fde0] [c00000000035f3b8] SyS_write+0x68/0x110
<4>[6212190.980548] [c00020072588fe30] [c00000000000b9e0] system_call+0x58/0x6c
<4>[6212190.980550] Disabling lock debugging due to kernel taint
<4>[6212200.651923] ------------[ cut here ]------------
<2>[6212200.651926] kernel BUG at mm/slub.c:3945!
<4>[6212200.651929] Oops: Exception in kernel mode, sig: 5 [#1]
<4>[6212200.651991] LE SMP NR_CPUS=1024 NUMA PowerNV
<4>[6212200.652041] Modules linked in: dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag i2c_dev joydev at24 ixgbe ptp pps_core mdio ofpart powernv_flash mtd ipmi_powernv ipmi_devintf ipmi_msghandler opal_prd i2c_opal nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc binfmt_misc ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mpt3sas ttm drm raid_class scsi_transport_sas i2c_core usb_storage
<4>[6212200.652278] CPU: 28 PID: 115189 Comm: kworker/28:2 Tainted: G    B   W       4.14.49-3.ppc64le #1
<4>[6212200.652381] Workqueue: memcg_kmem_cache kmemcg_deactivate_workfn
<4>[6212200.652416] task: c000001fed6c0000 task.stack: c0000005a4bcc000
<4>[6212200.652449] NIP:  c000000000321700 LR: c00000000032168c CTR: 0000000000000000
<4>[6212200.652488] REGS: c0000005a4bcf6e0 TRAP: 0700   Tainted: G    B   W        (4.14.49-3.ppc64le)
<4>[6212200.652534] MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002084  XER: 00000000
<4>[6212200.652578] CFAR: c0000000003216ac SOFTE: 0 
<4>[6212200.652578] GPR00: c00000000032168c c0000005a4bcf960 c0000000013d3000 0000000000000001 
<4>[6212200.652578] GPR04: c009e000a9ecc9f8 c000001f907b2f50 000000007fff0000 0000000000000000 
<4>[6212200.652578] GPR08: ffffffffffff8040 ffffffffffff8040 0000000000000001 0000000000000010 
<4>[6212200.652578] GPR12: 0000000084002088 c000000007d53400 c0000000001289c8 c0000000658e2100 
<4>[6212200.652578] GPR16: 0000000000000000 0000000000000000 c000001f907b2f50 c000001f907b2f30 
<4>[6212200.652578] GPR20: 0000000000000001 0000000000000000 c00000000140cf0c c00000024d450000 
<4>[6212200.652578] GPR24: 0000000000000000 c00000024d450000 c0000005a4bcf980 c0000005a4bcfb80 
<4>[6212200.652578] GPR28: c0000005a4bcf970 c000001f907b2f40 c00a000007f49940 c00a000007f49960 
<4>[6212200.652965] NIP [c000000000321700] __kmem_cache_shrink+0x180/0x390
<4>[6212200.652999] LR [c00000000032168c] __kmem_cache_shrink+0x10c/0x390
<4>[6212200.653031] Call Trace:
<4>[6212200.653047] [c0000005a4bcf960] [c00000000032168c] __kmem_cache_shrink+0x10c/0x390 (unreliable)
<4>[6212200.653094] [c0000005a4bcfc20] [c000000000321938] kmemcg_cache_deact_after_rcu+0x28/0xa0
<4>[6212200.653165] [c0000005a4bcfc50] [c0000000002b7d24] kmemcg_deactivate_workfn+0x64/0x120
<4>[6212200.653233] [c0000005a4bcfc80] [c0000000001205b0] process_one_work+0x1a0/0x490
<4>[6212200.653273] [c0000005a4bcfd20] [c000000000120938] worker_thread+0x98/0x560
<4>[6212200.653314] [c0000005a4bcfdc0] [c000000000128b28] kthread+0x168/0x1b0
<4>[6212200.653350] [c0000005a4bcfe30] [c00000000000bdd0] ret_from_kernel_thread+0x5c/0x8c
<4>[6212200.653390] Instruction dump:
<4>[6212200.653410] 419e00c0 7e7e9b78 7d334b78 e93e0018 a15e0018 79298462 7d2a4850 7d2807b4 
<4>[6212200.653452] 3949ffff 7d4a4b78 554a0ffe 7d4a07b4 <0b0a0000> e95e0018 794a8462 7f8a4800 
<4>[6212200.653506] ---[ end trace 1de5da1329e358c9 ]---
liyi-ibm commented 5 years ago

According to Nick, this patch can fix this oops: https://github.com/liyi-ibm/linux/commit/856c18cbbb4e6d7f3db4afe9d5ecd88064f8ae61

The page table fragment refcount bug is an underflow of the count. What that means is that the page table page will be freed before the last user of it goes away.
What your crash message here shows is that when the last user frees the page table page, it is in a bad state and it is actually a "slab" page. So what has happened is that it got allocated for something else in the meantime.
Then we get another crash in the slab allocator, and that is explained by the page table freeing code overwriting the page that slab subsystem was using.
Basically one page becomes used by two different subsystems due to the refcount bug (page table and slab), and we can see warnings happen in each of those subsystems caused by activity from the other.