ssrg-vt / popcorn-kernel

Popcorn Linux kernel for distributed thread execution

[merge branch] occasional paging issues while loading msg layer #84

Closed bxatnarf closed 4 years ago

bxatnarf commented 4 years ago

I sometimes see the following errors while loading the popcorn msg layer. When this happens, insmod does not return on the affected host, although it does return successfully on the unaffected host.

[ 1378.105543] BUG: unable to handle page fault for address: ffff8881399ae000
[ 1378.105973] #PF: supervisor write access in kernel mode
[ 1378.106139] #PF: error_code(0x000b) - reserved bit violation
[ 1378.106348] PGD 2e01067 P4D 2e01067 PUD 2e04067 PMD 139a90063 PTE 800ffffec6651063
[ 1378.106777] Oops: 000b [#1] SMP NOPTI
[ 1378.106968] CPU: 0 PID: 639 Comm: sudo Not tainted 5.2.0-rc4-popcorn+ #1
[ 1378.107157] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181124
[ 1378.107532] RIP: 0010:__tlb_remove_page_size+0x68/0x90
[ 1378.107713] Code: e0 5b 41 5c c3 83 7f 1c 13 74 33 31 f6 bf 00 28 00 00 e8 bb f7 00 00c
[ 1378.108188] RSP: 0018:ffffc9000070bc70 EFLAGS: 00000202
[ 1378.108371] RAX: ffff8881399ae000 RBX: ffffc9000070bda0 RCX: 000001fe00000000
[ 1378.108371] RDX: ffff888000000000 RSI: 00000000ffffffff RDI: 0000000000000246
[ 1378.108371] RBP: ffffea00043c5640 R08: 0000000000000000 R09: 0000000000000001
[ 1378.108371] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 1378.108371] R13: 000055555555c000 R14: ffffc9000070bda0 R15: ffff888139a1ea10
[ 1378.108371] FS:  0000000000000000(0000) GS:ffff88813b600000(0000) knlGS:000000000000000
[ 1378.108371] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1378.108371] CR2: ffff8881399ae000 CR3: 0000000139b40000 CR4: 00000000000006f0
[ 1378.108371] Call Trace:
[ 1378.108371]  unmap_page_range+0x4c0/0x800
[ 1378.108371]  unmap_vmas+0x32/0x50
[ 1378.108371]  exit_mmap+0x8e/0x160
[ 1378.108371]  mmput+0x41/0xf0
[ 1378.108371]  do_exit+0x2bb/0xba0
[ 1378.108371]  ? sched_clock_local+0x12/0x80
[ 1378.108371]  do_group_exit+0x39/0xb0
[ 1378.108371]  __x64_sys_exit_group+0x14/0x20
[ 1378.108371]  do_syscall_64+0x69/0x440
[ 1378.108371]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 1378.108371]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1378.108371] RIP: 0033:0x7ffff72992e9
[ 1378.108371] Code: 00 41 b8 3c 00 00 00 eb 19 0f 1f 84 00 00 00 00 00 48 89 d7 44 89 c0e
[ 1378.108371] RSP: 002b:00007fffffffe458 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 1378.108371] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffff72992e9
[ 1378.108371] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1378.108371] RBP: 00007ffff7580860 R08: 000000000000003c R09: 00000000000000e7
[ 1378.108371] R10: fffffffffffffeb0 R11: 0000000000000246 R12: 00007ffff7580860
[ 1378.108371] R13: 00007ffff7585c60 R14: 00005555557845d0 R15: 00005555557845d0
[ 1378.108371] Modules linked in: msg_socket
[ 1378.108371] CR2: ffff8881399ae000
[ 1378.108371] ---[ end trace 11f90f2492eb93bb ]---
[ 1378.108371] RIP: 0010:__tlb_remove_page_size+0x68/0x90
[ 1378.108371] Code: e0 5b 41 5c c3 83 7f 1c 13 74 33 31 f6 bf 00 28 00 00 e8 bb f7 00 00c
[ 1378.108371] RSP: 0018:ffffc9000070bc70 EFLAGS: 00000202
[ 1378.108371] RAX: ffff8881399ae000 RBX: ffffc9000070bda0 RCX: 000001fe00000000
[ 1378.108371] RDX: ffff888000000000 RSI: 00000000ffffffff RDI: 0000000000000246
[ 1378.108371] RBP: ffffea00043c5640 R08: 0000000000000000 R09: 0000000000000001
[ 1378.108371] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 1378.108371] R13: 000055555555c000 R14: ffffc9000070bda0 R15: ffff888139a1ea10
[ 1378.108371] FS:  0000000000000000(0000) GS:ffff88813b600000(0000) knlGS:000000000000000
[ 1378.108371] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1378.108371] CR2: ffff8881399ae000 CR3: 0000000139b40000 CR4: 00000000000006f0
[ 1378.108371] Fixing recursive fault but reboot is needed!
[ 1378.108371] BUG: scheduling while atomic: sudo/639/0x00000002
[ 1378.108371] INFO: lockdep is turned off.
[ 1378.108371] Modules linked in: msg_socket
[ 1378.108371] irq event stamp: 13366
[ 1378.108371] hardirqs last  enabled at (13365): [<ffffffff811d4dec>] get_page_from_free0
[ 1378.108371] hardirqs last disabled at (13366): [<ffffffff81001a1c>] trace_hardirqs_offc
[ 1378.108371] softirqs last  enabled at (13344): [<ffffffff8180032e>] __do_softirq+0x32e9
[ 1378.108371] softirqs last disabled at (13331): [<ffffffff81068377>] irq_exit+0x97/0xd0
[ 1378.108371] CPU: 0 PID: 639 Comm: sudo Tainted: G      D           5.2.0-rc4-popcorn+ 1
[ 1378.108371] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181124
[ 1378.108371] Call Trace:
[ 1378.108371]  dump_stack+0x67/0x90
[ 1378.108371]  __schedule_bug.cold+0x1a/0x27
[ 1378.108371]  __schedule+0x5a2/0x830
[ 1378.108371]  ? printk+0x58/0x6f
[ 1378.108371]  schedule+0x3a/0xb0
[ 1378.108371]  do_exit.cold+0x62/0x91
[ 1378.108371]  rewind_stack_do_exit+0x17/0x20
[ 1387.253528] BUG: unable to handle page fault for address: ffff88813a2ef000
[ 1387.253849] #PF: supervisor write access in kernel mode
[ 1387.254105] #PF: error_code(0x000b) - reserved bit violation
[ 1387.254375] PGD 2e01067 P4D 2e01067 PUD 2e04067 PMD 13a3e5063 PTE 800ffffec5d10063
[ 1387.254716] Oops: 000b [#2] SMP NOPTI
[ 1387.254873] CPU: 0 PID: 1 Comm: systemd Tainted: G      D W         5.2.0-rc4-popcorn+1
[ 1387.255223] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181124
[ 1387.255642] RIP: 0010:cache_alloc_refill+0x3db/0x6b0
[ 1387.255860] Code: 8b 57 24 31 db 85 d2 74 2d 49 8b 47 50 48 85 c0 74 11 89 df 41 0f afb
[ 1387.256771] RSP: 0018:ffffc9000005bcf8 EFLAGS: 00000246
[ 1387.256943] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff888000000000
[ 1387.256943] RDX: ffff88813a2ef000 RSI: 0000000000000006 RDI: 0000000040002000
[ 1387.256943] RBP: 0000000000000400 R08: 00000000001fc1cf R09: 0000000000000000
[ 1387.256943] R10: 0000000000000001 R11: 00000000001fc1c8 R12: ffffea00044ba448
[ 1387.256943] R13: 0000000000000cc0 R14: 000000000000000c R15: ffff88813b0006c0
[ 1387.256943] FS:  00007f8aef9b8880(0000) GS:ffff88813b600000(0000) knlGS:000000000000000
[ 1387.256943] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1387.256943] CR2: ffff88813a2ef000 CR3: 000000013a2ae000 CR4: 00000000000006f0
[ 1387.256943] Call Trace:
[ 1387.256943]  kmem_cache_alloc_trace+0x1f5/0x240
[ 1387.256943]  proc_cgroup_show+0x30/0x2a0
[ 1387.256943]  proc_single_show+0x51/0x90
[ 1387.256943]  seq_read+0xd5/0x400
[ 1387.256943]  vfs_read+0xb2/0x170
[ 1387.256943]  ksys_read+0x68/0xe0
[ 1387.256943]  do_syscall_64+0x69/0x440
[ 1387.256943]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 1387.256943]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1387.256943] RIP: 0033:0x7f8aef2baba0
[ 1387.256943] Code: 0b 31 c0 48 83 c4 08 e9 be fe ff ff 48 8d 3d 3f f0 08 00 e8 e2 ce 014
[ 1387.256943] RSP: 002b:00007ffe0cff6be8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 1387.256943] RAX: ffffffffffffffda RBX: 00005591dfc7c270 RCX: 00007f8aef2baba0
[ 1387.256943] RDX: 0000000000000400 RSI: 00007f8aef9c3000 RDI: 000000000000000d
[ 1387.256943] RBP: 000000000000000a R08: 00000000ffffffff R09: 0000000000000000
[ 1387.256943] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000000
[ 1387.256943] R13: 0000000000000000 R14: 00005591dfc7c270 R15: 00000000000007ff
[ 1387.256943] Modules linked in: msg_socket
[ 1387.256943] CR2: ffff88813a2ef000
[ 1387.256943] ---[ end trace 11f90f2492eb93bc ]---
[ 1387.256943] RIP: 0010:__tlb_remove_page_size+0x68/0x90
[ 1387.256943] Code: e0 5b 41 5c c3 83 7f 1c 13 74 33 31 f6 bf 00 28 00 00 e8 bb f7 00 00c
[ 1387.256943] RSP: 0018:ffffc9000070bc70 EFLAGS: 00000202
[ 1387.256943] RAX: ffff8881399ae000 RBX: ffffc9000070bda0 RCX: 000001fe00000000
[ 1387.256943] RDX: ffff888000000000 RSI: 00000000ffffffff RDI: 0000000000000246
[ 1387.256943] RBP: ffffea00043c5640 R08: 0000000000000000 R09: 0000000000000001
[ 1387.256943] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 1387.256943] R13: 000055555555c000 R14: ffffc9000070bda0 R15: ffff888139a1ea10
[ 1387.256943] FS:  00007f8aef9b8880(0000) GS:ffff88813b600000(0000) knlGS:000000000000000
[ 1387.256943] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1387.256943] CR2: ffff88813a2ef000 CR3: 000000013a2ae000 CR4: 00000000000006f0
[ 1387.256943] BUG: sleeping function called from invalid context at include/linux/percpu4
[ 1387.256943] in_atomic(): 0, irqs_disabled(): 1, pid: 1, name: systemd
[ 1387.256943] INFO: lockdep is turned off.
[ 1387.256943] irq event stamp: 488400
[ 1387.256943] hardirqs last  enabled at (488399): [<ffffffff8154d189>] _raw_spin_unlock_0
[ 1387.256943] hardirqs last disabled at (488400): [<ffffffff81546757>] __schedule+0xb7/00
[ 1387.256943] softirqs last  enabled at (488246): [<ffffffff814f37db>] unix_sock_destruc0
[ 1387.256943] softirqs last disabled at (488244): [<ffffffff814f37db>] unix_sock_destruc0
[ 1387.256943] CPU: 0 PID: 1 Comm: systemd Tainted: G      D W         5.2.0-rc4-popcorn+1
[ 1387.256943] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181124
[ 1387.256943] Call Trace:
[ 1387.256943]  dump_stack+0x67/0x90
[ 1387.256943]  ___might_sleep.cold+0x9f/0xaf
[ 1387.256943]  exit_signals+0x1c/0x200
[ 1387.256943]  do_exit+0xb0/0xba0
[ 1387.256943]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 1387.256943]  rewind_stack_do_exit+0x17/0x20
[ 1387.257135] BUG: unable to handle page fault for address: ffff88813a2ee000
[ 1387.257435] #PF: supervisor write access in kernel mode
[ 1387.257589] #PF: error_code(0x000b) - reserved bit violation
[ 1387.257853] PGD 2e01067 P4D 2e01067 PUD 2e04067 PMD 13a3e5063 PTE 800ffffec5d11063
[ 1387.258199] Oops: 000b [#3] SMP NOPTI
[ 1387.258368] CPU: 0 PID: 1 Comm: systemd Tainted: G      D W         5.2.0-rc4-popcorn+1
[ 1387.258714] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181124
[ 1387.259054] RIP: 0010:__tlb_remove_page_size+0x68/0x90
[ 1387.259197] Code: e0 5b 41 5c c3 83 7f 1c 13 74 33 31 f6 bf 00 28 00 00 e8 bb f7 00 00c
[ 1387.259968] RSP: 0018:ffffc9000005bd10 EFLAGS: 00000202
[ 1387.260129] RAX: ffff88813a2ee000 RBX: ffffc9000005be40 RCX: 000001fe00000000
[ 1387.260533] RDX: ffff888000000000 RSI: 0000000000000000 RDI: ffffffff811d4dec
[ 1387.260843] RBP: ffffea00045e6568 R08: 00000000001fc1c8 R09: 0000000000000000
[ 1387.260943] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 1387.260943] R13: 00005591de19f000 R14: ffffc9000005be40 R15: ffff88813a18cf18
[ 1387.260943] FS:  0000000000000000(0000) GS:ffff88813b600000(0000) knlGS:000000000000000
[ 1387.260943] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1387.260943] CR2: ffff88813a2ee000 CR3: 000000013a2ae000 CR4: 00000000000006f0
[ 1387.260943] Call Trace:
[ 1387.260943]  unmap_page_range+0x4c0/0x800
[ 1387.260943]  unmap_vmas+0x32/0x50
[ 1387.260943]  exit_mmap+0x8e/0x160
[ 1387.260943]  mmput+0x41/0xf0
[ 1387.260943]  do_exit+0x2bb/0xba0
[ 1387.260943]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 1387.260943]  rewind_stack_do_exit+0x17/0x20
[ 1387.260943] Modules linked in: msg_socket
[ 1387.260943] CR2: ffff88813a2ee000
[ 1387.260943] ---[ end trace 11f90f2492eb93bd ]---
[ 1387.260943] RIP: 0010:__tlb_remove_page_size+0x68/0x90
[ 1387.260943] Code: e0 5b 41 5c c3 83 7f 1c 13 74 33 31 f6 bf 00 28 00 00 e8 bb f7 00 00c
[ 1387.260943] RSP: 0018:ffffc9000070bc70 EFLAGS: 00000202
[ 1387.260943] RAX: ffff8881399ae000 RBX: ffffc9000070bda0 RCX: 000001fe00000000
[ 1387.260943] RDX: ffff888000000000 RSI: 00000000ffffffff RDI: 0000000000000246
[ 1387.260943] RBP: ffffea00043c5640 R08: 0000000000000000 R09: 0000000000000001
[ 1387.260943] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 1387.260943] R13: 000055555555c000 R14: ffffc9000070bda0 R15: ffff888139a1ea10
[ 1387.260943] FS:  0000000000000000(0000) GS:ffff88813b600000(0000) knlGS:000000000000000
[ 1387.260943] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1387.260943] CR2: ffff88813a2ee000 CR3: 000000013a2ae000 CR4: 00000000000006f0
[ 1387.260943] Fixing recursive fault but reboot is needed!
[ 1387.260943] BUG: scheduling while atomic: systemd/1/0x00000002
[ 1387.260943] INFO: lockdep is turned off.
[ 1387.260943] Modules linked in: msg_socket
[ 1387.260943] irq event stamp: 488400
[ 1387.260943] hardirqs last  enabled at (488399): [<ffffffff8154d189>] _raw_spin_unlock_0
[ 1387.260943] hardirqs last disabled at (488400): [<ffffffff81546757>] __schedule+0xb7/00
[ 1387.260943] softirqs last  enabled at (488246): [<ffffffff814f37db>] unix_sock_destruc0
[ 1387.260943] softirqs last disabled at (488244): [<ffffffff814f37db>] unix_sock_destruc0
[ 1387.260943] CPU: 0 PID: 1 Comm: systemd Tainted: G      D W         5.2.0-rc4-popcorn+1
[ 1387.260943] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181124
[ 1387.260943] Call Trace:
[ 1387.260943]  dump_stack+0x67/0x90
[ 1387.260943]  __schedule_bug.cold+0x1a/0x27
[ 1387.260943]  __schedule+0x5a2/0x830
[ 1387.260943]  ? printk+0x58/0x6f
[ 1387.260943]  schedule+0x3a/0xb0
[ 1387.260943]  do_exit.cold+0x62/0x91
[ 1387.260943]  rewind_stack_do_exit+0x17/0x20
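
For anyone reading the trace above: all three oopses report `error_code(0x000b)`. The sketch below decodes that value using the standard x86 page-fault error-code bit layout. The table and the helper name `decode_pf_error_code` are made up for illustration and are not part of the kernel or of Popcorn; the flag names loosely mirror the kernel's `X86_PF_*` constants.

```python
# x86 page-fault error-code bits (architectural layout).
# Flag names are illustrative and loosely follow the kernel's X86_PF_* naming.
X86_PF_FLAGS = [
    (0x01, "PROT"),   # set: fault on a present page; clear: page not present
    (0x02, "WRITE"),  # set: write access; clear: read access
    (0x04, "USER"),   # set: user-mode access; clear: supervisor access
    (0x08, "RSVD"),   # set: a reserved bit was set in a paging-structure entry
    (0x10, "INSTR"),  # set: fault on an instruction fetch
    (0x20, "PK"),     # set: protection-key violation
]


def decode_pf_error_code(code):
    """Return the names of the flags set in an x86 page-fault error code."""
    return [name for mask, name in X86_PF_FLAGS if code & mask]


if __name__ == "__main__":
    # 0x000b is the value reported by every oops in this log.
    print(decode_pf_error_code(0x000B))  # ['PROT', 'WRITE', 'RSVD']
```

So 0x000b is a supervisor write (USER clear, WRITE set) to a present page whose page-table entry has a reserved bit set, which matches the `#PF: supervisor write access in kernel mode` and `reserved bit violation` lines.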

AHatnarf commented 4 years ago

Closing for now in relation to the referenced message in #91.