ssrg-vt / popcorn-kernel

Popcorn Linux kernel for distributed thread execution

[Merge] Page fault when shutting down after updating systemd #91

Closed AHatnarf closed 4 years ago

AHatnarf commented 4 years ago

After updating systemd (apt upgrade in the provided Ubuntu image), the kernel hit a page fault while shutting down. A similar page fault happened when the package was originally installed, somewhere along the line while systemd was restarting. I'll see if I can get a trace for that in the future.

This is with the latest kernel from the merge branch.


         Stopping Getty on tty1...
         Stopping Serial Getty on ttyS0...
         Stopping Login Service...
         Stopping LSB: Load kernel image with kexec...
[  189.327800][    T1] BUG: unable to handle page fault for address: ffff88805a0e9000
[  189.327800][    T1] #PF: supervisor write access in kernel mode
[  189.327800][    T1] #PF: error_code(0x000b) - reserved bit violation
[  189.327800][    T1] PGD 2e01067 P4D 2e01067 PUD 2e04067 PMD 5a13b063 PTE 800fffffa5f16063
[  189.331800][    T1] Oops: 000b [#1] SMP NOPTI
[  189.331800][    T1] CPU: 1 PID: 1 Comm: systemd Not tainted 5.2.0-rc4-popcorn+ #10
[  189.331800][    T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  189.331800][    T1] RIP: 0010:clear_page_orig+0x12/0x40
[  189.331800][    T1] Code: 00 b8 01 00 00 00 5b c3 b9 00 02 00 00 31 c0 f3 48 ab c3 0f 1f 44 00 00 31 c0 b9 40 00 00 00 66 0f 1f 84 00 00 00 00 00 ff c9 <48> 89 07 48 89 47 08 48 89 47 10 48 89 47 18 48 89 47 20 48 89 47
[  189.331800][    T1] RSP: 0018:ffffc9000005b990 EFLAGS: 00000216
[  189.331800][    T1] RAX: 0000000000000000 RBX: dead000000000100 RCX: 000000000000003f
[  189.331800][    T1] RDX: ffff88805b5a6040 RSI: 00000000013b32f8 RDI: ffff88805a0e9000
[  189.331800][    T1] RBP: ffffc9000005bb08 R08: 0000000000000000 R09: 00000000013b3330
[  189.331800][    T1] R10: ffff888000000000 R11: 6db6db6db6db6db7 R12: 0000000000000010
[  189.331800][    T1] R13: ffffffff81cd9f40 R14: ffffea00013b32f8 R15: ffffea00013b32f8
[  189.331800][    T1] FS:  00007fe762d34880(0000) GS:ffff88805be00000(0000) knlGS:0000000000000000
[  189.331800][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  189.331800][    T1] CR2: ffff88805a0e9000 CR3: 000000005a07a000 CR4: 00000000000006e0
[  189.331800][    T1] Call Trace:
[  189.331800][    T1]  get_page_from_freelist+0x7dc/0x13d0
[  189.331800][    T1]  ? sched_clock_local+0x12/0x80
[  189.331800][    T1]  __alloc_pages_nodemask+0x178/0xfa0
[  189.343801][    T1]  ? sched_clock_local+0x12/0x80
[  189.343801][    T1]  ? sched_clock_local+0x12/0x80
[  189.343801][    T1]  ? sched_clock_local+0x12/0x80
[  189.343801][    T1]  ? sched_clock_local+0x12/0x80
[  189.343801][    T1]  ? sched_clock_local+0x12/0x80
[  189.343801][    T1]  __get_free_pages+0x11/0x50
[  189.343801][    T1]  __pud_alloc+0x2a/0xc0
[  189.343801][    T1]  copy_page_range+0x80a/0x850
[  189.343801][    T1]  ? sched_clock_local+0x12/0x80
[  189.343801][    T1]  ? dup_mm.isra.7+0x1c7/0x4d0
[  189.343801][    T1]  ? vma_gap_update+0x27/0x40
[  189.343801][    T1]  dup_mm.isra.7+0x36c/0x4d0
[  189.343801][    T1]  copy_process.part.9+0x1bc0/0x1bf0
[  189.343801][    T1]  _do_fork+0xe4/0x6f0
[  189.343801][    T1]  ? mntput_no_expire+0x74/0x3e0
[  189.343801][    T1]  ? rcu_read_lock_sched_held+0x74/0x80
[  189.343801][    T1]  do_syscall_64+0x69/0x440
[  189.343801][    T1]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  189.343801][    T1]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  189.343801][    T1] RIP: 0033:0x7fe762615014
[  189.343801][    T1] Code: f7 d8 64 89 04 25 d4 02 00 00 64 4c 8b 0c 25 10 00 00 00 31 d2 4d 8d 91 d0 02 00 00 31 f6 bf 11 00 20 01 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 3e 01 00 00 85 c0 41 89 c5 0f 85 45 01 00
[  189.343801][    T1] RSP: 002b:00007fff414b4170 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[  189.343801][    T1] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe762615014
[  189.343801][    T1] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[  189.343801][    T1] RBP: 00007fff414b41f0 R08: 0000000000000001 R09: 00007fe762d34880
[  189.343801][    T1] R10: 00007fe762d34b50 R11: 0000000000000246 R12: 00007fff414b4170
[  189.343801][    T1] R13: 00007fff414b4190 R14: 000055cba5e3c770 R15: 000055cba5e38228
[  189.343801][    T1] Modules linked in:
[  189.343801][    T1] CR2: ffff88805a0e9000
[  189.343801][    T1] ---[ end trace 933925d2963b5d10 ]---
[  189.343801][    T1] RIP: 0010:clear_page_orig+0x12/0x40
[  189.343801][    T1] Code: 00 b8 01 00 00 00 5b c3 b9 00 02 00 00 31 c0 f3 48 ab c3 0f 1f 44 00 00 31 c0 b9 40 00 00 00 66 0f 1f 84 00 00 00 00 00 ff c9 <48> 89 07 48 89 47 08 48 89 47 10 48 89 47 18 48 89 47 20 48 89 47
[  189.343801][    T1] RSP: 0018:ffffc9000005b990 EFLAGS: 00000216
[  189.343801][    T1] RAX: 0000000000000000 RBX: dead000000000100 RCX: 000000000000003f
[  189.343801][    T1] RDX: ffff88805b5a6040 RSI: 00000000013b32f8 RDI: ffff88805a0e9000
[  189.343801][    T1] RBP: ffffc9000005bb08 R08: 0000000000000000 R09: 00000000013b3330
[  189.343801][    T1] R10: ffff888000000000 R11: 6db6db6db6db6db7 R12: 0000000000000010
[  189.343801][    T1] R13: ffffffff81cd9f40 R14: ffffea00013b32f8 R15: ffffea00013b32f8
[  189.343801][    T1] FS:  00007fe762d34880(0000) GS:ffff88805be00000(0000) knlGS:0000000000000000
[  189.343801][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  189.343801][    T1] CR2: ffff88805a0e9000 CR3: 000000005a07a000 CR4: 00000000000006e0
[  189.343801][    T1] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
[  189.343801][    T1] in_atomic(): 1, irqs_disabled(): 1, pid: 1, name: systemd
[  189.343801][    T1] INFO: lockdep is turned off.
[  189.343801][    T1] irq event stamp: 644596
[  189.343801][    T1] hardirqs last  enabled at (644595): [<ffffffff811d7044>] get_page_from_freelist+0xf4/0x13d0
[  189.343801][    T1] hardirqs last disabled at (644596): [<ffffffff8100196a>] trace_hardirqs_off_thunk+0x1a/0x1c
[  189.343801][    T1] softirqs last  enabled at (644578): [<ffffffff818002ec>] __do_softirq+0x2ec/0x475
[  189.343801][    T1] softirqs last disabled at (644563): [<ffffffff8106856e>] irq_exit+0xbe/0xd0
[  189.343801][    T1] CPU: 1 PID: 1 Comm: systemd Tainted: G      D           5.2.0-rc4-popcorn+ #10
[  189.343801][    T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  189.343801][    T1] Call Trace:
[  189.343801][    T1]  dump_stack+0x67/0x9b
[  189.343801][    T1]  ___might_sleep+0x149/0x230
[  189.343801][    T1]  exit_signals+0x30/0x240
[  189.343801][    T1]  do_exit+0xb0/0xc30
[  189.343801][    T1]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  189.343801][    T1]  rewind_stack_do_exit+0x17/0x20
[  189.375803][    T1] note: systemd[1] exited with preempt_count 1
[  189.383804][  T614] BUG: unable to handle page fault for address: ffff88805a143000
[  189.383804][  T614] #PF: supervisor write access in kernel mode
[  189.383804][  T614] #PF: error_code(0x000b) - reserved bit violation
[  189.387804][  T614] PGD 2e01067 P4D 2e01067 PUD 2e04067 PMD 5a13b063 PTE 800fffffa5ebc063
[  189.387804][  T614] Oops: 000b [#2] SMP NOPTI
[  189.387804][  T614] CPU: 1 PID: 614 Comm: bash Tainted: G      D W         5.2.0-rc4-popcorn+ #10
[  189.387804][  T614] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  189.387804][  T614] RIP: 0010:clear_page_orig+0x12/0x40
[  189.387804][  T614] Code: 00 b8 01 00 00 00 5b c3 b9 00 02 00 00 31 c0 f3 48 ab c3 0f 1f 44 00 00 31 c0 b9 40 00 00 00 66 0f 1f 84 00 00 00 00 00 ff c9 <48> 89 07 48 89 47 08 48 89 47 10 48 89 47 18 48 89 47 20 48 89 47
[  189.387804][  T614] RSP: 0018:ffffc9000060bae8 EFLAGS: 00000216
[  189.387804][  T614] RAX: 0000000000000000 RBX: dead000000000100 RCX: 000000000000003f
[  189.387804][  T614] RDX: ffff88805a1f86c0 RSI: 00000000013b46a8 RDI: ffff88805a143000
[  189.387804][  T614] RBP: ffffc9000060bc60 R08: 0000000000000000 R09: 00000000013b46e0
[  189.387804][  T614] R10: ffff888000000000 R11: 6db6db6db6db6db7 R12: 0000000000000010
[  189.387804][  T614] R13: ffffffff81cd9f40 R14: ffffea00013b46a8 R15: ffffea00013b46a8
[  189.387804][  T614] FS:  00007ffff7fed700(0000) GS:ffff88805be00000(0000) knlGS:0000000000000000
[  189.387804][  T614] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  189.387804][  T614] CR2: ffff88805a143000 CR3: 000000005800a000 CR4: 00000000000006e0
[  189.387804][  T614] Call Trace:
[  189.387804][  T614]  get_page_from_freelist+0x7dc/0x13d0
[  189.387804][  T614]  ? lock_acquire+0xa6/0x1a0
[  189.387804][  T614]  ? fs_reclaim_acquire.part.26+0x5/0x30
[  189.387804][  T614]  ? lock_acquire+0xa6/0x1a0
[  189.387804][  T614]  __alloc_pages_nodemask+0x178/0xfa0
[  189.387804][  T614]  ? lock_acquire+0xa6/0x1a0
[  189.387804][  T614]  ? fs_reclaim_acquire.part.26+0x5/0x30
[  189.387804][  T614]  ? __kmalloc+0x1c4/0x280
[  189.387804][  T614]  __vmalloc_node_range+0x141/0x270
[  189.387804][  T614]  copy_process.part.9+0x92a/0x1bf0
[  189.387804][  T614]  ? _do_fork+0xe4/0x6f0
[  189.387804][  T614]  _do_fork+0xe4/0x6f0
[  189.387804][  T614]  ? __fd_install+0xc1/0x280
[  189.387804][  T614]  ? do_pipe2+0x7c/0xb0
[  189.387804][  T614]  do_syscall_64+0x69/0x440
[  189.387804][  T614]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  189.387804][  T614]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  189.387804][  T614] RIP: 0033:0x7ffff7498014
[  189.387804][  T614] Code: f7 d8 64 89 04 25 d4 02 00 00 64 4c 8b 0c 25 10 00 00 00 31 d2 4d 8d 91 d0 02 00 00 31 f6 bf 11 00 20 01 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 3e 01 00 00 85 c0 41 89 c5 0f 85 45 01 00
[  189.387804][  T614] RSP: 002b:00007fffffffd3c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[  189.387804][  T614] RAX: ffffffffffffffda RBX: 0000000000000266 RCX: 00007ffff7498014
[  189.387804][  T614] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[  189.387804][  T614] RBP: 00007fffffffd3e0 R08: 0000000000000000 R09: 00007ffff7fed700
[  189.387804][  T614] R10: 00007ffff7fed9d0 R11: 0000000000000246 R12: 0000000000000000
[  189.387804][  T614] R13: 000000000088bc08 R14: 0000000000000000 R15: 0000000000897a88
[  189.387804][  T614] Modules linked in:
[  189.387804][  T614] CR2: ffff88805a143000
[  189.387804][  T614] ---[ end trace 933925d2963b5d11 ]---
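
For reference, the `error_code(0x000b)` in these oopses can be decoded bit by bit. The sketch below uses the standard x86 page-fault error code layout; the flag names and the `decode_pf_error` helper are made up for illustration and are not kernel identifiers.

```python
# x86 page-fault error code bits. The short names are illustrative labels,
# not the kernel's own identifiers.
PF_FLAGS = {
    0: "PROT",   # 1 = fault on a present page, 0 = page not present
    1: "WRITE",  # 1 = write access, 0 = read access
    2: "USER",   # 1 = user-mode access, 0 = supervisor access
    3: "RSVD",   # 1 = reserved bit set in a paging-structure entry
    4: "INSTR",  # 1 = instruction fetch
    5: "PK",     # 1 = protection-key violation
}


def decode_pf_error(code):
    """Return the names of the flags that are set in a page-fault error code."""
    return [name for bit, name in PF_FLAGS.items() if code & (1 << bit)]


print(decode_pf_error(0x000b))  # ['PROT', 'WRITE', 'RSVD']
```

The USER bit is clear, which matches "supervisor write access in kernel mode", and RSVD matches the "reserved bit violation" line.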
AHatnarf commented 4 years ago

The same bug appeared when SSHing into a Popcorn node: the connection was reset a few times as sshd was killed. This was not during shutdown, just while the system was idle. After three tries I was able to log in successfully and run mt.

[  T654] #PF: supervisor write access in kernel mode
[  T654] #PF: error_code(0x000b) - reserved bit violation
[  T654] PGD 2e01067 P4D 2e01067 PUD 2e04067 PMD 59420063 PTE 800fffffa6b12063
[  T654] Oops: 000b [#1] SMP NOPTI
[  T654] CPU: 0 PID: 654 Comm: sshd Tainted: G           O      5.2.0-rc4-popcorn+ #32
[  T654] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  T654] RIP: 0010:clear_page_orig+0x12/0x40
[  T654] Code: 00 b8 01 00 00 00 5b c3 b9 00 02 00 00 31 c0 f3 48 ab c3 0f 1f 44 00 00 31 c0 b9 40 00 00 00 66 0f 1f 84 00 00 00 00 00 ff c9 <48> 89 07 48 89 47 08 48 89 47 10 48 89 47 18 48 89 47 20 48 89 47
[  T654] RSP: 0018:ffffc900006cb988 EFLAGS: 00000216
[  T654] RAX: 0000000000000000 RBX: dead000000000100 RCX: 000000000000003f
[  T654] RDX: ffff8880596fc2c0 RSI: 00000000013893d8 RDI: ffff8880594ed000
[  T654] RBP: ffffc900006cbb00 R08: 0000000000000000 R09: 0000000001389410
[  T654] R10: ffff888000000000 R11: 6db6db6db6db6db7 R12: 0000000000000010
[  T654] R13: ffffffff81cd9f40 R14: ffffea00013893d8 R15: ffffea00013893d8
[  T654] FS:  00007ffff7fe5800(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000
[  T654] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  T654] CR2: ffff8880594ed000 CR3: 00000000596b6000 CR4: 00000000000006f0
[  T654] Call Trace:
[  T654]  get_page_from_freelist+0x7dc/0x13d0
[  T654]  ? sched_clock_local+0x12/0x80
[  T654]  ? sched_clock_local+0x12/0x80
[  T654]  ? sched_clock_local+0x12/0x80
[  T654]  __alloc_pages_nodemask+0x178/0xfa0
[  T654]  ? sched_clock_local+0x12/0x80
[  T654]  ? sched_clock_local+0x12/0x80
[  T654]  ? sched_clock_local+0x12/0x80
[  T654]  ? sched_clock_local+0x12/0x80
[  T654]  ? sched_clock_local+0x12/0x80
[  T654]  pte_alloc_one+0x17/0x70
[  T654]  __pte_alloc+0x16/0x110
[  T654]  copy_page_range+0x71c/0x850
[  T654]  ? sched_clock_local+0x12/0x80
[  T654]  dup_mm.isra.7+0x36c/0x4d0
[  T654]  copy_process.part.9+0x1bc0/0x1bf0
[  T654]  _do_fork+0xe4/0x6f0
[  T654]  ? ksys_mmap_pgoff+0xaf/0x130
[  T654]  do_syscall_64+0x69/0x440
[  T654]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  T654]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  T654] RIP: 0033:0x7ffff6336014
[  T654] Code: f7 d8 64 89 04 25 d4 02 00 00 64 4c 8b 0c 25 10 00 00 00 31 d2 4d 8d 91 d0 02 00 00 31 f6 bf 11 00 20 01 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 3e 01 00 00 85 c0 41 89 c5 0f 85 45 01 00
[  T654] RSP: 002b:00007fffffffe260 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[  T654] RAX: ffffffffffffffda RBX: 000000000000028e RCX: 00007ffff6336014
[  T654] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[  T654] RBP: 00007fffffffe2c0 R08: 000000000000028e R09: 00007ffff7fe5800
[  T654] R10: 00007ffff7fe5ad0 R11: 0000000000000246 R12: 00007fffffffe260
[  T654] R13: 00007fffffffe280 R14: 000055555581c480 R15: 0000555555815e80
[  T654] Modules linked in: msg_socket(O)
[  T654] CR2: ffff8880594ed000
[  T654] ---[ end trace f72d8855a9e1315e ]---
[  T654] RIP: 0010:clear_page_orig+0x12/0x40
[  T654] Code: 00 b8 01 00 00 00 5b c3 b9 00 02 00 00 31 c0 f3 48 ab c3 0f 1f 44 00 00 31 c0 b9 40 00 00 00 66 0f 1f 84 00 00 00 00 00 ff c9 <48> 89 07 48 89 47 08 48 89 47 10 48 89 47 18 48 89 47 20 48 89 47
[  T654] RSP: 0018:ffffc900006cb988 EFLAGS: 00000216
[  T654] RAX: 0000000000000000 RBX: dead000000000100 RCX: 000000000000003f
[  T654] RDX: ffff8880596fc2c0 RSI: 00000000013893d8 RDI: ffff8880594ed000
[  T654] RBP: ffffc900006cbb00 R08: 0000000000000000 R09: 0000000001389410
[  T654] R10: ffff888000000000 R11: 6db6db6db6db6db7 R12: 0000000000000010
[  T654] R13: ffffffff81cd9f40 R14: ffffea00013893d8 R15: ffffea00013893d8
[  T654] FS:  00007ffff7fe5800(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000
[  T654] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  T654] CR2: ffff8880594ed000 CR3: 00000000596b6000 CR4: 00000000000006f0
[  T654] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
[  T654] in_atomic(): 1, irqs_disabled(): 1, pid: 654, name: sshd
[  T654] INFO: lockdep is turned off.
[  T654] irq event stamp: 10082
[  T654] hardirqs last  enabled at (10081): [<ffffffff811d7044>] get_page_from_freelist+0xf4/0x13d0
[  T654] hardirqs last disabled at (10082): [<ffffffff8100196a>] trace_hardirqs_off_thunk+0x1a/0x1c
[  T654] softirqs last  enabled at (9942): [<ffffffff818002ec>] __do_softirq+0x2ec/0x475
[  T654] softirqs last disabled at (9935): [<ffffffff8106856e>] irq_exit+0xbe/0xd0
[  T654] CPU: 0 PID: 654 Comm: sshd Tainted: G      D    O      5.2.0-rc4-popcorn+ #32
[  T654] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  T654] Call Trace:
[  T654]  dump_stack+0x67/0x9b
[  T654]  ___might_sleep+0x149/0x230
[  T654]  exit_signals+0x30/0x240
[  T654]  do_exit+0xb0/0xc30
[  T654]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  T654]  rewind_stack_do_exit+0x17/0x20
[  T654] note: sshd[654] exited with preempt_count 1
[  T656] BUG: unable to handle page fault for address: ffff88805a320000
[  T656] #PF: supervisor write access in kernel mode
[  T656] #PF: error_code(0x000b) - reserved bit violation
[  T656] PGD 2e01067 P4D 2e01067 PUD 2e04067 PMD 5a17d063 PTE 800fffffa5cdf063
[  T656] Oops: 000b [#2] SMP NOPTI
[  T656] CPU: 0 PID: 656 Comm: sshd Tainted: G      D W  O      5.2.0-rc4-popcorn+ #32
[  T656] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  T656] RIP: 0010:clear_page_orig+0x12/0x40
[  T656] Code: 00 b8 01 00 00 00 5b c3 b9 00 02 00 00 31 c0 f3 48 ab c3 0f 1f 44 00 00 31 c0 b9 40 00 00 00 66 0f 1f 84 00 00 00 00 00 ff c9 <48> 89 07 48 89 47 08 48 89 47 10 48 89 47 18 48 89 47 20 48 89 47
[  T656] RSP: 0018:ffffc900005ff9d0 EFLAGS: 00000216
[  T656] RAX: 0000000000000000 RBX: dead000000000100 RCX: 000000000000003f
[  T656] RDX: ffff888059676300 RSI: 00000000013baf00 RDI: ffff88805a320000
[  T656] RBP: ffffc900005ffb48 R08: 0000000000000000 R09: 00000000013baf38
[  T656] R10: ffff888000000000 R11: 6db6db6db6db6db7 R12: 0000000000000010
[  T656] R13: ffffffff81cd9f40 R14: ffffea00013baf00 R15: ffffea00013baf00
[  T656] FS:  00007ffff7fe5800(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000
[  T656] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  T656] CR2: ffff88805a320000 CR3: 000000005a28a000 CR4: 00000000000006f0
[  T656] Call Trace:
[  T656]  get_page_from_freelist+0x7dc/0x13d0
[  T656]  ? lock_acquire+0xa6/0x1a0
[  T656]  ? fs_reclaim_acquire.part.26+0x5/0x30
[  T656]  __alloc_pages_nodemask+0x178/0xfa0
[  T656]  ? __pmd_alloc+0xa9/0x170
[  T656]  pte_alloc_one+0x17/0x70
[  T656]  __pte_alloc+0x16/0x110
[  T656]  __handle_mm_fault+0x8f2/0xcc0
[  T656]  __get_user_pages+0x215/0x790
[  T656]  get_user_pages_remote+0x158/0x210
[  T656]  copy_strings+0x16b/0x2e0
[  T656]  ? kernel_read+0x2c/0x40
[  T656]  copy_strings_kernel+0x2c/0x40
[  T656]  __do_execve_file+0x6c2/0xa60
[  T656]  __x64_sys_execve+0x26/0x30
[  T656]  do_syscall_64+0x69/0x440
[  T656]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  T656]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  T656] RIP: 0033:0x7ffff6336317
[  T656] Code: ff ff 76 df 89 c6 f7 de 64 41 89 32 eb d5 89 c6 f7 de 64 41 89 32 eb db 66 2e 0f 1f 84 00 00 00 00 00 90 b8 3b 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 02 f3 c3 48 8b 15 40 ab 2e 00 f7 d8 64 89 02
[  T656] RSP: 002b:00007fffffffe2c8 EFLAGS: 00000217 ORIG_RAX: 000000000000003b
[  T656] RAX: ffffffffffffffda RBX: 000055555581c590 RCX: 00007ffff6336317
[  T656] RDX: 000055555581e090 RSI: 0000555555823f20 RDI: 000055555581e050
[  T656] RBP: 000055555581c588 R08: 0000000000000007 R09: 0000000000000008
[  T656] R10: 00007fffffffde01 R11: 0000000000000217 R12: 0000000000000000
[  T656] R13: 0000000000000004 R14: 000055555581c480 R15: 0000555555815e80
[  T656] Modules linked in: msg_socket(O)
[  T656] CR2: ffff88805a320000
[  T656] ---[ end trace f72d8855a9e1315f ]---
[  T656] RIP: 0010:clear_page_orig+0x12/0x40
[  T656] Code: 00 b8 01 00 00 00 5b c3 b9 00 02 00 00 31 c0 f3 48 ab c3 0f 1f 44 00 00 31 c0 b9 40 00 00 00 66 0f 1f 84 00 00 00 00 00 ff c9 <48> 89 07 48 89 47 08 48 89 47 10 48 89 47 18 48 89 47 20 48 89 47
[  T656] RSP: 0018:ffffc900006cb988 EFLAGS: 00000216
[  T656] RAX: 0000000000000000 RBX: dead000000000100 RCX: 000000000000003f
[  T656] RDX: ffff8880596fc2c0 RSI: 00000000013893d8 RDI: ffff8880594ed000
[  T656] RBP: ffffc900006cbb00 R08: 0000000000000000 R09: 0000000001389410
[  T656] R10: ffff888000000000 R11: 6db6db6db6db6db7 R12: 0000000000000010
[  T656] R13: ffffffff81cd9f40 R14: ffffea00013893d8 R15: ffffea00013893d8
[  T656] FS:  00007ffff7fe5800(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000
[  T656] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  T656] CR2: ffff88805a320000 CR3: 000000005a28a000 CR4: 00000000000006f0
[  T656] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
[  T656] in_atomic(): 1, irqs_disabled(): 1, pid: 656, name: sshd
[  T656] INFO: lockdep is turned off.
[  T656] irq event stamp: 0
[  T656] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[  T656] hardirqs last disabled at (0): [<ffffffff8105f078>] copy_process.part.9+0x4d8/0x1bf0
[  T656] softirqs last  enabled at (0): [<ffffffff8105f078>] copy_process.part.9+0x4d8/0x1bf0
[  T656] softirqs last disabled at (0): [<0000000000000000>] 0x0
[  T656] CPU: 0 PID: 656 Comm: sshd Tainted: G      D W  O      5.2.0-rc4-popcorn+ #32
[  T656] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  T656] Call Trace:
[  T656]  dump_stack+0x67/0x9b
[  T656]  ___might_sleep+0x149/0x230
[  T656]  exit_signals+0x30/0x240
[  T656]  ? __x64_sys_execve+0x26/0x30
[  T656]  do_exit+0xb0/0xc30
[  T656]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  T656]  rewind_stack_do_exit+0x17/0x20
[  T656] note: sshd[656] exited with preempt_count 1
[  T657] BUG: unable to handle page fault for address: ffff88805a2fd000
[  T657] #PF: supervisor write access in kernel mode
[  T657] #PF: error_code(0x000b) - reserved bit violation
[  T657] PGD 2e01067 P4D 2e01067 PUD 2e04067 PMD 5a17d063 PTE 800fffffa5d02063
[  T657] Oops: 000b [#3] SMP NOPTI
[  T657] CPU: 0 PID: 657 Comm: sshd Tainted: G      D W  O      5.2.0-rc4-popcorn+ #32
[  T657] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  T657] RIP: 0010:clear_page_orig+0x12/0x40
[  T657] Code: 00 b8 01 00 00 00 5b c3 b9 00 02 00 00 31 c0 f3 48 ab c3 0f 1f 44 00 00 31 c0 b9 40 00 00 00 66 0f 1f 84 00 00 00 00 00 ff c9 <48> 89 07 48 89 47 08 48 89 47 10 48 89 47 18 48 89 47 20 48 89 47
[  T657] RSP: 0018:ffffc900005ff9d8 EFLAGS: 00000216
[  T657] RAX: 0000000000000000 RBX: dead000000000100 RCX: 000000000000003f
[  T657] RDX: ffff888059676300 RSI: 00000000013ba758 RDI: ffff88805a2fd000
[  T657] RBP: ffffc900005ffb50 R08: 0000000000000000 R09: 00000000013ba790
[  T657] R10: ffff888000000000 R11: 6db6db6db6db6db7 R12: 0000000000000010
[  T657] R13: ffffffff81cd9f40 R14: ffffea00013ba758 R15: ffffea00013ba758
[  T657] FS:  00007ffff7fe5800(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000
[  T657] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  T657] CR2: ffff88805a2fd000 CR3: 000000005a28a000 CR4: 00000000000006f0
[  T657] Call Trace:
[  T657]  get_page_from_freelist+0x7dc/0x13d0
[  T657]  ? lock_acquire+0xa6/0x1a0
[  T657]  ? fs_reclaim_acquire.part.26+0x5/0x30
[  T657]  __alloc_pages_nodemask+0x178/0xfa0
[  T657]  ? lock_acquire+0xa6/0x1a0
[  T657]  ? find_get_entry+0x5/0x300
[  T657]  __get_free_pages+0x11/0x50
[  T657]  __pud_alloc+0x2a/0xc0
[  T657]  __handle_mm_fault+0x2b7/0xcc0
[  T657]  __get_user_pages+0x215/0x790
[  T657]  get_user_pages_remote+0x158/0x210
[  T657]  copy_strings+0x16b/0x2e0
[  T657]  ? kernel_read+0x2c/0x40
[  T657]  copy_strings_kernel+0x2c/0x40
[  T657]  __do_execve_file+0x6c2/0xa60
[  T657]  __x64_sys_execve+0x26/0x30
[  T657]  do_syscall_64+0x69/0x440
[  T657]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  T657]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  T657] RIP: 0033:0x7ffff6336317
[  T657] Code: ff ff 76 df 89 c6 f7 de 64 41 89 32 eb d5 89 c6 f7 de 64 41 89 32 eb db 66 2e 0f 1f 84 00 00 00 00 00 90 b8 3b 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 02 f3 c3 48 8b 15 40 ab 2e 00 f7 d8 64 89 02
[  T657] RSP: 002b:00007fffffffe2c8 EFLAGS: 00000217 ORIG_RAX: 000000000000003b
[  T657] RAX: ffffffffffffffda RBX: 000055555581c590 RCX: 00007ffff6336317
[  T657] RDX: 000055555581e090 RSI: 0000555555823f20 RDI: 000055555581e050
[  T657] RBP: 000055555581c588 R08: 0000000000000007 R09: 0000000000000008
[  T657] R10: 00007fffffffde01 R11: 0000000000000217 R12: 0000000000000000
[  T657] R13: 0000000000000004 R14: 000055555581c480 R15: 0000555555815e80
[  T657] Modules linked in: msg_socket(O)
[  T657] CR2: ffff88805a2fd000
[  T657] ---[ end trace f72d8855a9e13160 ]---
[  T657] RIP: 0010:clear_page_orig+0x12/0x40
[  T657] Code: 00 b8 01 00 00 00 5b c3 b9 00 02 00 00 31 c0 f3 48 ab c3 0f 1f 44 00 00 31 c0 b9 40 00 00 00 66 0f 1f 84 00 00 00 00 00 ff c9 <48> 89 07 48 89 47 08 48 89 47 10 48 89 47 18 48 89 47 20 48 89 47
[  T657] RSP: 0018:ffffc900006cb988 EFLAGS: 00000216
[  T657] RAX: 0000000000000000 RBX: dead000000000100 RCX: 000000000000003f
[  T657] RDX: ffff8880596fc2c0 RSI: 00000000013893d8 RDI: ffff8880594ed000
[  T657] RBP: ffffc900006cbb00 R08: 0000000000000000 R09: 0000000001389410
[  T657] R10: ffff888000000000 R11: 6db6db6db6db6db7 R12: 0000000000000010
[  T657] R13: ffffffff81cd9f40 R14: ffffea00013893d8 R15: ffffea00013893d8
[  T657] FS:  00007ffff7fe5800(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000
[  T657] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  T657] CR2: ffff88805a2fd000 CR3: 000000005a28a000 CR4: 00000000000006f0
[  T657] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
[  T657] in_atomic(): 1, irqs_disabled(): 1, pid: 657, name: sshd
[  T657] INFO: lockdep is turned off.
[  T657] irq event stamp: 0
[  T657] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[  T657] hardirqs last disabled at (0): [<ffffffff8105f078>] copy_process.part.9+0x4d8/0x1bf0
[  T657] softirqs last  enabled at (0): [<ffffffff8105f078>] copy_process.part.9+0x4d8/0x1bf0
[  T657] softirqs last disabled at (0): [<0000000000000000>] 0x0
[  T657] CPU: 0 PID: 657 Comm: sshd Tainted: G      D W  O      5.2.0-rc4-popcorn+ #32
[  T657] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  T657] Call Trace:
[  T657]  dump_stack+0x67/0x9b
[  T657]  ___might_sleep+0x149/0x230
[  T657]  exit_signals+0x30/0x240
[  T657]  ? __x64_sys_execve+0x26/0x30
[  T657]  do_exit+0xb0/0xc30
[  T657]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  T657]  rewind_stack_do_exit+0x17/0x20
[  T657] note: sshd[657] exited with preempt_count 1
cesarjp commented 4 years ago

That get_page_from_freelist error looks awfully familiar to the faults I encountered with the various PTE workarounds for the L1TF patches. I believe this issue will go away with the patch I posted in Issue #55.
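
A quick way to check this against the traces above: in every oops, flipping the frame bits of the reported PTE gives back exactly the physical page behind the faulting direct-map address. That is the pattern PTE inversion would leave behind, here on an entry that is still marked present, and the resulting high address bits would be consistent with a reserved-bit fault. The sketch below assumes the default 4-level direct-map base of `0xffff888000000000` (the same value shows up in R10) and a 52-bit frame mask; `inverted_pte_target` is a hypothetical helper, not a kernel function.

```python
PAGE_OFFSET = 0xffff888000000000  # assumed direct-map base (4-level paging, no KASLR)
PFN_MASK = 0x000ffffffffff000     # PTE bits 51:12, assuming a 52-bit physical mask


def inverted_pte_target(pte):
    """Direct-map address the PTE would map if its frame bits were flipped back."""
    phys = ~pte & PFN_MASK
    return PAGE_OFFSET + phys


# (faulting address, PTE) pairs copied from the oops output in this issue.
samples = [
    (0xffff88805a0e9000, 0x800fffffa5f16063),
    (0xffff88805a143000, 0x800fffffa5ebc063),
    (0xffff8880594ed000, 0x800fffffa6b12063),
    (0xffff88805a320000, 0x800fffffa5cdf063),
    (0xffff88805a2fd000, 0x800fffffa5d02063),
]

for fault_addr, pte in samples:
    print(hex(fault_addr), inverted_pte_target(pte) == fault_addr)
```

All five pairs match, which fits the L1TF explanation but does not by itself show where the bad entry comes from.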

AHatnarf commented 4 years ago

> That get_page_from_freelist error looks awfully familiar to the faults I encountered with the various PTE workarounds for the L1TF patches. I believe this issue will go away with the patch I posted in Issue #55.

I can confirm that the second patch did work. Thanks! Interestingly, on some systems which support the L1TF workaround, adding boot arguments to disable the mitigations didn't disable it.
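
As far as I understand the kernel's L1TF admin guide, PTE inversion is permanently enabled on affected CPUs and the boot options only control the hypervisor-side mitigations, which would explain why the boot arguments made no difference. A minimal sketch for checking what a running kernel reports, assuming the standard sysfs vulnerabilities file is present (the helper names are hypothetical):

```python
from pathlib import Path

# Standard sysfs location for the kernel's L1TF status string.
L1TF_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/l1tf")


def l1tf_status(path=L1TF_SYSFS):
    """Return the kernel's L1TF status line, or None if it is not exposed."""
    try:
        return path.read_text().strip()
    except OSError:
        return None  # file missing, e.g. non-x86 machine or an older kernel


def pte_inversion_active(status):
    """True if the status line mentions the PTE inversion mitigation."""
    return "PTE Inversion" in status


status = l1tf_status()
if status is None:
    print("l1tf status not exposed on this system")
else:
    print(status, "| PTE inversion:", pte_inversion_active(status))
```

Comparing this output with and without the boot arguments should show whether they change anything for PTE inversion.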

I'll close these related issues so we can consolidate them into one issue (#84, #85, #89, #91 are fixed by reverting the patches). The Linux community probably wouldn't approve of us reverting the L1TF patches, so I'll start looking into solutions that might be more appealing to them.