ssrg-vt / popcorn-kernel

Popcorn Linux kernel for distributed thread execution
Other
156 stars 22 forks source link

occassional deadlock on remote due to loop in lru list #80

Closed bxatnarf closed 4 years ago

bxatnarf commented 5 years ago

Arch: migration between x86-64 hosts and migration between arm64 hosts Branch: merge, commit 4b4f4339d8513b0f1dcda540da4e9a7fc28c433e (latest as of the creation of this bug report) Example tested: bt You may have to run it multiple times to trigger deadlock. In the directory from which you execute bt you must create a file called inputbt.data that contains:

10       number of time steps
0.01e0   dt for class A = 0.0008d0. class B = 0.0003d0  class C = 0.0001d0
2 2 2

You can grab prebuilt bt binaries for arm and x86. If you want to build your own you will first have to rebuild the migration library (popcorn-kernel-lib) with this patch.

Kernel config:

CONFIG_ARCH_SUPPORTS_POPCORN=y
CONFIG_POPCORN=y
CONFIG_POPCORN_DEBUG=y
# CONFIG_POPCORN_DEBUG_PROCESS_SERVER is not set
# CONFIG_POPCORN_DEBUG_PAGE_SERVER is not set
# CONFIG_POPCORN_DEBUG_VMA_SERVER is not set
# CONFIG_POPCORN_DEBUG_VERBOSE is not set
# CONFIG_POPCORN_CHECK_SANITY is not set
# CONFIG_POPCORN_REMOTE_INFO is not set
# CONFIG_POPCORN_STAT is not set
CONFIG_POPCORN_KMSG=y
CONFIG_POPCORN_KMSG_SOCKET=m
# CONFIG_POPCORN_KMSG_TEST is not set
# CONFIG_POPCORN_DEBUG_MSG_LAYER is not set

kernel log on remote (x86):

[  177.230674] Modules linked in: msg_socket
[  177.230674] irq event stamp: 19050
[  177.230880] hardirqs last  enabled at (19049): [<ffffffff810019bd>] trace_hardirqs_on_c
[  177.231107] hardirqs last disabled at (19050): [<ffffffff810019d9>] trace_hardirqs_offc
[  177.231107] softirqs last  enabled at (10692): [<ffffffff81800341>] __do_softirq+0x341c
[  177.231107] softirqs last disabled at (10683): [<ffffffff810651d6>] irq_exit+0xa6/0xe0
[  177.231107] CPU: 0 PID: 646 Comm: bt Not tainted 4.20.0-rc7-popcorn+ #126
[  177.231107] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181124
[  177.231107] RIP: 0010:lock_is_held_type+0x62/0x80
[  177.231107] Code: 25 80 4d 01 00 41 c7 84 24 a4 07 00 00 01 00 00 00 89 ee 48 89 df e8e
[  177.231107] RSP: 0018:ffffc900001dfc00 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[  177.231107] RAX: 0000000000000000 RBX: ffffffff81c4dea0 RCX: 0000000000000001
[  177.231107] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: ffffffff81c4dea0
[  177.231107] RBP: 00000000ffffffff R08: 000000000000001c R09: 0000000000000001
[  177.231107] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881379d4300
[  177.231107] R13: 0000000000000246 R14: ffffc900001dfc90 R15: ffffea00043c8fc8
[  177.231107] FS:  0000000000000000(0000) GS:ffff88813fa00000(0000) knlGS:000000000000000
[  177.231107] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  177.231107] CR2: 00007ffff7dfffe8 CR3: 0000000001c12000 CR4: 00000000000006f0
[  177.231107] Call Trace:
[  177.231107]  rcu_read_lock_sched_held+0x6c/0x80
[  177.231107]  free_pcp_prepare+0xff/0x130
[  177.231107]  free_unref_page_list+0x7a/0x240
[  177.231107]  release_pages+0x296/0x2e0
[  177.231107]  tlb_flush_mmu_free+0x31/0x50
[  177.231107]  unmap_page_range+0x62e/0x830
[  177.231107]  ? uprobe_clear_state+0x1f/0x80
[  177.231107]  unmap_vmas+0x3c/0x50
[  177.231107]  exit_mmap+0x91/0x160
[  177.231107]  mmput+0x41/0xf0
[  177.231107]  do_exit+0x38b/0xb50
[  177.231107]  ? remote_worker_main+0x3b8/0x590
[  177.231107]  ? handle_remote_futex_response+0x30/0x30
[  177.231107]  kthread+0xed/0x140
[  177.231107]  ? kthread_park+0x80/0x80
[  177.231107]  ret_from_fork+0x3a/0x50

x86 backtrace on remote during deadlock:

#0  lock_is_held_type (lock=0xffffffff81c4dea0 <rcu_sched_lock_map>, read=-1) at kernel/locking/lockdep.c:3886
#1  0xffffffff810cfc0c in lock_is_held (lock=<optimized out>) at ./include/linux/lockdep.h:339
#2  rcu_read_lock_sched_held () at kernel/rcu/update.c:117
#3  0xffffffff81190faf in trace_mm_page_free (order=<optimized out>, page=<optimized out>) at ./arch/x86/include/asm/preempt.h:81
#4  free_pages_prepare (check_free=<optimized out>, order=<optimized out>, page=<optimized out>) at mm/page_alloc.c:994
#5  free_pcp_prepare (page=0xffffea00043c8dd0) at mm/page_alloc.c:1058
#6  0xffffffff811931ba in free_unref_page_prepare (pfn=<optimized out>, page=<optimized out>) at mm/page_alloc.c:2812
#7  free_unref_page_list (list=0xffffc900001dfc90) at mm/page_alloc.c:2812
#8  0xffffffff8119bc86 in release_pages (pages=<optimized out>, nr=<optimized out>) at mm/swap.c:791
#9  0xffffffff811c24d1 in tlb_flush_mmu_free (tlb=0xffffc900001dfdf8) at mm/mmu_gather.c:74
#10 0xffffffff811b7bae in zap_pte_range (details=<optimized out>, end=<optimized out>, addr=18446683600571989496, pmd=<optimized out>, vma=<optimized out>, tlb=<optimized out>)
    at mm/memory.c:1166
#11 zap_pmd_range (details=<optimized out>, end=<optimized out>, addr=18446683600571989496, pud=<optimized out>, vma=<optimized out>, tlb=<optimized out>) at mm/memory.c:1201
#12 zap_pud_range (details=<optimized out>, end=<optimized out>, addr=18446683600571989496, p4d=<optimized out>, vma=<optimized out>, tlb=<optimized out>) at mm/memory.c:1230
#13 zap_p4d_range (details=<optimized out>, end=<optimized out>, addr=<optimized out>, pgd=<optimized out>, vma=<optimized out>, tlb=<optimized out>) at mm/memory.c:1251
#14 unmap_page_range (tlb=0xffffc900001dfdf8, vma=<optimized out>, addr=140737352040448, end=<optimized out>, details=<optimized out>) at mm/memory.c:1272
#15 0xffffffff811b7e2a in unmap_single_vma (tlb=<optimized out>, vma=<optimized out>, start_addr=<optimized out>, end_addr=<optimized out>, details=<optimized out>)
    at mm/memory.c:1317
#16 0xffffffff811b7f9c in unmap_vmas (tlb=0xffffc900001dfdf8, vma=0xffff888139868730, start_addr=0, end_addr=18446744073709551615) at mm/memory.c:1347
#17 0xffffffff811c0031 in exit_mmap (mm=<optimized out>) at mm/mmap.c:3198
#18 0xffffffff8105bf01 in __mmput (mm=<optimized out>) at kernel/fork.c:1095
#19 mmput (mm=<optimized out>) at kernel/fork.c:1120
--Type <RET> for more, q to quit, c to continue without paging--
#20 mmput (mm=0xffff888139aea6c0) at kernel/fork.c:1115
#21 0xffffffff810626cb in exit_mm () at kernel/exit.c:554
#22 do_exit (code=<optimized out>) at kernel/exit.c:863
#23 0xffffffff81080b4d in kthread (_create=0xffff88813a33dd40) at kernel/kthread.c:248
#24 0xffffffff8160021a in ret_from_fork () at arch/x86/entry/entry_64.S:352
#25 0x0000000000000000 in ?? ()

There is no relevant messages in the kernel log on arm. arm remote's backtrace while stuck in deadlock

#0  0xffff0000082971a0 in uncharge_list (page_list=<optimized out>) at mm/memcontrol.c:6099
#1  mem_cgroup_uncharge_list (page_list=0xffff000009f03b58) at mm/memcontrol.c:6142
#2  0xffff0000082425a4 in release_pages (pages=0xffff7e0003dae240, nr=166739384) at mm/swap.c:790
#3  0xffff00000827412c in tlb_flush_mmu_free (tlb=0xffff000009f03d40) at mm/mmu_gather.c:74
#4  0xffff0000082681c8 in zap_pte_range (details=<optimized out>, end=<optimized out>, addr=18446462598899580224, pmd=<optimized out>, vma=<optimized out>, tlb=<optimized out>)
    at mm/memory.c:1166
#5  zap_pmd_range (details=<optimized out>, end=<optimized out>, addr=18446462598899580224, pud=<optimized out>, vma=<optimized out>, tlb=<optimized out>) at mm/memory.c:1201
#6  zap_pud_range (details=<optimized out>, end=<optimized out>, addr=18446462598899580224, p4d=<optimized out>, vma=<optimized out>, tlb=<optimized out>) at mm/memory.c:1230
#7  zap_p4d_range (details=<optimized out>, end=<optimized out>, addr=<optimized out>, pgd=<optimized out>, vma=<optimized out>, tlb=<optimized out>) at mm/memory.c:1251
#8  unmap_page_range (tlb=0xffff000009f03d40, vma=0xffff8000f937c6c0, addr=281472850198528, end=<optimized out>, details=<optimized out>) at mm/memory.c:1272
#9  0xffff00000826850c in unmap_single_vma (tlb=0xffff000009f03d40, vma=0xffff8000f937c6c0, start_addr=<optimized out>, end_addr=<optimized out>, details=0x0) at mm/memory.c:1317
#10 0xffff0000082686e0 in unmap_vmas (tlb=0xffff000009f03d40, vma=0xffff8000f937c6c0, start_addr=0, end_addr=18446744073709551615) at mm/memory.c:1347
#11 0xffff000008271800 in exit_mmap (mm=0xffff8000f71e3700) at mm/mmap.c:3198
#12 0xffff0000080a490c in __mmput (mm=<optimized out>) at kernel/fork.c:1095
#13 mmput (mm=0xffff8000f71e3700) at kernel/fork.c:1120
#14 0xffff0000080ad608 in exit_mm () at kernel/exit.c:554
#15 do_exit (code=<optimized out>) at kernel/exit.c:863
#16 0xffff0000080d3940 in kthread (_create=0xffff8000f71d8100) at kernel/kthread.c:248
#17 0xffff00000808605c in ret_from_fork () at arch/arm64/kernel/entry.S:1086
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

The deadlock is caused by there being a loop in a page->lru linked list causing it to never finish releasing pages on exit. I do not know the cause of this loop.

bxatnarf commented 5 years ago

If popcorn's page_server_zap_pte hook is commented out from zap_pte_range, (i.e., this line) then the remote host doesn't get into deadlock. Instead, the remote host encounters a bad pte during its call to unmap_page_range, and the following stack gets dumped

[   88.206060] addr:(____ptrval____) vm_flags:00100173 anon_vma:(____ptrval____) mapping:          (null) index:ffffdc32f
[   88.206630] file:          (null) fault:          (null) mmap:          (null) readpage:          (null)
[   88.206936] CPU: 0 PID: 261 Comm: stack Tainted: G    B   W         4.20.0-rc7-popcorn+ #96
[   88.207170] Hardware name: linux,dummy-virt (DT)
[   88.207309] Call trace:
[   88.207403]  dump_backtrace+0x0/0x1c8
[   88.207527]  show_stack+0x24/0x30
[   88.207634]  dump_stack+0xbc/0xf4
[   88.207750]  print_bad_pte+0x18c/0x1e0
[   88.207886]  unmap_page_range+0x224/0x968
[   88.208016]  unmap_single_vma+0x8c/0xa0
[   88.208138]  unmap_vmas+0x60/0x78
[   88.208247]  exit_mmap+0xc8/0x170
[   88.208361]  mmput+0x74/0x118
[   88.208466]  do_exit+0x3e0/0xaf0
[   88.208578]  kthread+0xf0/0x138
[   88.208685]  ret_from_fork+0x10/0x1c
bxatnarf commented 5 years ago

According to Ho-Ren, this could also be an issue with get_normal_page/vm_normal_page which manipulates the lru cache. Given that get_normal_page has changed quite a bit since the vanilla popcorn version there is a real possibility here -- see this git blame of vm_normal_page that shows the changes since linux 4.4

AHatnarf commented 4 years ago

Going to close this after 1f12a34a25a122f6b0e512b0326b5b199daf215c. Thanks!