ssrg-vt / popcorn-kernel

Popcorn Linux kernel for distributed thread execution

arm64 race condition/deadlock #71

Closed bxatnarf closed 5 years ago

bxatnarf commented 5 years ago

Architecture: arm64 (it's very possible this occurs on x86, but I haven't had a chance to test it yet)
Branch: merge
Triggering binary: basic.c from popcorn-kernel-lib

A deadlock often occurs when basic is executed on an arm64 machine. The following sequence of events can trigger it:

  1. A thread of the process is migrated to a remote host for the first time.
  2. After the migration, the process quickly exits on the remote host.

bxatnarf commented 5 years ago

This bug seems a bit harder to trigger since commit bf0c818be9a088502956fc058c99a4783156d376.

bxatnarf commented 5 years ago

Here is a backtrace from the remote host while it is deadlocked:

#0  0xffff0000082971a0 in uncharge_list (page_list=<optimized out>) at mm/memcontrol.c:6099
#1  mem_cgroup_uncharge_list (page_list=0xffff000009f03b58) at mm/memcontrol.c:6142
#2  0xffff0000082425a4 in release_pages (pages=0xffff7e0003dae240, nr=166739384) at mm/swap.c:790
#3  0xffff00000827412c in tlb_flush_mmu_free (tlb=0xffff000009f03d40) at mm/mmu_gather.c:74
#4  0xffff0000082681c8 in zap_pte_range (details=<optimized out>, end=<optimized out>, addr=18446462598899580224, pmd=<optimized out>, vma=<optimized out>, tlb=<optimized out>)
    at mm/memory.c:1166
#5  zap_pmd_range (details=<optimized out>, end=<optimized out>, addr=18446462598899580224, pud=<optimized out>, vma=<optimized out>, tlb=<optimized out>) at mm/memory.c:1201
#6  zap_pud_range (details=<optimized out>, end=<optimized out>, addr=18446462598899580224, p4d=<optimized out>, vma=<optimized out>, tlb=<optimized out>) at mm/memory.c:1230
#7  zap_p4d_range (details=<optimized out>, end=<optimized out>, addr=<optimized out>, pgd=<optimized out>, vma=<optimized out>, tlb=<optimized out>) at mm/memory.c:1251
#8  unmap_page_range (tlb=0xffff000009f03d40, vma=0xffff8000f937c6c0, addr=281472850198528, end=<optimized out>, details=<optimized out>) at mm/memory.c:1272
#9  0xffff00000826850c in unmap_single_vma (tlb=0xffff000009f03d40, vma=0xffff8000f937c6c0, start_addr=<optimized out>, end_addr=<optimized out>, details=0x0) at mm/memory.c:1317
#10 0xffff0000082686e0 in unmap_vmas (tlb=0xffff000009f03d40, vma=0xffff8000f937c6c0, start_addr=0, end_addr=18446744073709551615) at mm/memory.c:1347
#11 0xffff000008271800 in exit_mmap (mm=0xffff8000f71e3700) at mm/mmap.c:3198
#12 0xffff0000080a490c in __mmput (mm=<optimized out>) at kernel/fork.c:1095
#13 mmput (mm=0xffff8000f71e3700) at kernel/fork.c:1120
#14 0xffff0000080ad608 in exit_mm () at kernel/exit.c:554
#15 do_exit (code=<optimized out>) at kernel/exit.c:863
#16 0xffff0000080d3940 in kthread (_create=0xffff8000f71d8100) at kernel/kthread.c:248
#17 0xffff00000808605c in ret_from_fork () at arch/arm64/kernel/entry.S:1086
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

A loop in the page->lru list seems to be causing this issue.
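
To illustrate what a loop in such a list means for the code walking it, here is a minimal user-space sketch, not kernel code. It assumes a circular list with a sentinel head in the style of the kernel's struct list_head; struct list_node, list_returns_to_head and self_check are hypothetical names made up for this example. A walker that only stops when it gets back to the head will spin forever if the ->next chain falls into a cycle that bypasses the head, and a tortoise-and-hare walk can detect that case:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a kernel-style circular list node. */
struct list_node {
    struct list_node *next;
};

/*
 * Follow ->next from head with a slow and a fast cursor.
 * Returns true if the walk gets back to head (well-formed circular list),
 * false if it is trapped in a cycle that never reaches head.
 * Assumes every ->next pointer is non-NULL.
 */
static bool list_returns_to_head(const struct list_node *head)
{
    const struct list_node *slow = head->next;
    const struct list_node *fast = head->next;

    while (slow != head && fast != head) {
        slow = slow->next;          /* one step */
        fast = fast->next;          /* first of two steps */
        if (fast == head)
            break;
        fast = fast->next;          /* second of two steps */
        if (fast == head)
            break;
        if (slow == fast)           /* cursors met without seeing head */
            return false;
    }
    return true;
}

/* Builds one healthy and one corrupted list and checks both. */
static void self_check(void)
{
    struct list_node head, a, b, c;

    /* Healthy: head -> a -> b -> c -> head */
    head.next = &a; a.next = &b; b.next = &c; c.next = &head;
    assert(list_returns_to_head(&head));

    /* Corrupted: c points back to b, so head is never reached again */
    c.next = &b;
    assert(!list_returns_to_head(&head));
}
```

Calling self_check() from a small main() exercises both cases. A check along these lines could only serve as a debugging aid for confirming the corruption; it says nothing about what corrupts the list in the first place.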

bxatnarf commented 5 years ago

I'm closing this issue in favor of the more descriptive issue https://github.com/ssrg-vt/popcorn-kernel/issues/80