google / kernel-sanitizers

Linux Kernel Sanitizers, fast bug-detectors for the Linux kernel
https://google.github.io/kernel-sanitizers/

test_free_bulk() may crash with slub_debug=Z #129

Closed ramosian-glider closed 4 years ago

ramosian-glider commented 4 years ago

test_free_bulk() crashes intermittently when run with slub_debug=Z:

    ok 9 - test_free_bulk
    # test_free_bulk-memcache: setup_test_cache: size=211, ctor=0x0
    # test_free_bulk-memcache: test_alloc: size=211, gfp=cc0, policy=right, cache=1
    # test_free_bulk-memcache: test_alloc: size=211, gfp=cc0, policy=none, cache=1
    # test_free_bulk-memcache: test_alloc: size=211, gfp=cc0, policy=left, cache=1
    # test_free_bulk-memcache: test_alloc: size=211, gfp=cc0, policy=none, cache=1
    # test_free_bulk-memcache: test_alloc: size=211, gfp=cc0, policy=none, cache=1
    # test_free_bulk-memcache: setup_test_cache: size=55, ctor=ctor_set_x [kfence_test]
    # test_free_bulk-memcache: test_alloc: size=55, gfp=cc0, policy=right, cache=1
    # test_free_bulk-memcache: test_alloc: size=55, gfp=cc0, policy=none, cache=1
    # test_free_bulk-memcache: test_alloc: size=55, gfp=cc0, policy=left, cache=1
    # test_free_bulk-memcache: test_alloc: size=55, gfp=cc0, policy=none, cache=1
    # test_free_bulk-memcache: test_alloc: size=55, gfp=cc0, policy=none, cache=1
    # test_free_bulk-memcache: setup_test_cache: size=21, ctor=0x0
    # test_free_bulk-memcache: test_alloc: size=21, gfp=cc0, policy=right, cache=1
    general protection fault, probably for non-canonical address 0xcccccc8111808180: 0000 [#1] SMP KASAN PTI
    CPU: 1 PID: 353 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #986
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    RIP: 0010:get_freepointer mm/slub.c:279
    RIP: 0010:deactivate_slab.isra.0+0x5a/0x4d0 mm/slub.c:2117
    Code: 08 4c 8b 84 c7 e8 00 00 00 48 8b 46 20 31 f6 48 85 c0 40 0f 95 c6 83 c6 0f 89 74 24 18 48 85 d2 0f 84 16 01 00 00 41 8b 4e 20 <48> 8b 34 0b 48 85 f6 0f 84 fd 00 00 00 41 8b 7e 08 f7 c7 00 01 00
    RSP: 0018:ffff8881102ffae0 EFLAGS: 00010082
    RAX: 0000000000000000 RBX: cccccc8111808170 RCX: 0000000000000010
    RDX: cccccc8111808170 RSI: 000000000000000f RDI: ffff8881164e27c0
    RBP: ffff8881102ffb80 R08: ffff88810f2f4240 R09: ffff88811180801d
    R10: 0000000000016013 R11: fffffbfff7195dd8 R12: ffffea0004460200
    R13: cccccc8111808170 R14: ffff8881164e27c0 R15: ffffea0004460200
    FS:  0000000000000000(0000) GS:ffff88811b500000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fc6ac049000 CR3: 0000000115220000 CR4: 0000000000740ee0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    PKRU: 55555554
    Call Trace:
     ___slab_alloc.isra.0+0x526/0x570 mm/slub.c:2697
     __slab_alloc.isra.0+0x9/0x10 mm/slub.c:2722
     slab_alloc_node mm/slub.c:2801
     slab_alloc mm/slub.c:2846
     kmem_cache_alloc+0x1d4/0x200 mm/slub.c:2851
     test_alloc+0x1a6/0x2cb [kfence_test]
     test_free_bulk.cold+0x30/0x1b2 [kfence_test]
     kunit_run_case_internal lib/kunit/test.c:256
     kunit_try_run_case+0x6b/0xa0 lib/kunit/test.c:295
     kunit_generic_run_threadfn_adapter+0x24/0x40 lib/kunit/try-catch.c:28
     kthread+0x199/0x1f0 kernel/kthread.c:291
     ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293
    Modules linked in: kfence_test(+)
    ---[ end trace 991ff9b94ef14171 ]---
    RIP: 0010:get_freepointer mm/slub.c:279
    RIP: 0010:deactivate_slab.isra.0+0x5a/0x4d0 mm/slub.c:2117
    Code: 08 4c 8b 84 c7 e8 00 00 00 48 8b 46 20 31 f6 48 85 c0 40 0f 95 c6 83 c6 0f 89 74 24 18 48 85 d2 0f 84 16 01 00 00 41 8b 4e 20 <48> 8b 34 0b 48 85 f6 0f 84 fd 00 00 00 41 8b 7e 08 f7 c7 00 01 00
    RSP: 0018:ffff8881102ffae0 EFLAGS: 00010082
    RAX: 0000000000000000 RBX: cccccc8111808170 RCX: 0000000000000010
    RDX: cccccc8111808170 RSI: 000000000000000f RDI: ffff8881164e27c0
    RBP: ffff8881102ffb80 R08: ffff88810f2f4240 R09: ffff88811180801d
    R10: 0000000000016013 R11: fffffbfff7195dd8 R12: ffffea0004460200
    R13: cccccc8111808170 R14: ffff8881164e27c0 R15: ffffea0004460200
    FS:  0000000000000000(0000) GS:ffff88811b500000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fc6ac049000 CR3: 0000000115220000 CR4: 0000000000740ee0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    PKRU: 55555554
    QEMU 4.2.0 monitor - type 'help' for more information

melver commented 4 years ago

As can be seen in the stack trace, the fault is not in KFENCE: it occurs in get_freepointer(), inlined into deactivate_slab() in mm/slub.c. It appears that the redzone logic in SLUB may be buggy with certain sizes.

We can try to investigate, and if we find a reproducer that does not depend on KFENCE or kfence-test, we can send a report upstream.
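
One way to read the register dump in support of the redzone suspicion is sketched below. This is only an illustration, not a confirmed root cause. It assumes that RCX (0x10) holds the cache's free-pointer offset, that the intact free pointer was 0xffff888111808170 (a hypothetical value, picked to sit in the same page as R09), that 0xcc is SLUB's active redzone byte (SLUB_RED_ACTIVE), and that addresses are 48-bit canonical. Under those assumptions, a redzone starting right after a 21-byte object would overwrite the top three bytes of an 8-byte pointer stored at offset 16, which reproduces the value seen in RDX and the faulting address.

```python
import struct

SLUB_RED_ACTIVE = 0xCC                 # redzone byte for in-use objects
OBJECT_SIZE = 21                       # from "setup_test_cache: size=21"
FREEPTR_OFFSET = 0x10                  # assumption: RCX is the free-pointer offset
ASSUMED_FREEPTR = 0xFFFF888111808170   # hypothetical intact free pointer


def redzone_over_freepointer(ptr, obj_size, offset, fill=SLUB_RED_ACTIVE):
    """Return the 8-byte little-endian pointer stored at `offset` after every
    byte at or beyond `obj_size` has been filled with the redzone byte."""
    slot = bytearray(struct.pack("<Q", ptr))
    for i in range(8):
        if offset + i >= obj_size:
            slot[i] = fill
    return struct.unpack("<Q", bytes(slot))[0]


def is_canonical(addr):
    """48-bit canonical check: bits 63..47 must be all zeros or all ones."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


corrupted = redzone_over_freepointer(ASSUMED_FREEPTR, OBJECT_SIZE, FREEPTR_OFFSET)
print(hex(corrupted))                   # 0xcccccc8111808170, same as RDX/RBX/R13
print(hex(corrupted + FREEPTR_OFFSET))  # 0xcccccc8111808180, the faulting address
print(is_canonical(corrupted))          # False
```

With size=211 and the same offset, the pointer slot lies entirely inside the object and comes back unchanged in this model, which would be consistent with only some sizes triggering the crash.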

melver commented 4 years ago

https://lkml.kernel.org/r/20200807160627.GA1420741@elver.google.com

melver commented 4 years ago

https://lkml.kernel.org/r/20201008233443.3335464-1-keescook@chromium.org