0x04 in the shadow looks like poisoning of the last incomplete granule of a heap object, which we later forgot to clear. Is it possible we have some minor size mismatches when we allocate/deallocate and pass the size to the KASAN hooks?
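For context, a minimal sketch (illustrative only, not kernel code) of how a generic KASAN shadow byte ends up with such a value: a shadow byte of 1..7 marks a partially accessible 8-byte granule, so e.g. a 20-byte object gets shadow bytes 00 00 04.

```c
/*
 * Illustrative sketch: how generic KASAN encodes the last, incomplete
 * 8-byte granule of an object.  A shadow value of 1..7 means only that
 * many leading bytes of the granule are accessible; 0 means the whole
 * granule is accessible.
 */
static unsigned char last_granule_shadow(unsigned long object_size)
{
	/* e.g. object_size == 20 -> shadow bytes 00 00 04 */
	return object_size % 8;
}
```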
KASAN should ignore all KFENCE memory; we have checks in KASAN's slab alloc/free hooks to just return if it's a KFENCE address.
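Roughly like the following (a sketch of the intent using is_kfence_address() from include/linux/kfence.h; the function name and body here are illustrative, not the exact in-tree hooks):

```c
#include <linux/kfence.h>
#include <linux/slab.h>

/* Sketch: KASAN's slab free hook bails out early for KFENCE-managed
 * objects, so KASAN should never poison shadow covering __kfence_pool. */
static bool kasan_slab_free_sketch(struct kmem_cache *cache, void *object)
{
	if (is_kfence_address(object))
		return false;	/* leave KFENCE objects entirely to KFENCE */

	/* ... normal KASAN poisoning / quarantining of the object ... */
	return true;
}
```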
OK, then it's weird. 0x04 can also sometimes appear as left-over stack shadow if we somehow fail to clear it on function exit. But the shadow for the KFENCE pool should not be used as stack shadow...
Not sure who/how else could corrupt the KASAN shadow... We of course have lots of memory corruption bugs, but this looks atypical for a random memory corruption.
Is it always a single 0x04? Or other patterns as well?
I've also seen 0x01 and 0x02, i.e. always just one bit set / a power-of-2 size, which might be expected:
==================================================================
BUG: KASAN: out-of-bounds in check_canary_byte mm/kfence/core.c:194 [inline]
BUG: KASAN: out-of-bounds in for_each_canary mm/kfence/core.c:215 [inline]
BUG: KASAN: out-of-bounds in kfence_guarded_free+0x74d/0x830 mm/kfence/core.c:335
Read of size 1 at addr ffffffff8dc44ff9 by task kworker/u4:4/883
CPU: 0 PID: 883 Comm: kworker/u4:4 Not tainted 5.9.0-rc4+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
Workqueue: bat_events batadv_nc_worker
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x198/0x1fd lib/dump_stack.c:118
print_address_description.constprop.0.cold+0x5/0x497 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
check_canary_byte mm/kfence/core.c:194 [inline]
for_each_canary mm/kfence/core.c:215 [inline]
kfence_guarded_free+0x74d/0x830 mm/kfence/core.c:335
__kfence_free+0x95/0x150 mm/kfence/core.c:659
kfence_free include/linux/kfence.h:143 [inline]
__cache_free mm/slab.c:3428 [inline]
kfree+0x22e/0x2e0 mm/slab.c:3772
rcu_do_batch kernel/rcu/tree.c:2428 [inline]
rcu_core+0x5ca/0x1130 kernel/rcu/tree.c:2656
__do_softirq+0x1f8/0xb23 kernel/softirq.c:298
asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:706
</IRQ>
__run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
do_softirq_own_stack+0x9d/0xd0 arch/x86/kernel/irq_64.c:77
invoke_softirq kernel/softirq.c:393 [inline]
__irq_exit_rcu kernel/softirq.c:423 [inline]
irq_exit_rcu+0x235/0x280 kernel/softirq.c:435
sysvec_apic_timer_interrupt+0x51/0xf0 arch/x86/kernel/apic/apic.c:1091
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:581
RIP: 0010:lock_release+0xc7/0x8f0 kernel/locking/lockdep.c:5027
Code: 5a 00 48 0f a3 1d a1 98 fd 09 0f 82 55 05 00 00 65 48 8b 1c 25 c0 fe 01 00 48 8d bb e4 08 00 00 48 b8 00 00 00 00 00 fc ff df <48> 89 fa 48 c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0
RSP: 0018:ffffc9000343fbb8 EFLAGS: 00000286
RAX: dffffc0000000000 RBX: ffff8880681e4480 RCX: ffffffff815bf11f
RDX: dffffc0000000000 RSI: ffffffff8a067e80 RDI: ffff8880681e4d64
RBP: 1ffff92000687f79 R08: 0000000000000000 R09: ffffffff8b5989cf
R10: fffffbfff16b3139 R11: 0000000000000000 R12: 0000000000000001
R13: ffffffff8a067f40 R14: ffffffff880a3d21 R15: 00000000000001d2
rcu_lock_release include/linux/rcupdate.h:246 [inline]
rcu_read_unlock include/linux/rcupdate.h:688 [inline]
batadv_nc_purge_orig_hash net/batman-adv/network-coding.c:411 [inline]
batadv_nc_worker+0x7a3/0xe50 net/batman-adv/network-coding.c:718
process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
kthread+0x3b5/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
The buggy address belongs to the variable:
__kfence_pool+0x6c6ff9/0x7d2000
Memory state around the buggy address:
ffffffff8dc44e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffffff8dc44f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffffff8dc44f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01
^
ffffffff8dc45000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffffff8dc45080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
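For reference, the report fires from KFENCE's canary verification on free: the bytes around each object are filled with an address-derived pattern and re-checked when the object is freed. A rough sketch (the pattern macro here is an assumption, not copied from mm/kfence/):

```c
#include <linux/types.h>

/* Illustrative pattern; the real one lives in mm/kfence/. */
#define CANARY_PATTERN(addr) ((u8)0xaa ^ (u8)((unsigned long)(addr) & 0x7))

static bool check_canary_byte_sketch(u8 *addr)
{
	/* This single-byte read inside __kfence_pool is what KASAN flags
	 * when a stale shadow byte happens to cover the KFENCE pool. */
	return *addr == CANARY_PATTERN(addr);
}
```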
Hmm it might be due to ksize() unpoisoning half a word only: mm/slab_common.c:1183, in which case we need to add a check in kasan_unpoison_shadow for KFENCE memory as well.
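Something along these lines (idea only; the actual fix is the PR below) would make the unpoisoning path skip KFENCE memory:

```c
#include <linux/kfence.h>

void kasan_unpoison_shadow(const void *address, size_t size)
{
	/*
	 * KFENCE objects carry no KASAN shadow state; skipping them here
	 * keeps ksize() & co. from writing a partial-granule value
	 * (01/02/04/...) into the shadow that covers __kfence_pool.
	 */
	if (is_kfence_address(address))
		return;

	/* ... existing unpoisoning of [address, address + size) ... */
}
```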
Looks like this is it: https://github.com/google/kasan/pull/154 -- no more crashes.
With syzkaller-kfence.config.txt (syzkaller's upstream-kasan.config plus KFENCE with SAMPLE_INTERVAL=10 and NUM_OBJECTS=1000), running under syzkaller, I observe strange KASAN failures.
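The KFENCE-related delta over upstream-kasan.config looks roughly like this (a sketch assuming the usual CONFIG_KFENCE_* option names, not the syzkaller-kfence.config.txt file verbatim):

```
CONFIG_KFENCE=y
CONFIG_KFENCE_SAMPLE_INTERVAL=10
CONFIG_KFENCE_NUM_OBJECTS=1000
```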
or