0x04 in the shadow looks like poisoning of the last incomplete granule of a heap object, which we later forgot to clear. Is it possible we have some minor size mismatches when we allocate/deallocate and pass the size to the KASAN hooks?
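For context, a minimal sketch (illustrative only, not kernel code) of how a generic KASAN shadow byte ends up with such a value: a shadow byte of 1..7 marks a partially accessible 8-byte granule, so e.g. a 20-byte object gets shadow bytes 00 00 04.

```c
/*
 * Illustrative sketch: how generic KASAN encodes the last, incomplete
 * 8-byte granule of an object.  A shadow value of 1..7 means only that
 * many leading bytes of the granule are accessible; 0 means the whole
 * granule is accessible.
 */
static unsigned char last_granule_shadow(unsigned long object_size)
{
	/* e.g. object_size == 20 -> shadow bytes 00 00 04 */
	return object_size % 8;
}
```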
KASAN should ignore all KFENCE memory; we have checks in KASAN's slab alloc/free hooks to just return if it's a KFENCE address.
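Roughly like the following (a sketch of the intent using is_kfence_address() from include/linux/kfence.h; the function name and body here are illustrative, not the exact in-tree hooks):

```c
#include <linux/kfence.h>
#include <linux/slab.h>

/* Sketch: KASAN's slab free hook bails out early for KFENCE-managed
 * objects, so KASAN should never poison shadow covering __kfence_pool. */
static bool kasan_slab_free_sketch(struct kmem_cache *cache, void *object)
{
	if (is_kfence_address(object))
		return false;	/* leave KFENCE objects entirely to KFENCE */

	/* ... normal KASAN poisoning / quarantining of the object ... */
	return true;
}
```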
OK, then it's weird. 0x04 can also sometimes appear as left-over stack shadow if we somehow fail to clear it on function exit. But the shadow for the KFENCE pool should not be used as stack shadow...
Not sure who/how else could corrupt the KASAN shadow... We of course have lots of memory corruption bugs, but this looks atypical for a random memory corruption.
Is it always a single 0x04? Or other patterns as well?
I've also seen 0x01 and 0x02, i.e. always just one bit set / a power-of-2 size, which might be expected:
==================================================================
BUG: KASAN: out-of-bounds in check_canary_byte mm/kfence/core.c:194 [inline]
BUG: KASAN: out-of-bounds in for_each_canary mm/kfence/core.c:215 [inline]
BUG: KASAN: out-of-bounds in kfence_guarded_free+0x74d/0x830 mm/kfence/core.c:335
Read of size 1 at addr ffffffff8dc44ff9 by task kworker/u4:4/883
CPU: 0 PID: 883 Comm: kworker/u4:4 Not tainted 5.9.0-rc4+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
Workqueue: bat_events batadv_nc_worker
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x198/0x1fd lib/dump_stack.c:118
print_address_description.constprop.0.cold+0x5/0x497 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
check_canary_byte mm/kfence/core.c:194 [inline]
for_each_canary mm/kfence/core.c:215 [inline]
kfence_guarded_free+0x74d/0x830 mm/kfence/core.c:335
__kfence_free+0x95/0x150 mm/kfence/core.c:659
kfence_free include/linux/kfence.h:143 [inline]
__cache_free mm/slab.c:3428 [inline]
kfree+0x22e/0x2e0 mm/slab.c:3772
rcu_do_batch kernel/rcu/tree.c:2428 [inline]
rcu_core+0x5ca/0x1130 kernel/rcu/tree.c:2656
__do_softirq+0x1f8/0xb23 kernel/softirq.c:298
asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:706
</IRQ>
__run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
do_softirq_own_stack+0x9d/0xd0 arch/x86/kernel/irq_64.c:77
invoke_softirq kernel/softirq.c:393 [inline]
__irq_exit_rcu kernel/softirq.c:423 [inline]
irq_exit_rcu+0x235/0x280 kernel/softirq.c:435
sysvec_apic_timer_interrupt+0x51/0xf0 arch/x86/kernel/apic/apic.c:1091
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:581
RIP: 0010:lock_release+0xc7/0x8f0 kernel/locking/lockdep.c:5027
Code: 5a 00 48 0f a3 1d a1 98 fd 09 0f 82 55 05 00 00 65 48 8b 1c 25 c0 fe 01 00 48 8d bb e4 08 00 00 48 b8 00 00 00 00 00 fc ff df <48> 89 fa 48 c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0
RSP: 0018:ffffc9000343fbb8 EFLAGS: 00000286
RAX: dffffc0000000000 RBX: ffff8880681e4480 RCX: ffffffff815bf11f
RDX: dffffc0000000000 RSI: ffffffff8a067e80 RDI: ffff8880681e4d64
RBP: 1ffff92000687f79 R08: 0000000000000000 R09: ffffffff8b5989cf
R10: fffffbfff16b3139 R11: 0000000000000000 R12: 0000000000000001
R13: ffffffff8a067f40 R14: ffffffff880a3d21 R15: 00000000000001d2
rcu_lock_release include/linux/rcupdate.h:246 [inline]
rcu_read_unlock include/linux/rcupdate.h:688 [inline]
batadv_nc_purge_orig_hash net/batman-adv/network-coding.c:411 [inline]
batadv_nc_worker+0x7a3/0xe50 net/batman-adv/network-coding.c:718
process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
kthread+0x3b5/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
The buggy address belongs to the variable:
__kfence_pool+0x6c6ff9/0x7d2000
Memory state around the buggy address:
ffffffff8dc44e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffffff8dc44f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffffff8dc44f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01
^
ffffffff8dc45000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffffff8dc45080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
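For reference, the report fires from KFENCE's canary verification on free: the bytes around each object are filled with an address-derived pattern and re-checked when the object is freed. A rough sketch (the pattern macro here is an assumption, not copied from mm/kfence/):

```c
#include <linux/types.h>

/* Illustrative pattern; the real one lives in mm/kfence/. */
#define CANARY_PATTERN(addr) ((u8)0xaa ^ (u8)((unsigned long)(addr) & 0x7))

static bool check_canary_byte_sketch(u8 *addr)
{
	/* This single-byte read inside __kfence_pool is what KASAN flags
	 * when a stale shadow byte happens to cover the KFENCE pool. */
	return *addr == CANARY_PATTERN(addr);
}
```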
Hmm it might be due to ksize() unpoisoning half a word only: mm/slab_common.c:1183, in which case we need to add a check in kasan_unpoison_shadow for KFENCE memory as well.
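Something along these lines (idea only; the actual fix is the PR below) would make the unpoisoning path skip KFENCE memory:

```c
#include <linux/kfence.h>

void kasan_unpoison_shadow(const void *address, size_t size)
{
	/*
	 * KFENCE objects carry no KASAN shadow state; skipping them here
	 * keeps ksize() & co. from writing a partial-granule value
	 * (01/02/04/...) into the shadow that covers __kfence_pool.
	 */
	if (is_kfence_address(address))
		return;

	/* ... existing unpoisoning of [address, address + size) ... */
}
```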
Looks like this is it: https://github.com/google/kasan/pull/154 -- no more crashes.
With syzkaller-kfence.config.txt (syzkaller's upstream-kasan.config plus KFENCE with SAMPLE_INTERVAL=10 and NUM_OBJECTS=1000), running under syzkaller, I observe strange KASAN failures.
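The KFENCE-related delta over upstream-kasan.config looks roughly like this (a sketch assuming the usual CONFIG_KFENCE_* option names, not the syzkaller-kfence.config.txt file verbatim):

```
CONFIG_KFENCE=y
CONFIG_KFENCE_SAMPLE_INTERVAL=10
CONFIG_KFENCE_NUM_OBJECTS=1000
```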
or