google / kernel-sanitizers

Linux Kernel Sanitizers, fast bug-detectors for the Linux kernel
https://google.github.io/kernel-sanitizers/
437 stars 87 forks source link

[kfence] Testing concurrency #71

Closed melver closed 4 years ago

melver commented 4 years ago

KFence has some non-trivial concurrency design, part of which is due to how freelists are organized by SL[AU]B.

Ensure that the tests exercise concurrent KFence accesses, and test with KCSAN.

melver commented 4 years ago

With lockdep we currently lock up, and GDB shows this trace:

#0  queued_spin_lock_slowpath (lock=0xffffffff84622720 <serial8250_ports>, val=<optimized out>) at kernel/locking/qspinlock.c:382                                                                                                                     [36/1827]
#1  0xffffffff8110ca2e in queued_spin_lock (lock=<optimized out>) at ./include/asm-generic/qspinlock.h:81                                                                                                                                                      
#2  do_raw_spin_lock (lock=0xffffffff84622720 <serial8250_ports>) at kernel/locking/spinlock_debug.c:113                                                                                                                                                       
#3  0xffffffff81e8dd3d in __raw_spin_lock_irqsave (lock=<optimized out>) at ./include/linux/spinlock_api_smp.h:117                                                                                                                                             
#4  _raw_spin_lock_irqsave (lock=0xffffffff84622720 <serial8250_ports>) at kernel/locking/spinlock.c:159                                                                                                                                                       
#5  0xffffffff8166c7a3 in serial8250_console_write (up=0xffffffff84622720 <serial8250_ports>, s=<optimized out>, count=16) at ./include/linux/spinlock.h:328                                                                  
#6  0xffffffff81119d0e in call_console_drivers (ext_text=<optimized out>, text=<optimized out>, len=<optimized out>, ext_len=<optimized out>) at kernel/printk/printk.c:1826                                                                                   
#7  console_unlock () at kernel/printk/printk.c:2503                                                                                                                                                                                                           
#8  0xffffffff8111b760 in console_unlock () at kernel/printk/printk.c:2404                                                                                                                                                                                     
#9  vprintk_emit (facility=<optimized out>, level=<optimized out>, dict=<optimized out>, dictlen=0, fmt=<optimized out>, args=<optimized out>) at kernel/printk/printk.c:2031
#10 0xffffffff8111b946 in vprintk_default (fmt=<optimized out>, args=<optimized out>) at kernel/printk/printk.c:2049                                                                                                                                           
#11 0xffffffff8111c869 in vprintk_func (fmt=<optimized out>, args=<optimized out>) at kernel/printk/printk_safe.c:393                                                                                                                                          
#12 0xffffffff8111bf2c in printk (fmt=<optimized out>) at kernel/printk/printk.c:2080                                          
#13 0xffffffff81107b2e in print_circular_bug_header (entry=0xffffffff83cad600 <list_entries+4928>, depth=2, check_src=0xffff88881c710920, check_tgt=<optimized out>) at kernel/locking/lockdep.c:1644
#14 0xffffffff81100462 in print_circular_bug (this=0xffffc900000d8b80, target=0xffffffff83cad600 <list_entries+4928>, check_src=0xffff88881c710920, check_tgt=0xffff88881c7108f8) at kernel/locking/lockdep.c:1686
#15 0xffffffff81100662 in check_noncircular (src=0xffff88881c710920, target=0xffff88881c7108f8, trace=0xffffc900000d8c70) at kernel/locking/lockdep.c:1827
#16 0xffffffff811045ce in check_prev_add (trace=<optimized out>, distance=<optimized out>, next=<optimized out>, prev=<optimized out>, curr=<optimized out>) at kernel/locking/lockdep.c:2496
#17 check_prevs_add (next=<optimized out>, curr=<optimized out>) at kernel/locking/lockdep.c:2601                             
#18 validate_chain (chain_key=<optimized out>, chain_head=<optimized out>, hlock=<optimized out>, curr=<optimized out>) at kernel/locking/lockdep.c:3218
#19 __lock_acquire (lock=<optimized out>, subclass=<optimized out>, trylock=<optimized out>, read=<optimized out>, check=<optimized out>, hardirqs_off=<optimized out>, nest_lock=<optimized out>, ip=<optimized out>, references=0, 
    pin_count=<optimized out>) at kernel/locking/lockdep.c:4380                                                                                                                                                                                                
#20 0xffffffff811022e5 in lock_acquire (lock=0xffffffff8401ef48 <kfence_metadata+126760>, subclass=0, trylock=0, read=0, check=1, nest_lock=<optimized out>, ip=18446744071581520294) at ./arch/x86/include/asm/irqflags.h:164
#21 0xffffffff81e8dd35 in __raw_spin_lock_irqsave (lock=<optimized out>) at ./include/linux/spinlock_api_smp.h:110                                                                                                                                             
#22 _raw_spin_lock_irqsave (lock=0xffffffff8401ef30 <kfence_metadata+126736>) at kernel/locking/spinlock.c:159                                                                                                                                                 
#23 0xffffffff8128d1a6 in kfence_guarded_alloc (gfp=<optimized out>, size=<optimized out>, cache=<optimized out>) at mm/kfence/core.c:221
#24 __kfence_alloc (s=0xffff88881c407180, size=544, flags=2592) at mm/kfence/core.c:512                                   
#25 0xffffffff81288692 in kfence_alloc (flags=<optimized out>, size=<optimized out>, s=<optimized out>) at ./include/linux/kfence.h:38    
#26 slab_alloc_node (orig_size=<optimized out>, addr=<optimized out>, node=<optimized out>, gfpflags=<optimized out>, s=<optimized out>) at mm/slub.c:2764
#27 slab_alloc (orig_size=<optimized out>, addr=<optimized out>, gfpflags=<optimized out>, s=<optimized out>) at mm/slub.c:2850
#28 __kmalloc (size=544, flags=2592) at mm/slub.c:3931                                                                         
#29 0xffffffff81643f69 in kmalloc (flags=<optimized out>, size=<optimized out>) at ./include/linux/slab.h:560
#30 tty_buffer_alloc (port=0xffff888819f20000, size=<optimized out>) at drivers/tty/tty_buffer.c:175
#31 0xffffffff81644001 in __tty_buffer_request_room (port=0xffff888819f20000, size=1, flags=0) at drivers/tty/tty_buffer.c:273
#32 0xffffffff8164436e in __tty_insert_flip_char (port=0xffff888819f20000, ch=13 '\r', flag=0 '\000') at drivers/tty/tty_buffer.c:384
#33 0xffffffff81662009 in tty_insert_flip_char (flag=<optimized out>, ch=<optimized out>, port=<optimized out>) at ./include/linux/tty_flip.h:31
#34 uart_insert_char (port=0xffffffff84622720 <serial8250_ports>, status=97, overrun=2, ch=<optimized out>, flag=<optimized out>) at drivers/tty/serial/serial_core.c:3129
#35 0xffffffff81668e65 in serial8250_read_char (up=<optimized out>, lsr=<optimized out>) at drivers/tty/serial/8250/8250_port.c:1767
#36 0xffffffff81668ef6 in serial8250_rx_chars (up=0xffffffff84622720 <serial8250_ports>, lsr=<optimized out>) at drivers/tty/serial/8250/8250_port.c:1782
#37 0xffffffff8166b094 in serial8250_handle_irq (port=0xffffffff84622720 <serial8250_ports>, iir=204) at drivers/tty/serial/8250/8250_port.c:1913
#38 0xffffffff8166b113 in serial8250_handle_irq (iir=<optimized out>, port=<optimized out>) at drivers/tty/serial/8250/8250_port.c:1891
#39 serial8250_default_handle_irq (port=0xffffffff84622720 <serial8250_ports>) at drivers/tty/serial/8250/8250_port.c:1934
#40 0xffffffff81666ac1 in serial8250_interrupt (irq=<optimized out>, dev_id=0xffff888812e8f2c0) at drivers/tty/serial/8250/8250_core.c:126
#41 0xffffffff8111d906 in __handle_irq_event_percpu (desc=0xffff88881c42a000, flags=<optimized out>) at kernel/irq/handle.c:156
#42 0xffffffff8111dbfd in handle_irq_event_percpu (desc=0xffff88881c42a000) at kernel/irq/handle.c:196
#43 0xffffffff8111dc7f in handle_irq_event (desc=0xffff88881c42a000) at kernel/irq/handle.c:213
#44 0xffffffff811223fa in handle_edge_irq (desc=0xffff88881c42a000) at kernel/irq/chip.c:830
#45 0xffffffff82000fb2 in asm_call_on_stack () at arch/x86/entry/entry_64.S:711 
melver commented 4 years ago

Note, we need to test with CONFIG_KFENCE_FAULT_INJECTION=y, as spamming the console quite reliably uncovers some issues -- such as the above lockdep issue.

melver commented 4 years ago

As per https://github.com/google/kasan/issues/108, we've been running with various debug configs, including lockdep. With the latest PR https://github.com/google/kasan/pull/115, it seems lockdep complaints are resolved.

melver commented 4 years ago

Stress testing kfence with the attached patch to make every kmalloc() do a kfence allocation+free, and the attached config (KCSAN+KFENCE).

Then running sysbench as well as running:

while :; do cat /sys/kernel/debug/kfence/objects > /dev/null; done

KCSAN reports no data races in KFENCE.

kfence-concurrency-testing.patch.txt kfence-concurrency-testing.config.txt