openbmc / linux

OpenBMC Linux kernel source tree

KASAN: slab-out-of-bounds in ncsi_rsp_handler_sma #146

Closed by shenki 6 years ago

shenki commented 6 years ago

Seen at e156398 (v4.16-rc6-119-ge156398bfcad) from Joel's experimental 4.16 tree, on a QEMU romulus machine. Also reproduces on Romulus hardware.

[   32.662953] ftgmac100 1e660000.ethernet eth0: NCSI: Handler for packet type 0x82 returned -19
[   38.111190] ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
[   38.117326] ftgmac100 1e660000.ethernet eth0: no vlan ids left to set
[   38.131543] ==================================================================
[   38.153464] BUG: KASAN: slab-out-of-bounds in ncsi_rsp_handler_sma+0x15c/0x270
[   38.155622] Write of size 6 at addr 97ff0628 by task kworker/0:1/213
[   38.156769] 
[   38.158546] CPU: 0 PID: 213 Comm: kworker/0:1 Not tainted 4.16.0-rc6-00118-g671c39af8e7d-dirty #269
[   38.159874] Hardware name: Generic DT based system
[   38.161708] Workqueue: events ncsi_dev_work
[   38.164177] [<80016978>] (unwind_backtrace) from [<80012af8>] (show_stack+0x20/0x24)
[   38.164859] [<80012af8>] (show_stack) from [<80929cfc>] (dump_stack+0x20/0x28)
[   38.165588] [<80929cfc>] (dump_stack) from [<8022cfd0>] (print_address_description+0x5c/0x32c)
[   38.166441] [<8022cfd0>] (print_address_description) from [<8022d590>] (kasan_report+0x14c/0x3a4)
[   38.167355] [<8022d590>] (kasan_report) from [<8022b724>] (check_memory_region+0xa0/0x19c)
[   38.168122] [<8022b724>] (check_memory_region) from [<8022bc28>] (memcpy+0x44/0x58)
[   38.168806] [<8022bc28>] (memcpy) from [<8091cc30>] (ncsi_rsp_handler_sma+0x15c/0x270)
[   38.169420] [<8091cc30>] (ncsi_rsp_handler_sma) from [<8091d8f0>] (ncsi_rcv_rsp+0x294/0x48c)
[   38.170098] [<8091d8f0>] (ncsi_rcv_rsp) from [<806ebe64>] (__netif_receive_skb_core+0xc44/0x1368)
[   38.170849] [<806ebe64>] (__netif_receive_skb_core) from [<806ed4ec>] (__netif_receive_skb+0x28/0x148)
[   38.171541] [<806ed4ec>] (__netif_receive_skb) from [<806f582c>] (netif_receive_skb_internal+0x38/0x130)
[   38.172252] [<806f582c>] (netif_receive_skb_internal) from [<806f7020>] (netif_receive_skb+0x34/0x104)
[   38.173039] [<806f7020>] (netif_receive_skb) from [<805b41b0>] (ftgmac100_poll+0x734/0xb88)
[   38.173703] [<805b41b0>] (ftgmac100_poll) from [<806f85a4>] (net_rx_action+0x210/0x7c4)
[   38.174394] [<806f85a4>] (net_rx_action) from [<8000a488>] (__do_softirq+0x178/0x744)
[   38.174984] [<8000a488>] (__do_softirq) from [<80034178>] (do_softirq.part.6+0x5c/0x6c)
[   38.175577] [<80034178>] (do_softirq.part.6) from [<80034298>] (__local_bh_enable_ip+0x110/0x1c0)
[   38.176280] [<80034298>] (__local_bh_enable_ip) from [<806f40d8>] (__dev_queue_xmit+0x364/0xc80)
[   38.177039] [<806f40d8>] (__dev_queue_xmit) from [<806f4a10>] (dev_queue_xmit+0x1c/0x20)
[   38.177763] [<806f4a10>] (dev_queue_xmit) from [<80919f2c>] (ncsi_xmit_cmd+0x380/0x518)
[   38.178431] [<80919f2c>] (ncsi_xmit_cmd) from [<809217e0>] (ncsi_configure_channel+0x530/0xc84)
[   38.179066] [<809217e0>] (ncsi_configure_channel) from [<80922d10>] (ncsi_dev_work+0xe4/0x964)
[   38.179677] [<80922d10>] (ncsi_dev_work) from [<8005a67c>] (process_one_work+0x3a4/0xa54)
[   38.180398] [<8005a67c>] (process_one_work) from [<8005add8>] (worker_thread+0xac/0xb50)
[   38.181069] [<8005add8>] (worker_thread) from [<800668c0>] (kthread+0x24c/0x35c)
[   38.181783] [<800668c0>] (kthread) from [<800090f0>] (ret_from_fork+0x14/0x24)
[   38.182420] Exception stack(0x976abfb0 to 0x976abff8)
[   38.183141] bfa0:                                     00000000 00000000 00000000 00000000
[   38.183933] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   38.184657] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   38.185205] 
[   38.185541] Allocated by task 7:
[   38.186115]  kasan_kmalloc+0xd4/0x174
[   38.186452]  __kmalloc+0xe4/0x250
[   38.186757]  ncsi_rsp_handler_gc+0x1fc/0x37c
[   38.187105]  ncsi_rcv_rsp+0x294/0x48c
[   38.187426]  __netif_receive_skb_core+0xc44/0x1368
[   38.187809]  __netif_receive_skb+0x28/0x148
[   38.188146]  netif_receive_skb_internal+0x38/0x130
[   38.188518]  netif_receive_skb+0x34/0x104
[   38.188847]  ftgmac100_poll+0x734/0xb88
[   38.189161]  net_rx_action+0x210/0x7c4
[   38.189475]  __do_softirq+0x178/0x744
[   38.189781] 
[   38.189945] Freed by task 1:
[   38.190214]  __kasan_slab_free+0x110/0x1ec
[   38.190546]  kasan_slab_free+0x14/0x18
[   38.190854]  kfree+0x7c/0x180
[   38.191120]  do_copy+0x70/0x160
[   38.191396]  write_buffer+0x84/0xa4
[   38.191688]  flush_buffer+0x40/0xcc
[   38.191992]  unxz+0x1ec/0x34c
[   38.192260]  unpack_to_rootfs+0x258/0x4ec
[   38.192590]  populate_rootfs+0x68/0x118
[   38.192943]  do_one_initcall+0x15c/0x260
[   38.193262]  kernel_init_freeable+0x2a4/0x388
[   38.193610]  kernel_init+0x1c/0x124
[   38.193901]  ret_from_fork+0x14/0x24
[   38.194378]    (null)
[   38.194581] 
[   38.194769] The buggy address belongs to the object at 97ff0600
[   38.194769]  which belongs to the cache kmalloc-32 of size 32
[   38.195612] The buggy address is located 8 bytes to the right of
[   38.195612]  32-byte region [97ff0600, 97ff0620)
[   38.196294] The buggy address belongs to the page:
[   38.196788] page:9fefae00 count:1 mapcount:0 mapping:97ff0000 index:0x97ff0fc1
[   38.197474] flags: 0x100(slab)
[   38.198200] raw: 00000100 97ff0000 97ff0fc1 00000032 00000001 9fef86d4 9fee7bb4 97c00620
[   38.198846] page dumped because: kasan: bad access detected
[   38.199271] 
[   38.199456] Memory state around the buggy address:
[   38.200084]  97ff0500: 00 04 fc fc fc fc fc fc 00 00 00 04 fc fc fc fc
[   38.200597]  97ff0580: 00 00 00 04 fc fc fc fc 00 00 00 fc fc fc fc fc
[   38.201094] >97ff0600: 00 00 00 04 fc fc fc fc 03 fc fc fc fc fc fc fc
[   38.201631]                           ^
[   38.201957]  97ff0680: 00 00 00 04 fc fc fc fc 00 00 04 fc fc fc fc fc
[   38.202431]  97ff0700: 05 fc fc fc fc fc fc fc 03 fc fc fc fc fc fc fc
[   38.202924] ==================================================================
[   38.203465] Disabling lock debugging due to kernel taint
[   39.243568] ftgmac100 1e660000.ethernet eth0: NCSI: channel 0 config done
[   39.243917] ftgmac100 1e660000.ethernet eth0: NCSI: No more channels to process
[   39.244139] ftgmac100 1e660000.ethernet eth0: NCSI interface up

#0  kasan_report (addr=5, size=1912594629, is_write=198, ip=1) at mm/kasan/report.c:398
No locals.
#1  0x8022b724 in check_memory_region_inline (ret_ip=<optimized out>, write=<optimized out>, size=<optimized out>, addr=<optimized out>)
    at mm/kasan/kasan.c:260
No locals.
#2  check_memory_region (addr=2550072872, size=6, write=true, ret_ip=2157038640) at mm/kasan/kasan.c:274
No locals.
#3  0x8022bc28 in memcpy (dest=0x97ff0628, src=0x9769da80, len=6) at mm/kasan/kasan.c:310
No locals.
#4  0x8091cc30 in ncsi_rsp_handler_sma (nr=0x9769da86) at net/ncsi/ncsi-rsp.c:459
        ndp = 0x97ff0610
        nc = 0x97441b00
        ncf = 0x97ff0610
        bitmap = 0x6
#5  0x8091d8f0 in ncsi_rcv_rsp (skb=0x932ac700, dev=0x6, pt=0x1, orig_dev=0x8091cc30 <ncsi_rsp_handler_sma+348>) at net/ncsi/ncsi-rsp.c:1040
        nd = 0x97738020
        nr = 0x97738310
        payload = 6
        i = -1754037536
        ret = 0
#6  0x806ebe64 in __netif_receive_skb_core (skb=0x932ac700, pfmemalloc=6) at net/core/dev.c:4554
        pt_prev = 0x9773ac78
        orig_dev = 0x97763180
        ret = 1
#7  0x806ed4ec in __netif_receive_skb (skb=0x932ac700) at net/core/dev.c:4619
        ret = -1825913088
#8  0x806f582c in netif_receive_skb_internal (skb=0x932ac700) at net/core/dev.c:4693
        ret = -1744894424
#9  0x806f7020 in netif_receive_skb (skb=0x932ac700) at net/core/dev.c:4717
No locals.
#10 0x805b41b0 in ftgmac100_rx_packet (processed=<optimized out>, priv=<optimized out>) at drivers/net/ethernet/faraday/ftgmac100.c:575
        pointer = 63624
        rxdes = 0xa09ba0e0
        map = 2446730530
#11 ftgmac100_poll (napi=0x97763658, budget=6) at drivers/net/ethernet/faraday/ftgmac100.c:1328
        work_done = 0
#12 0x806f85a4 in napi_poll (repoll=<optimized out>, n=<optimized out>) at net/core/dev.c:5697
        work = 1
        weight = 64
        __warned = false
        __print_once = false
#13 net_rx_action (h=0x97ff0628) at net/core/dev.c:5763
        list = {next = 0x976abba0, prev = 0x976abba0}
        repoll = {next = 0x976abba8, prev = 0x976abba8}
#14 0x8000a488 in __do_softirq () at kernel/softirq.c:285
        vec_nr = 3
        pending = 8
#15 0x80034178 in do_softirq_own_stack () at ./include/linux/interrupt.h:499
No locals.
#16 do_softirq () at kernel/softirq.c:329
No locals.
#17 0x80034298 in do_softirq () at kernel/softirq.c:321
No locals.
#18 __local_bh_enable_ip (ip=2550072872, cnt=2096896) at kernel/softirq.c:182
No locals.
#19 0x806f40d8 in local_bh_enable () at ./include/linux/bottom_half.h:32
No locals.
#20 rcu_read_unlock_bh () at ./include/linux/rcupdate.h:726
No locals.
#21 __dev_queue_xmit (skb=0x0, accel_priv=0x6) at net/core/dev.c:3576
        dev = 0x97763180
        txq = 0x93611240
        rc = 0
#22 0x806f4a10 in dev_queue_xmit (skb=0x97ff0628) at net/core/dev.c:3582
No locals.
#23 0x80919f2c in ncsi_xmit_cmd (nca=0x9769da62) at net/ncsi/ncsi-cmd.c:348
        eh = 0x9769da62
        i = -1754037476
#24 0x809217e0 in ncsi_configure_channel (ndp=0x97738020) at net/ncsi/ncsi-manage.c:904
        np = 0x68954245
        nc = 0x976abdd2
        hot_nc = 0x97763180
        nca = <incomplete type>
#25 0x80922d10 in ncsi_dev_work (work=0x9773ac68) at net/ncsi/ncsi-manage.c:1288
No locals.
#26 0x8005a67c in process_one_work (worker=0x9764a400, work=0x9773ac68) at kernel/workqueue.c:2113
        pool = 0x80bab0fc <cpu_worker_pools>
#27 0x8005add8 in worker_thread (__worker=0x9764a400) at kernel/workqueue.c:2247
        pool = 0x80bab0fc <cpu_worker_pools>
#28 0x800668c0 in kthread (_create=0x9764b560) at kernel/kthread.c:238
        threadfn = 0x8005ad2c <worker_thread>
        data = 0x9764a400
        ret = -1744975904

sammj commented 6 years ago

Looks like there are two triggers here: handling an SMA response and handling an SVF response. I've only triggered the SVF case so far, but I suspect they both have the same root cause. NCSI has a generic concept of filters which it uses for both MAC addresses and VLAN IDs, storing either kind of data in a u32 data[] buffer. Because the entries stored in this buffer differ in size, the buffer is allocated according to the entry type; it looks like the code gets the calculation slightly wrong and writes into unallocated memory just off the end of the buffer. I've used this as a chance to finish off a refactor of the filtering code I already had going, and so far it looks to avoid this error. Patches to come.
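
For illustration, here is a rough sketch of how a size or offset calculation on a u32 data[] buffer can go slightly wrong, using numbers that match the KASAN report above. The struct layout, the function names, the two-entry table, and the idea that the offset was scaled by the element size of the u32 pointer are all assumptions made for this sketch. The thread does not show the faulty code, so the fix commit linked in this issue remains the authority on what actually changed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a filter table: a small header followed by a
 * u32 data[] buffer that holds entries of differing sizes (for example a
 * 6-byte MAC address). Field names and layout are assumptions, not copied
 * from net/ncsi.
 */
struct filter_table {
	uint32_t index;
	uint32_t total;
	uint64_t bitmap;
	uint32_t data[];
};

/* Bytes requested for a table holding `count` entries of `entry_size` bytes. */
size_t table_alloc_size(size_t count, size_t entry_size)
{
	assert(entry_size > 0);
	return sizeof(struct filter_table) + count * entry_size;
}

/* Offset of entry `slot` from the start of the object, counted in bytes. */
size_t entry_offset_bytes(size_t slot, size_t entry_size)
{
	return offsetof(struct filter_table, data) + slot * entry_size;
}

/*
 * Offset of entry `slot` when it is computed as `t->data + slot * entry_size`
 * on the u32 pointer: C scales the pointer offset by sizeof(uint32_t).
 */
size_t entry_offset_u32(size_t slot, size_t entry_size)
{
	return offsetof(struct filter_table, data) +
	       slot * entry_size * sizeof(uint32_t);
}

/*
 * Accessible bytes described by a run of generic-KASAN shadow bytes. Each
 * shadow byte covers 8 bytes of memory: 0 means all 8 are accessible, 1..7
 * means only that many leading bytes are, and anything else (such as the
 * 0xfc redzone marker) means none are.
 */
size_t shadow_accessible(const uint8_t *shadow, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++) {
		if (shadow[i] == 0) {
			total += 8;
		} else if (shadow[i] < 8) {
			total += shadow[i];
			break;
		} else {
			break;
		}
	}
	return total;
}
```

Under these assumptions, the shadow bytes `00 00 00 04` at 97ff0600 describe a 28-byte object, which is what `table_alloc_size(2, 6)` requests and which fits the kmalloc-32 cache. `entry_offset_u32(1, 6)` gives offset 40 (0x28), i.e. address 97ff0628, the 6-byte write that KASAN flags 8 bytes past the 32-byte region. `entry_offset_bytes(1, 6)` gives offset 22, which keeps the same entry inside the allocation.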

sammj commented 6 years ago

Fixed by https://github.com/torvalds/linux/commit/062b3e1b6d4f2a33c1d0fd7ae9b4550da5cf7e4b#diff-f391518f4e552724349be3589e00dfa7

shenki commented 6 years ago

Thanks @sammj!