4.0.y: strangeness with errors when using krealloc and later mapping the reallocated memory

msperl commented 9 years ago

@notro found some issue when testing the new feature of the spi-bcm2835 driver that strangely ONLY occurs in the foundation kernel - it does NOT show in 4.1.0-rc2.

Here an example output of what we see on the console:

May  7 20:18:17 raspb kernel: [  118.908906] fbtft: module is from the staging directory, the quality is unknown, you have been warned.
May  7 20:18:17 raspb kernel: [  118.916843] fb_st7735r: module is from the staging directory, the quality is unknown, you have been warned.
May  7 20:18:17 raspb kernel: [  118.921871] fbtft_of_value: buswidth = 8
May  7 20:18:17 raspb kernel: [  118.921917] fbtft_of_value: txbuflen = 32768
May  7 20:18:18 raspb kernel: [  119.866243] graphics fb1: fb_st7735r frame buffer, 128x160, 40 KiB video memory, 32 KiB DMA buffer memory, fps=20, spi0.4 at 8 MHz
May  7 20:18:18 raspb kernel: [  119.941569] Modules linked in: fb_st7735r(C) fbtft(C) syscopyarea sysfillrect sysimgblt fb_sys_fops spi_bcm2835 binfmt_misc i2c_bcm2708 uio_pdrv_genirq uio
May  7 20:18:18 raspb kernel: [  119.941662] CPU: 0 PID: 19 Comm: kworker/0:1 Tainted: G         C      4.0.1+ #31
May  7 20:18:18 raspb kernel: [  119.941680] Hardware name: BCM2708
May  7 20:18:18 raspb kernel: [  119.941713] Workqueue: events fb_deferred_io_work
May  7 20:18:18 raspb kernel: [  119.941784] [<c0015b40>] (unwind_backtrace) from [<c0012914>] (show_stack+0x20/0x24)
May  7 20:18:18 raspb kernel: [  119.941833] [<c0012914>] (show_stack) from [<c05656b8>] (dump_stack+0x20/0x28)
May  7 20:18:18 raspb kernel: [  119.941880] [<c05656b8>] (dump_stack) from [<c00e5a9c>] (bad_page+0xc8/0x128)
May  7 20:18:18 raspb kernel: [  119.941918] [<c00e5a9c>] (bad_page) from [<c00e8c48>] (get_page_from_freelist+0x664/0x960)
May  7 20:18:18 raspb kernel: [  119.941953] [<c00e8c48>] (get_page_from_freelist) from [<c00e9044>] (__alloc_pages_nodemask+0x100/0x8f4)
May  7 20:18:18 raspb kernel: [  119.941987] [<c00e9044>] (__alloc_pages_nodemask) from [<c00e9920>] (alloc_kmem_pages+0x20/0x28)
May  7 20:18:18 raspb kernel: [  119.942023] [<c00e9920>] (alloc_kmem_pages) from [<c010395c>] (kmalloc_order+0x20/0x5c)
May  7 20:18:18 raspb kernel: [  119.942054] [<c010395c>] (kmalloc_order) from [<c01039c4>] (kmalloc_order_trace+0x2c/0xd4)
May  7 20:18:18 raspb kernel: [  119.942090] [<c01039c4>] (kmalloc_order_trace) from [<c0128a24>] (__kmalloc_track_caller+0x218/0x224)
May  7 20:18:18 raspb kernel: [  119.942123] [<c0128a24>] (__kmalloc_track_caller) from [<c0103860>] (krealloc+0x60/0xb8)
May  7 20:18:18 raspb kernel: [  119.942156] [<c0103860>] (krealloc) from [<c03c5064>] (__spi_pump_messages+0x6f8/0x76c)
May  7 20:18:18 raspb kernel: [  119.942188] [<c03c5064>] (__spi_pump_messages) from [<c03c52f4>] (__spi_sync+0x21c/0x22c)
May  7 20:18:18 raspb kernel: [  119.942214] [<c03c52f4>] (__spi_sync) from [<c03c5340>] (spi_sync+0x1c/0x20)
May  7 20:18:18 raspb kernel: [  119.942300] [<c03c5340>] (spi_sync) from [<bf032148>] (fbtft_write_spi+0x90/0x108 [fbtft])
May  7 20:18:18 raspb kernel: [  119.942382] [<bf032148>] (fbtft_write_spi [fbtft]) from [<bf0315b0>] (fbtft_write_vmem16_bus8+0xec/0x14c [fbtft])
May  7 20:18:18 raspb kernel: [  119.942444] [<bf0315b0>] (fbtft_write_vmem16_bus8 [fbtft]) from [<bf030128>] (fbtft_update_display+0xd8/0x33c [fbtft])
May  7 20:18:18 raspb kernel: [  119.942502] [<bf030128>] (fbtft_update_display [fbtft]) from [<bf0304b0>] (fbtft_deferred_io+0x124/0x150 [fbtft])
May  7 20:18:18 raspb kernel: [  119.942547] [<bf0304b0>] (fbtft_deferred_io [fbtft]) from [<c033954c>] (fb_deferred_io_work+0x9c/0xf4)
May  7 20:18:18 raspb kernel: [  119.942584] [<c033954c>] (fb_deferred_io_work) from [<c003ad04>] (process_one_work+0x13c/0x490)
May  7 20:18:18 raspb kernel: [  119.942616] [<c003ad04>] (process_one_work) from [<c003b24c>] (worker_thread+0x1b0/0x528)
May  7 20:18:18 raspb kernel: [  119.942656] [<c003b24c>] (worker_thread) from [<c00404b0>] (kthread+0xdc/0xf8)
May  7 20:18:18 raspb kernel: [  119.942694] [<c00404b0>] (kthread) from [<c000ec90>] (ret_from_fork+0x14/0x24)
May  7 20:18:18 raspb kernel: [  119.942711] Disabling lock debugging due to kernel taint

We have now traced this down to spi_map_msg, which:

for messages that only have TX or RX transfers will create a RX/TX-buffer with krealloc
then map the buffer by first using sg_set_page/sg_set_buf and subsequently calling dma_map_sg

if both code sections are run together, then things break and also impact OTHER parts of the kernel like this one:

May  7 20:18:41 raspb kernel: [  142.344159] Hardware name: BCM2708
May  7 20:18:41 raspb kernel: [  142.344234] [<c0015b40>] (unwind_backtrace) from [<c0012914>] (show_stack+0x20/0x24)
May  7 20:18:41 raspb kernel: [  142.344282] [<c0012914>] (show_stack) from [<c05656b8>] (dump_stack+0x20/0x28)
May  7 20:18:41 raspb kernel: [  142.344329] [<c05656b8>] (dump_stack) from [<c00e5a9c>] (bad_page+0xc8/0x128)
May  7 20:18:41 raspb kernel: [  142.344367] [<c00e5a9c>] (bad_page) from [<c00e8c48>] (get_page_from_freelist+0x664/0x960)
May  7 20:18:41 raspb kernel: [  142.344401] [<c00e8c48>] (get_page_from_freelist) from [<c00e9044>] (__alloc_pages_nodemask+0x100/0x8f4)
May  7 20:18:41 raspb kernel: [  142.344450] [<c00e9044>] (__alloc_pages_nodemask) from [<c0475be4>] (__alloc_page_frag+0x8c/0x148)
May  7 20:18:41 raspb kernel: [  142.344487] [<c0475be4>] (__alloc_page_frag) from [<c0476490>] (__netdev_alloc_frag+0x38/0x88)
May  7 20:18:41 raspb kernel: [  142.344521] [<c0476490>] (__netdev_alloc_frag) from [<c047ae74>] (__alloc_rx_skb+0xd4/0xec)
May  7 20:18:41 raspb kernel: [  142.344553] [<c047ae74>] (__alloc_rx_skb) from [<c047af08>] (__netdev_alloc_skb+0x28/0x50)
May  7 20:18:41 raspb kernel: [  142.344595] [<c047af08>] (__netdev_alloc_skb) from [<c03ccbb8>] (rx_submit+0x3c/0x284)
May  7 20:18:41 raspb kernel: [  142.344631] [<c03ccbb8>] (rx_submit) from [<c03cd058>] (rx_complete+0x1e8/0x1f8)
May  7 20:18:41 raspb kernel: [  142.344663] [<c03cd058>] (rx_complete) from [<c03d545c>] (__usb_hcd_giveback_urb+0x80/0x154)
May  7 20:18:41 raspb kernel: [  142.344693] [<c03d545c>] (__usb_hcd_giveback_urb) from [<c03d56d4>] (usb_hcd_giveback_urb+0x48/0x120)
May  7 20:18:41 raspb kernel: [  142.344733] [<c03d56d4>] (usb_hcd_giveback_urb) from [<c03fdec4>] (completion_tasklet_func+0x6c/0x98)
May  7 20:18:41 raspb kernel: [  142.344769] [<c03fdec4>] (completion_tasklet_func) from [<c040be80>] (tasklet_callback+0x20/0x24)
May  7 20:18:41 raspb kernel: [  142.344816] [<c040be80>] (tasklet_callback) from [<c0027ccc>] (tasklet_hi_action+0x84/0xd8)

This does NOT happen with an upstream 4.1.0-rc2 kernel, so it must be something specific to downstream/custom code

It also only happens for the second mapping request that is 16384 bytes (actually might be anything > 8192 bytes - it was only tested with 4K/8K/16K/32K transfers)

Here with some dev_info messages arround the krealloc:

[   82.843285] fbtft: module is from the staging directory, the quality is unknown, you have been warned.
[   82.852350] fb_st7735r: module is from the staging directory, the quality is unknown, you have been warned.
[   82.855984] fbtft_of_value: buswidth = 8
[   82.856030] fbtft_of_value: txbuflen = 16384
[   82.977874] spi_master spi0: krealloc-rx - 00000000 - 1
[   82.977923] spi_master spi0: krealloc-rx - da6c8be0
[   83.128369] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.128414] spi_master spi0: krealloc-rx - da6c8be0
[   83.629290] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.629332] spi_master spi0: krealloc-rx - da6c8be0
[   83.629443] spi_master spi0: krealloc-rx - 00000000 - 3
[   83.629472] spi_master spi0: krealloc-rx - da6c8be0
[   83.629565] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.629590] spi_master spi0: krealloc-rx - da6c8be0
[   83.629669] spi_master spi0: krealloc-rx - 00000000 - 3
[   83.629697] spi_master spi0: krealloc-rx - da6c8be0
[   83.629788] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.629816] spi_master spi0: krealloc-rx - da6c8be0
[   83.629894] spi_master spi0: krealloc-rx - 00000000 - 6
[   83.629920] spi_master spi0: krealloc-rx - da6c8be0
[   83.630007] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.630032] spi_master spi0: krealloc-rx - da6c8be0
[   83.630110] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.630139] spi_master spi0: krealloc-rx - da6c8be0
[   83.630219] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.630247] spi_master spi0: krealloc-rx - da6c8be0
[   83.630322] spi_master spi0: krealloc-rx - 00000000 - 3
[   83.630348] spi_master spi0: krealloc-rx - da6c8be0
[   83.630431] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.630458] spi_master spi0: krealloc-rx - da6c8be0
[   83.630534] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.630561] spi_master spi0: krealloc-rx - da6c8be0
[   83.630645] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.630671] spi_master spi0: krealloc-rx - da6c8be0
[   83.630747] spi_master spi0: krealloc-rx - 00000000 - 2
[   83.630774] spi_master spi0: krealloc-rx - da6c8be0
[   83.630856] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.630885] spi_master spi0: krealloc-rx - da6c8be0
[   83.630960] spi_master spi0: krealloc-rx - 00000000 - 2
[   83.630988] spi_master spi0: krealloc-rx - da6c8be0
[   83.631070] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.631096] spi_master spi0: krealloc-rx - da6c8be0
[   83.631172] spi_master spi0: krealloc-rx - 00000000 - 2
[   83.631198] spi_master spi0: krealloc-rx - da6c8be0
[   83.631280] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.631307] spi_master spi0: krealloc-rx - da6c8be0
[   83.631423] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.631453] spi_master spi0: krealloc-rx - da6c8be0
[   83.631577] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.631607] spi_master spi0: krealloc-rx - da6c8be0
[   83.631696] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.631723] spi_master spi0: krealloc-rx - da6c8be0
[   83.631802] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.631829] spi_master spi0: krealloc-rx - da6c8be0
[   83.631906] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.631934] spi_master spi0: krealloc-rx - da6c8be0
[   83.736459] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.736508] spi_master spi0: krealloc-rx - da6c8be0
[   83.747254] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.747299] spi_master spi0: krealloc-rx - da6c8be0
[   83.747410] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.747440] spi_master spi0: krealloc-rx - da6c8be0
[   83.747524] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.747552] spi_master spi0: krealloc-rx - da6c8be0
[   83.747630] spi_master spi0: krealloc-rx - 00000000 - 4
[   83.747657] spi_master spi0: krealloc-rx - da6c8be0
[   83.747736] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.747765] spi_master spi0: krealloc-rx - da6c8be0
[   83.747839] spi_master spi0: krealloc-rx - 00000000 - 4
[   83.747865] spi_master spi0: krealloc-rx - da6c8be0
[   83.747943] spi_master spi0: krealloc-rx - 00000000 - 1
[   83.747971] spi_master spi0: krealloc-rx - da6c8be0
[   83.748271] spi_master spi0: krealloc-rx - 00000000 - 16384
[   83.748314] spi_master spi0: krealloc-rx - da6ac000
[   83.765678] spi_master spi0: krealloc-rx - 00000000 - 16384
[   83.765731] BUG: Bad page state in process modprobe  pfn:1a6ac
[   83.774518] page:db7ba030 count:0 mapcount:0 mapping:  (null) index:0x0
[   83.783143] flags: 0x200(arch_1)
[   83.791861] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
[   83.803147] bad because of flags:
[   83.808397] flags: 0x200(arch_1)
[   83.813208] Modules linked in: fb_st7735r(C+) fbtft(C) syscopyarea sysfillrect sysimgblt fb_sys_fops spi_bcm2835 binfmt_misc i2c_bcm2708 uio_pdrv_genirq uio
[   83.813297] CPU: 0 PID: 2133 Comm: modprobe Tainted: G         C      4.0.1+ #35
[   83.813313] Hardware name: BCM2708
[   83.813389] [<c001672c>] (unwind_backtrace) from [<c0013314>] (show_stack+0x20/0x24)
[   83.813439] [<c0013314>] (show_stack) from [<c0576504>] (dump_stack+0x20/0x28)
[   83.813481] [<c0576504>] (dump_stack) from [<c00e7314>] (bad_page+0xc8/0x128)
[   83.813519] [<c00e7314>] (bad_page) from [<c00ea4c0>] (get_page_from_freelist+0x664/0x960)
[   83.813553] [<c00ea4c0>] (get_page_from_freelist) from [<c00ea8bc>] (__alloc_pages_nodemask+0x100/0x8f4)
[   83.813582] [<c00ea8bc>] (__alloc_pages_nodemask) from [<c00eb198>] (alloc_kmem_pages+0x20/0x28)
[   83.813612] [<c00eb198>] (alloc_kmem_pages) from [<c01051d4>] (kmalloc_order+0x20/0x5c)
[   83.813641] [<c01051d4>] (kmalloc_order) from [<c010523c>] (kmalloc_order_trace+0x2c/0xd4)
[   83.813675] [<c010523c>] (kmalloc_order_trace) from [<c012a2d4>] (__kmalloc_track_caller+0x218/0x224)
[   83.813703] [<c012a2d4>] (__kmalloc_track_caller) from [<c01050d8>] (krealloc+0x60/0xb8)
[   83.813734] [<c01050d8>] (krealloc) from [<c03cce08>] (__spi_pump_messages+0x70c/0x79c)
[   83.813764] [<c03cce08>] (__spi_pump_messages) from [<c03cd0b4>] (__spi_sync+0x21c/0x22c)
[   83.813789] [<c03cd0b4>] (__spi_sync) from [<c03cd100>] (spi_sync+0x1c/0x20)
[   83.813871] [<c03cd100>] (spi_sync) from [<bf0341c4>] (fbtft_write_spi+0x90/0x108 [fbtft])
[   83.813953] [<bf0341c4>] (fbtft_write_spi [fbtft]) from [<bf0335fc>] (fbtft_write_vmem16_bus8+0xfc/0x18c [fbtft])
[   83.814018] [<bf0335fc>] (fbtft_write_vmem16_bus8 [fbtft]) from [<bf032128>] (fbtft_update_display+0xd8/0x33c [fbtft])
[   83.814076] [<bf032128>] (fbtft_update_display [fbtft]) from [<bf0310e0>] (fbtft_register_framebuffer+0x114/0x2c0 [fbtft])
[   83.814131] [<bf0310e0>] (fbtft_register_framebuffer [fbtft]) from [<bf031a44>] (fbtft_probe_common+0x124/0x47c [fbtft])
[   83.814190] [<bf031a44>] (fbtft_probe_common [fbtft]) from [<bf03f098>] (fbtft_driver_probe_spi+0x24/0x2c [fb_st7735r])
[   83.814230] [<bf03f098>] (fbtft_driver_probe_spi [fb_st7735r]) from [<c03cc2f4>] (spi_drv_probe+0x48/0x4c)
[   83.814269] [<c03cc2f4>] (spi_drv_probe) from [<c037ce50>] (driver_probe_device+0x130/0x378)
[   83.814300] [<c037ce50>] (driver_probe_device) from [<c037d134>] (__driver_attach+0x9c/0xa0)
[   83.814326] [<c037d134>] (__driver_attach) from [<c037b0c8>] (bus_for_each_dev+0x64/0x98)
[   83.814353] [<c037b0c8>] (bus_for_each_dev) from [<c037c828>] (driver_attach+0x28/0x30)
[   83.814380] [<c037c828>] (driver_attach) from [<c037c420>] (bus_add_driver+0xf8/0x218)
[   83.814410] [<c037c420>] (bus_add_driver) from [<c037d664>] (driver_register+0x88/0x104)
[   83.814438] [<c037d664>] (driver_register) from [<c03cc298>] (spi_register_driver+0x54/0x68)
[   83.814472] [<c03cc298>] (spi_register_driver) from [<bf041014>] (fbtft_driver_module_init+0x14/0x38 [fb_st7735r])
[   83.814512] [<bf041014>] (fbtft_driver_module_init [fb_st7735r]) from [<c00087b4>] (do_one_initcall+0xbc/0x204)
[   83.814555] [<c00087b4>] (do_one_initcall) from [<c008ad28>] (do_init_module+0x6c/0x1c4)
[   83.814586] [<c008ad28>] (do_init_module) from [<c008c7ac>] (load_module+0x17f4/0x1ff0)
[   83.814616] [<c008c7ac>] (load_module) from [<c008d080>] (SyS_init_module+0xd8/0x150)
[   83.814651] [<c008d080>] (SyS_init_module) from [<c000f5c0>] (ret_fast_syscall+0x0/0x4c)
[   83.814668] Disabling lock debugging due to kernel taint
[   83.814708] spi_master spi0: krealloc-rx - da4c0000
[   83.832113] spi_master spi0: krealloc-rx - da4c0000 - 8192
[   83.832156] spi_master spi0: krealloc-rx - da4c0000
[   83.840915] spi_master spi0: krealloc-rx - da4c0000 - 1
[   83.840956] spi_master spi0: krealloc-rx - da4c0000
[   83.841006] spi_master spi0: krealloc-rx - da4c0000 - 16
[   83.841029] spi_master spi0: krealloc-rx - da4c0000
[   83.841086] spi_master spi0: krealloc-rx - da4c0000 - 1
[   83.841108] spi_master spi0: krealloc-rx - da4c0000
[   83.841148] spi_master spi0: krealloc-rx - da4c0000 - 16
[   83.841170] spi_master spi0: krealloc-rx - da4c0000
[   83.841837] graphics fb1: fb_st7735r frame buffer, 128x160, 40 KiB video memory, 16 KiB DMA buffer memory, fps=20, spi0.4 at 8 MHz
[   85.506454] spi_master spi0: krealloc-rx - 00000000 - 1
[   85.506502] spi_master spi0: krealloc-rx - dafac180
[   85.506574] spi_master spi0: krealloc-rx - dafac180 - 4

Finally see also discussions in this thread: https://github.com/msperl/spi-bcm2835/issues/13#issuecomment-99460803

msperl commented 9 years ago

Looking at a buffer of 8448 bytes (8k+256) it show the same behavior, so there is something that applies to more than 2 pages that triggers this (as if some mapping is not removed)

popcornmix commented 9 years ago

No immediate thoughts. I can't think of any downstream patches that could affect this - we don't really change anything in the core of the kernel. Obviously if this is a memory corruption issue, then the fact it's not visible in upstream kernel may just mean it's corrupting something less critical there, rather than proving it's a bug in the Pi kernel tree (although I'm not ruling that out).

pelwell commented 9 years ago

Are you using CMA, as that can involve some interesting remapping?

popcornmix commented 9 years ago

The other thought is if you are dma-ing to a buffer and use the wrong caching (either the top-two bits of address means it is incorrectly cached in GPU's L1/L2 cache), or the wrong arm side caching, and you then realloc (so buffer moves physically), you may later get corruption when the cached data gets written back (e.g. if dma output was in GPU's L2 cache).

msperl commented 9 years ago

as said: it seems to fail on krealloc - what that may take as memory pools is the question (and that may be configurable per "platform").

As I am unable to repeat this upstream (4.1-rc2) I have to assume it is downstream... It could be some concurrency issue...

msperl commented 9 years ago

@popcornmix: Well - actually I have disabled the actual DMA transfer - I just let the framework do the mapping and then fall back to interrupts.

So corruption due to DMA is out of the picture - everything is just calls to kernel.

The driver changes to get that far are actually minimal:

get the DMA-channels and setup dmaengine
setup can_dma function
setup flags to create buffers in case only transmit occurs.

So there could be an issue in DMA-engine that triggers this. And that is obviously different to the one upstream.

notro commented 9 years ago

Are you using CMA, as that can involve some interesting remapping?

If you mean drivers/char/broadcom/vc_cma/vc_cma.c, I just tried disabling it without any effect. I'm using plain bcmrpi_config with the SPI dma patch.

I will try enabling BCM2708_NOL2CACHE.

popcornmix commented 9 years ago

Are you using non-arm-cached allocations? Are you using bcm2709_defconfig, or your own config? Do you have CONIG_DMA_CMA enabled? See: https://github.com/raspberrypi/linux/issues/575#issuecomment-41812586

pelwell commented 9 years ago

Yes, that's what I meant. I'm happy that we can rule it out.

popcornmix commented 9 years ago

@notro BCM2708_NOL2CACHE needs to be in sync with config.txt option disable_l2cache (if you are using non-default options).

notro commented 9 years ago

BCM2708_NOL2CACHE needs to be in sync with config.txt option disable_l2cache

Thanks.

But it didn't help.

Verification that the offset has changed:

[    1.369449] BCM2708FB: allocated DMA memory 5bc00000

BCM2708_NOL2CACHE
[    1.460828] BCM2708FB: allocated DMA memory dbc00000

msperl commented 9 years ago

Thinking of it: there are 2 more calls that we can try to identify which is the culprit! Dma_map_sg and sg_set_page/sg_set_buf I will try if I can discern which one it is later tonight.

notro commented 9 years ago

Do you have CONIG_DMA_CMA enabled?

According to kconfig help on this option, cma=0 on the kernel command line should disable this. I tried it, but it didn't help.

regular
[    0.000000] cma: Reserved 8 MiB at 0x1b800000
[    0.000000] Memory: 436696K/458752K available (5994K kernel code, 328K rwdata, 1912K rodata, 348K init, 711K bss, 13864K reserved, 8192K cma-reserved)

cma=0
[    0.000000] Memory: 444904K/458752K available (5994K kernel code, 328K rwdata, 1912K rodata, 348K init, 711K bss, 13848K reserved, 0K cma-reserved)

popcornmix commented 9 years ago

I believe you want CONFIG_DMA_CMA enabled to avoid cache coherency issues, not disabled.

notro commented 9 years ago

Ok.

notro commented 9 years ago

I found this: Re: dma_unmap causing issues with __get_free_pages

Which led me to try and remove the freeing of the dummy buffer in __spi_pump_messages():

//      kfree(master->dummy_rx);
//      master->dummy_rx = NULL;

And voila, the error is gone! So it can have something to do with kfree/dma_unmap in the wrong order.

notro commented 9 years ago

Yes, dma_unmap is happening after kfree (the actual kfree is disabled in my kernel, this is only a message):

[   58.319845] krealloc(dummy_rx=da4039e0, max_rx=16384)
[   58.319908] krealloc returned: d9058000, page: db787c60, pfn:19058
[   58.319956] spi_map_buf: dma_map_sg(dma=0xdb820000)
[   58.319990] spi_map_buf: dma_map_sg(dma=0xd9058000)
[   58.324416] __spi_pump_messages: kfree(master->dummy_rx=d9058000)
[   58.324487] spi_unmap_buf: dma_unmap_sg(dma=0xd9058000)
[   58.324531] spi_unmap_buf: dma_unmap_sg(dma=0xdb820000)

msperl commented 9 years ago

But why does it impact the foundation kernel, but not the upstream 4.1?

Also this "workaround" is mostly an optimization that you are doing: avoiding some reallocations... Imo the whole setup inside spi.c does not make a lot of sense, as it will even map rx- buffers where it is not needed (for short-non-dma cases, so wasting cycles unnecessarily)

But still the order is ok -the free happens at the beginning of the pump task(if it is null it is also ok) Allocate and map via spi_map_mesg. And release via finalize_current_msg and in there via spi_unmap_msg

The order of mapping is imo also ok: In spi_map_buf: sgset followed by dma_map_sg In spi_unmap_buf: dma_unmap_sg followed by sg_free_table

My guess is that there is something in sg_ that does not do what is needed...

Note that: Dma_map_sg_attrs calls dma_map_ops->map_sg Dma_unmap_sg_attrs calls dma_map_ops->unmap_sg So it could be one of those 2 functions that does not do what is fully expected...

msperl commented 9 years ago

Ok- it happens but I do not understand why it would get freed before the finalize_current_msg is called. I guess I need to instrument the 4.1 kernel like you did to see if it is changed...

notro commented 9 years ago

It seems that spi_finalize_current_message() is kicking the worker thread which frees the buffer, then it continues with spi_unmap_msg():

[   52.012752] __spi_sync
[   52.012783] __spi_pump_messages(in_kthread=0) IN
[   52.012803] __spi_pump_messages(in_kthread=0) prepare
[   52.012818] __spi_pump_messages(in_kthread=0) spi_map_msg
[   52.012835] krealloc(dummy_rx=da53f9e0, max_rx=16384)
[   52.012873] krealloc returned: da684000, page: db7b9a90, pfn:1a684
[   52.012918] spi_map_buf: dma_map_sg(dma=0xdb820000)
[   52.012952] spi_map_buf: dma_map_sg(dma=0xda684000)
[   52.012972] __spi_pump_messages(in_kthread=0) transfer_one_message
[   52.020610] spi_finalize_current_message IN
[   52.020697] __spi_pump_messages(in_kthread=1) IN
[   52.020734] __spi_pump_messages(in_kthread=1): kfree(master->dummy_rx=da684000) Begin
[   52.020755] __spi_pump_messages(in_kthread=1) idle OUT
[   52.020803] spi_unmap_buf: dma_unmap_sg(dma=0xda684000)
[   52.020844] spi_unmap_buf: dma_unmap_sg(dma=0xdb820000)
[   52.020870] spi_finalize_current_message OUT
[   52.020890] __spi_pump_messages(in_kthread=0) OUT

notro commented 9 years ago

It seems to be a race of some kind. When I add pr_info's to spi_transfer_one_message(), the freeing waits until the end:

[   47.111580] __spi_sync
[   47.111615] __spi_pump_messages(in_kthread=0) IN
[   47.111634] __spi_pump_messages(in_kthread=0) prepare
[   47.111649] __spi_pump_messages(in_kthread=0) spi_map_msg
[   47.111665] krealloc(dummy_rx=d9070000, max_rx=16384)
[   47.111685] krealloc returned: d9070000, page: db787fc0, pfn:19070
[   47.111721] spi_map_buf: dma_map_sg(dma=0xdb820000)
[   47.111754] spi_map_buf: dma_map_sg(dma=0xd9070000)
[   47.111772] __spi_pump_messages(in_kthread=0) transfer_one_message
[   47.111785] spi_transfer_one_message IN
[   47.111927] __spi_pump_messages(in_kthread=1) IN
[   47.116160] spi_finalize_current_message IN
[   47.116224] spi_unmap_buf: dma_unmap_sg(dma=0xd9070000)
[   47.116345] spi_unmap_buf: dma_unmap_sg(dma=0xdb820000)
[   47.116380] spi_finalize_current_message OUT
[   47.116397] spi_transfer_one_message OUT ret=0
[   47.116413] __spi_pump_messages(in_kthread=0) OUT

[   47.154759] __spi_pump_messages(in_kthread=1) IN
[   47.154800] __spi_pump_messages(in_kthread=1): kfree(master->dummy_rx=d9070000) Begin
[   47.154823] __spi_pump_messages(in_kthread=1) idle OUT

notro commented 9 years ago

I slow some things down with all these log messages so the next message is begun before the worker thread kicks in:

[   47.286403] spi_transfer_one_message IN
[   47.286588] __spi_pump_messages(in_kthread=1) IN
[   47.286629] __spi_pump_messages(in_kthread=1) OUT already running

This stops the thread from freeing the buffer.

Full log: https://gist.github.com/notro/78dd25cae4adca8f30c9

static void __spi_pump_messages(struct spi_master *master, bool in_kthread)
{
    unsigned long flags;
    bool was_busy = false;
    int ret;

pr_info("%s(in_kthread=%i) IN\n", __func__, in_kthread);
    /* Lock queue */
    spin_lock_irqsave(&master->queue_lock, flags);

    /* Make sure we are not already running a message */
    if (master->cur_msg) {
        spin_unlock_irqrestore(&master->queue_lock, flags);
pr_info("%s(in_kthread=%i) OUT already running\n", __func__, in_kthread);
        return;
    }

But why is spi_finalize_current_message() kicking the thread before unmapping? I need a break. Will look at unmapping before kicking later.

msperl commented 9 years ago

So that may also be the reason why it fails completely for the RPI with the timeout. I know that 4.1 has some major optimizations on the spi-front, so this issue may have gotten solved... But a quick look at finalize_current_message in 4.1 and 4.0 shows the same sequence... queue pump_message and then umap_msg...

Can you check if moving the unmap before the spin_lock_irqsave(&master->queue_lock, flags); solves the issue?

We will need to raise this with Mark...

msperl commented 9 years ago

This should solve the issue:

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 57a1950..1e662a0 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1089,6 +1089,8 @@ void spi_finalize_current_message(struct spi_master *master)
    unsigned long flags;
    int ret;

+   spi_unmap_msg(master, master->cur_msg);
+
    spin_lock_irqsave(&master->queue_lock, flags);
    mesg = master->cur_msg;
    master->cur_msg = NULL;
@@ -1096,8 +1098,6 @@ void spi_finalize_current_message(struct spi_master *master)
    queue_kthread_work(&master->kworker, &master->pump_messages);
    spin_unlock_irqrestore(&master->queue_lock, flags);

-   spi_unmap_msg(master, mesg);
-
    if (master->cur_msg_prepared && master->unprepare_message) {
        ret = master->unprepare_message(master, mesg);
        if (ret) {

the issue does not longer show for me when this is applied...

notro commented 9 years ago

Yes it works for me too on Pi1. But is it ok to do this while interrupts are disabled? It might be other sideeffects also. Mark can probably answer.

msperl commented 9 years ago

If you look, I have corrected the patch to do it outside of the spinlock in the meantime (because I also saw the issue after posting it)...

Also I have just sent an email to the spi-list - let us see what the answer is, but I guess this will mean we will need to cherry-pick that patch to go into 4.0.

notro commented 9 years ago

Please add Suggested-by: Noralf Trønnes <noralf@tronnes.org> if you send it in. Or if the solution turns out to be something else: Reported-by: Noralf Trønnes <noralf@tronnes.org>

msperl commented 9 years ago

Slight variation - moving need to move it into to lock context again. (so essentially the first)

Will create and add them as soon as I find the time today...

msperl commented 9 years ago

well - testing the now proposed patch with feedback by Mark shows that I get issues with another driver that - this time - result in kernel panics, which I have not seen without this "patch".

So it may take slightly longer to fix...

msperl commented 9 years ago

@notro: sent an email to you with the patch to apply to fix the "free before unmap" issue. Please test and I will forward it upstream so that we can merge it into 4.0.

msperl commented 9 years ago

Sent patch upstream.

Closing this issue but I will open a separate issue for a merge from upstream as soon as it goes in.

raspberrypi / linux

4.0.y: strangeness with errors when using krealloc and later mapping the reallocated memory #959