intel / edison-linux

Other
48 stars 48 forks source link

spi_sync never ending on Edison with MCP2515 can controller #4

Closed jpilet closed 8 years ago

jpilet commented 8 years ago

Hi,

I am using an Intel Edison with a MCP2515 can controller. (my fork: https://github.com/jpilet/edison-linux/tree/edison-3.10.17-anemobox-mcp2515)

I configured the mcp251x driver to use the edison spi cs0. It works: I can "candump" the interface can0 for a while, and properly read and write from and to the bus.

However, after a few hours, the can interface stops working and dmesg tells me:

[ 5759.821413] INFO: task irq/304-mcp251x:521 blocked for more than 120 seconds.
[ 5759.821504] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5759.821579] irq/304-mcp251x D f5757a60  6544   521      2 0x00000000
[ 5759.821623]  f579bd64 00000046 c18673a6 f5757a60 f579bd20 c1c00000 f5cd8000 a2b704e8
[ 5759.821681]  0000043e f5757a60 f73fdc00 f5751bd0 f5757a60 f64eeb44 f6c0e000 f579bd2c
[ 5759.821735]  c127289f f6438900 f579bd34 c125ae1e f579bd50 c125c593 f6438900 00000007
[ 5759.821790] Call Trace:
[ 5759.821833]  [<c18673a6>] ? _raw_spin_unlock_irqrestore+0x26/0x60
[ 5759.821866]  [<c127289f>] ? wake_up_process+0x1f/0x40
[ 5759.821893]  [<c125ae1e>] ? wake_up_worker+0x1e/0x20
[ 5759.821919]  [<c125c593>] ? insert_work+0x53/0x90
[ 5759.821947]  [<c1867327>] ? _raw_spin_unlock+0x17/0x40
[ 5759.821973]  [<c125cf3f>] ? __queue_work+0x10f/0x340
[ 5759.822000]  [<c1866223>] schedule+0x23/0x60
[ 5759.822027]  [<c1863885>] schedule_timeout+0x165/0x2a0
[ 5759.822060]  [<c1270ebb>] ? get_parent_ip+0xb/0x40
[ 5759.822086]  [<c186adb5>] ? sub_preempt_count+0x95/0xf0
[ 5759.822112]  [<c1270ebb>] ? get_parent_ip+0xb/0x40
[ 5759.822140]  [<c18652db>] wait_for_completion+0xab/0xe0
[ 5759.822165]  [<c12728e0>] ? wake_up_state+0x20/0x20
[ 5759.822194]  [<c15b1b78>] __spi_sync+0x68/0xb0
[ 5759.822218]  [<c1860101>] ? panic+0xfe/0x178
[ 5759.822247]  [<c15b1bcf>] spi_sync+0xf/0x20
[ 5759.822279]  [<f8468134>] mcp251x_spi_trans+0x94/0xc0 [mcp251x]
[ 5759.822307]  [<c13215f4>] ? kmem_cache_alloc+0xd4/0x1b0
[ 5759.822338]  [<c15b1c00>] ? spi_sync_locked+0x20/0x20
[ 5759.822369]  [<f84686d2>] mcp251x_hw_rx+0x72/0x290 [mcp251x]
[ 5759.822402]  [<f84697c2>] mcp251x_can_ist+0x292/0x3b0 [mcp251x]
[ 5759.822429]  [<c18673fd>] ? _raw_spin_unlock_irq+0x1d/0x40
[ 5759.822459]  [<c12b6cf8>] irq_thread_fn+0x18/0x30
[ 5759.822486]  [<c12b71f0>] irq_thread+0x110/0x140
[ 5759.822512]  [<c12b6ce0>] ? irq_finalize_oneshot.part.29+0xb0/0xb0
[ 5759.822539]  [<c12b6da0>] ? wake_threads_waitq+0x50/0x50
[ 5759.822566]  [<c12b70e0>] ? irq_thread_check_affinity+0x70/0x70
[ 5759.822591]  [<c1263610>] kthread+0xa0/0xb0
[ 5759.822616]  [<c186adb5>] ? sub_preempt_count+0x95/0xf0
[ 5759.822648]  [<c186cd37>] ret_from_kernel_thread+0x1b/0x28
[ 5759.822672]  [<c1263570>] ? kthread_create_on_node+0xc0/0xc0

It seems "spi_sync" never returns. The bug occurs randomly after a few hours of correct behavior. Could it be a locking problem? Any idea?

I have seen @zhoushix's commits about SPI+DMA. However, if I enable dma for mcp151x or for spidev, the mcp151x driver can't access the controller (here https://github.com/jpilet/edison-linux/blob/9318ddd00bdc01a11f73ec6e7daeeb92aa241292/arch/x86/platform/intel-mid/device_libs/platform_mcp2515.c#L38 or here https://github.com/jpilet/edison-linux/blob/9318ddd00bdc01a11f73ec6e7daeeb92aa241292/arch/x86/platform/intel-mid/device_libs/platform_spidev.c#L29). Am I missing something, or is it simply that the SPI+DMA code is not yet ready?

Thanks in advance for your help

jpilet commented 8 years ago

I noticed that after the bug, the CPU usage is quite high. I profiled the kernel using perf, and I got the following results:

# Overhead          Command      Shared Object                              Symbol
# ........  ...............  .................  ..................................
# 
    48.77%     kworker/u4:2  [kernel.kallsyms]  [k] u8_writer
               |
               --- u8_writer
                  |
                  |--99.01%-- poll_writer
                  |          process_one_work
                  |          worker_thread
                  |          kthread
                  |          ret_from_kernel_thread
                  |
                   --0.99%-- process_one_work
                             worker_thread
                             kthread
                             ret_from_kernel_thread

    47.15%     kworker/u4:1  [kernel.kallsyms]  [k] u8_reader
               |
               --- u8_reader
                  |
                  |--96.54%-- handle_message
                  |          pump_messages
                  |          process_one_work
                  |          worker_thread
                  |          kthread
                  |          ret_from_kernel_thread
                  |
                   --3.46%-- pump_messages
                             process_one_work
                             worker_thread
                             kthread
                             ret_from_kernel_thread

     2.61%     kworker/u4:1  [kernel.kallsyms]  [k] handle_message

u8_writer and u8_reader are part of spi-pxa2xx (https://github.com/01org/edison-linux/blob/8cd9234c64c584432f6992fe944ca9e46ca8ea76/drivers/spi/spi-pxa2xx.c#L338) It seems that one thread is polling trying to write while the other one is polling trying to read. Without success. @westeri @jhnikula @andy-shev @broonie do you have any idea of what could put the SPI logic is such a locked state?

broonie commented 8 years ago

On Mon, Jan 11, 2016 at 12:27:58AM -0800, Julien Pilet wrote:

It seems that one thread is polling trying to write while the other one is polling trying to read. Without success. @westeri @jhnikula @andy-shev @broonie do you have any idea of what could put the SPI logic is such a locked state?

Please report any upstream issues via the mailing list (CCing relevant maintainers).

jpilet commented 8 years ago

which mailing list?

broonie commented 8 years ago

On Mon, Jan 11, 2016 at 06:50:19AM -0800, Julien Pilet wrote:

which mailing list?

SPI SUBSYSTEM M: Mark Brown broonie@kernel.org L: linux-spi@vger.kernel.org T: git git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git Q: http://patchwork.kernel.org/project/spi-devel-general/list/ S: Maintained F: Documentation/spi/ F: drivers/spi/ F: include/linux/spi/ F: include/uapi/linux/spi/

jpilet commented 8 years ago

Thanks. My mistake. The problem is most probably not in pxa2xx.c but in drivers/spi/intel_mid_ssp_spi.c, which is not upstream.

andy-shev commented 8 years ago

@jpilet, I would suggest to try my https://github.com/andy-shev/linux/tree/eds branch and continue with upstream.

chenchux commented 8 years ago

https://github.com/01org/edison-linux/commits/edison-3.10.17

New patch has fixed SPI issue, please refer to commit above.

But user should notice that SPI clock frequency should be set above 1MHz, otherwise DMA will fail to transfer data.

In Yocto release 2, SPI transfer method is polling, and now it has switched to DMA method, and SPI transfer speed could improve a lot. If Clock Frequency lower than 1MHZ(MAX support up to 24MHZ), data transferring will be failed. (Actually, data transferring will be failed if set frequence lower than 800KHZ).

If increase working frequency to 1MHZ, the issue will be gone. //mraa_spi_frequency(spi,10000); mraa_spi_frequency(spi,1000000);

andy-shev commented 7 years ago

@chenchux In upstream kernel it's fixed. I guess you may transfer even with lower speeds if you want to.