linux-can / can-utils

Linux-CAN / SocketCAN user space applications

Repeated transmission of can frames (within a while loop) is not working #431

Closed anandbhavi closed 1 year ago

anandbhavi commented 1 year ago

Hi Team,

We are using the Linux mcp251xfd kernel driver on a Xilinx XAZU3-based SoC. Frames are sent from the MCP chip on our board (an MCP2517FD connected to our SoC via the SPI controller) to a CAN analyser node, and the CAN frames can be seen in the CAN analyser software running on a connected Windows PC.

We observe that after the frames have been sent 4 times, no more frames appear in the CAN analyser software.

Please find attached our CAN user space application, screenshots from CANAnalyser pro, and our system.dts.

Please let us know how to debug this issue. Thanks in advance.

mcp kernel drivers link https://github.com/Xilinx/linux-xlnx/tree/xlnx_rebase_v5.10/drivers/net/can/spi/mcp251xfd

mcp_atatchments.zip

anandbhavi commented 1 year ago

Attaching the ip link stats (ip_stats screenshot). The TX bytes count is not increasing even though transmission worked 4 times. Please let us know the reason.

anandbhavi commented 1 year ago

Hi Team, we have already tested the MCP chip with the Xilinx SDK, where we do not use the Linux drivers. Those MCP drivers are provided by Microchip and are part of the Xilinx SDK. There we don't see this issue of transmission stopping after 4 frames. Please note this point.

marckleinebudde commented 1 year ago
anandbhavi commented 1 year ago

Can you provide the device tree source?

Answer: Attached are system.dts and system-user.dtsi (the latter describes the MCP CAN node). system.dts is the final DTS file, generated from the flashed DTB file.

Can you receive CAN frames?

Answer: We have not tested the receive path yet. We planned to test transmission first and reception later.

What's the output of cat /proc/interrupts before and after you've sent a single CAN frame?

Answer: Log attached:

In the log we can see that the ff040000.spi interrupt count increases by 2 for every call of the hello-gen2-can application. But after 4 transmissions, the count is stuck at 82.

 48:         74          0          0          0     GICv2  51 Level     ff040000.spi
 48:         76          0          0          0     GICv2  51 Level     ff040000.spi
 48:         78          0          0          0     GICv2  51 Level     ff040000.spi
....
 48:         82          0          0          0     GICv2  51 Level     ff040000.spi

This indicates that, from then on, the user-space write call fails to trigger spi-core and the SPI controller driver to issue SPI messages. Please look into the issue.

New folder.zip
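The before/after comparison of /proc/interrupts described above can be scripted. The following is a minimal sketch, not part of the original report: the helper names `irq_counts` and `irq_delta` are made up for illustration, and the two sample lines are copied from the log above. It sums the per-CPU columns and uses the last column as the IRQ name, which is a simplification for names that contain spaces.

```python
def irq_counts(text):
    """Map IRQ name (last column) to its count summed over all CPUs."""
    counts = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # skip the CPU header row and non-numeric rows
        fields = rest.split()
        per_cpu = []
        for field in fields:
            if not field.isdigit():
                break  # first non-numeric field ends the per-CPU columns
            per_cpu.append(int(field))
        if not per_cpu:
            continue
        name = fields[-1]
        counts[name] = counts.get(name, 0) + sum(per_cpu)
    return counts


def irq_delta(before, after):
    """Return how much each IRQ count grew between two snapshots."""
    old = irq_counts(before)
    return {name: n - old.get(name, 0) for name, n in irq_counts(after).items()}


# Sample lines taken from the log in this comment.
BEFORE = " 48:         74          0          0          0     GICv2  51 Level     ff040000.spi"
AFTER = " 48:         76          0          0          0     GICv2  51 Level     ff040000.spi"

print(irq_delta(BEFORE, AFTER))  # {'ff040000.spi': 2}
```

On the target, the two snapshots would come from reading /proc/interrupts before and after sending one frame. A delta of 0 for an IRQ name means that no interrupt of that kind arrived in between.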

marckleinebudde commented 1 year ago

The IRQ of interest is not only ff040000.spi but also spi1.0. ff040000.spi is the SPI host controller IRQ, which is triggered for SPI transfers; spi1.0 is the IRQ from the mcp251xfd to the SoC.

According to your attached file, spi1.0 stays at 0. This means the IRQ line is not connected or is misconfigured in your DT. Have a look at the bindings example for how to configure it.

It should look like this:

&spi0 {
    status = "okay";

    can@2 {
        compatible = "microchip,mcp2517fd";
        reg = <0>;
        clocks = <&can_osc>;
        interrupts-extended = <&gpio 0x10 IRQ_TYPE_LEVEL_LOW>;
        //microchip,rx-int-gpios = <&gpio 39 GPIO_ACTIVE_LOW>;
        spi-max-frequency = <20000000>;
    };
};

BTW: You will not have much fun with the mcp2517fd; that chip is quite broken. Furthermore, the driver in the v5.10 kernel is quite old and slow.

anandbhavi commented 1 year ago

Thanks for your reply !

We used the above-mentioned DT snippet in our system-user.dtsi. Unfortunately there is no progress, Marc.

We also verified that &gpio 0x10 is the correct IRQ. Attached is a picture from the UG1085 document for the Zynq MPSoC: we have to subtract 32 from 48, which gives 0x10 as the hardware IRQ number passed to the request-IRQ API. Please see the S1 picture.

But we still see zero spi1.0 interrupts. I believe you are saying that the IRQ line should be asserted at least for the TX interrupt, since we commented out microchip,rx-int-gpios. Please confirm (see the S2 picture).

Other Observations:

1) We observed that the mcp251xfd_start_xmit function is called 4 times for the 4 invocations of our user space application (we put a print in mcp251xfd_start_xmit and removed the while(1) loop). For further invocations of the user space application, mcp251xfd_start_xmit is no longer called. This is strange, considering that we can see 4 error-free frames in the CAN analyser window.

2) Does this indicate that we have missed setting some socket option on this socket, or that there is an issue with the SocketCAN framework that we overlooked?

This is a high-priority task. Please give us any pointers to proceed further, Marc.

latest.zip

marckleinebudde commented 1 year ago

Hello @anandbhavi,

this is community help and thus best effort. If you need commercial support, drop me an e-mail.

That said, there is nothing wrong with the user space (i.e. socket options...). The TX complete interrupt from the controller is not getting into the driver, so the driver sends up to 4 CAN frames, then it stops as (from the driver's point of view) all buffers are still in use.
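The stall after exactly 4 frames can be illustrated with a toy model. This is only a sketch and not the mcp251xfd driver's code: the class `TxRing`, its methods, and the ring size of 4 (taken from the behaviour observed in this thread) are assumptions made for illustration. A slot is freed only when the TX-complete interrupt is handled, so without that interrupt the ring fills up and nothing more is accepted.

```python
class TxRing:
    """Toy TX ring: a slot is freed only by a TX-complete interrupt."""

    def __init__(self, size=4):
        self.size = size
        self.head = 0  # frames handed to the controller so far
        self.tail = 0  # frames confirmed by a TX-complete interrupt

    def queue_frame(self):
        """Try to queue one frame; return False if every slot is in use."""
        if self.head - self.tail >= self.size:
            return False  # ring full: the real stack would stop the queue
        self.head += 1
        return True

    def tx_complete_irq(self):
        """Model the interrupt handler releasing the oldest slot."""
        if self.tail < self.head:
            self.tail += 1


def send_frames(n, irq_connected):
    """Return how many of n frames the ring accepted."""
    ring = TxRing()
    accepted = 0
    for _ in range(n):
        if ring.queue_frame():
            accepted += 1
        if irq_connected:
            ring.tx_complete_irq()
    return accepted


print(send_frames(10, irq_connected=False))  # 4
print(send_frames(10, irq_connected=True))   # 10
```

In this model the frames themselves still go out on the bus, which matches the 4 error-free frames seen on the analyser; only the bookkeeping that frees the slots is missing.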

We also verified that &gpio 0x10 is the correct IRQ. Attached is a picture from the UG1085 document for the Zynq MPSoC: we have to subtract 32 from 48, which gives 0x10 as the hardware IRQ number passed to the request-IRQ API. Please see the S1 picture.

I still doubt that 0x10 is the correct IRQ. According to your screenshot, 0x10 corresponds to the GPIO controller as a whole. GPIO controllers usually have 32 GPIOs, so you have to specify which GPIO the nINT pin is connected to.

regards, Marc

anandbhavi commented 1 year ago

Hi Marc,

Thanks for your reply. I would like to go for commercial support. Please let me know further details.

marckleinebudde commented 1 year ago

Please contact sales@pengutronix.de

anandbhavi commented 1 year ago

Marc,

We could resolve the GPIO issue with correct entries in the dtsi file. We are now receiving MCP interrupts on our SoC.

And our user space application is working as expected.

Thank you.

Another question: we have 2 MCPs on our board. On one of them we see the CERRIF handler being triggered on the very first transmission from the SoC to the MCP. The berr counter is 128 and further communication stops.

We don't have any issues with the other MCP under the Linux driver; the transmit and receive handlers are executed properly. Also, the bare-metal drivers run properly on both MCPs.

Let us know your thoughts.

Thank you

Regards Anand


marckleinebudde commented 1 year ago

We have 2 MCPs on our board. On one of them we see the CERRIF handler being triggered on the very first transmission from the SoC to the MCP. The berr counter is 128 and further communication stops.

There's a problem with the CAN transceiver or on the CAN bus, e.g. transceiver not powered, RX/TX swapped, not attached to the CAN bus, wrong bit rate, no/wrong termination, or no 2nd system on the CAN bus.
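A counter stuck at 128 is consistent with a transmitter that never receives an ACK. The sketch below is a simplified model, not driver code: the function names are made up, the reported berr counter is assumed to be the transmit error counter (TEC), and the thresholds follow the usual CAN fault-confinement rules (warning from 96, error-passive from 128, bus-off above 255). Each unacknowledged attempt adds 8 to the TEC until the node becomes error-passive, after which ACK errors alone no longer increase it.

```python
def can_error_state(tec, rec):
    """Classify the node state from the TX/RX error counters."""
    if tec > 255:
        return "bus-off"
    if tec > 127 or rec > 127:
        return "error-passive"
    if tec >= 96 or rec >= 96:
        return "error-warning"
    return "error-active"


def tec_after_unacked_attempts(attempts):
    """Model the TEC when every transmit attempt ends in an ACK error."""
    tec = 0
    for _ in range(attempts):
        if can_error_state(tec, 0) == "error-passive":
            break  # simplified: ACK errors stop raising the TEC here
        tec += 8
    return tec


tec = tec_after_unacked_attempts(100)
print(tec, can_error_state(tec, 0))  # 128 error-passive
```

Because the controller retransmits automatically, a single frame from user space is enough to reach this state when nothing on the bus acknowledges it, which fits several of the causes listed above.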

anandbhavi commented 1 year ago

Hi Marc,

All our issues have been addressed internally. There is no need for commercial support as of now. Thanks again for your timely support.

I will close this issue.

Regards, Anand B