linux-can / can-utils

Linux-CAN / SocketCAN user space applications

Cannot clear TX Buffer of CAN. #514

Open MattJmt opened 7 months ago

MattJmt commented 7 months ago

Hello,

I am using a 2-Channel Isolated CAN Expansion HAT to control two DC Motors via a Raspberry Pi 4B. I am using python-can 4.3.1.

When starting the Python script, I receive the "Transmit buffer full" error from can/interfaces/socketcan/socketcan.py.

However, if I first run cangen can0 and candump can0 successfully in the terminal before starting the Python script, the CAN bus works perfectly. Sometimes, though, running cangen can0 and candump can0 yields "write: No buffer space available", in which case the CAN bus doesn't work. Attempting to change the CAN interface settings with commands such as sudo ifconfig can0 txqueuelen 1000 results in the error "RTNETLINK answers: Invalid argument". can0 is initialised with txqueuelen = 65536, so I don't know why I get these errors, which seem to occur at random.

I checked the following threads, without any success:

I asked the manufacturer for advice, and they recommended adding bus.flush_tx_buffer(). This didn't change anything.
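
For reference, "write: No buffer space available" is the text of the ENOBUFS error that write()/send() returns when the interface's TX queue is full. Below is a minimal sketch of a send wrapper that backs off and retries on ENOBUFS. `send_with_backoff` is a hypothetical helper (not part of python-can or can-utils), and it assumes `send` is something like the `send` method of a raw `AF_CAN` socket bound to can0. Retrying only helps with short bursts: if the frame at the head of the queue is never acknowledged, the queue never drains and the helper eventually re-raises the error.

```python
import errno
import time
from typing import Callable


def send_with_backoff(
    send: Callable[[bytes], int],
    frame: bytes,
    retries: int = 5,
    delay: float = 0.001,
) -> int:
    """Call send(frame), retrying with exponential back-off on ENOBUFS.

    Any other OSError is re-raised immediately; after `retries` failed
    retries the last ENOBUFS error is re-raised as well.
    """
    attempt = 0
    while True:
        try:
            return send(frame)
        except OSError as exc:
            # Only "No buffer space available" is treated as transient.
            if exc.errno != errno.ENOBUFS or attempt >= retries:
                raise
            time.sleep(delay * 2 ** attempt)  # 1x, 2x, 4x, ... the base delay
            attempt += 1
```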

Continuing to troubleshoot, I observe that when I turn the Raspberry Pi on, ifconfig shows no issues. However, running cangen can0 results in the "write: No buffer space available" error message, and I observe a significant number of TX errors, as shown below:

[Screenshot: ifconfig output showing the TX errors]

I have tried changing the bitrate and turning the can0 interface down and then up again, but neither resolves the issue. The only solution that works, after multiple attempts, is to reboot the Raspberry Pi; the TX errors seem to halve after every reboot (until there are none left and I can run the Python script).

Below is a screenshot of the output of ip -det link show can0:

[Screenshot: ip -det link show can0 output]

This is the configuration that brings the CAN bus up at boot, in /etc/network/interfaces (edited with sudo nano /etc/network/interfaces):

auto can0
iface can0 inet manual
pre-up /sbin/ip link set can0 up type can bitrate 1000000 dbitrate 8000000 restart-ms 1000 berr-reporting on fd on
up /sbin/ifconfig can0 up
down /sbin/ifconfig can0 down
post-up /sbin/ip link set can0 txqueuelen 65536

Any help is greatly appreciated!

marckleinebudde commented 7 months ago

In both of your outputs (ifconfig and ip), the txqueuelen is 10. To properly set the txqueuelen, use:

/sbin/ip link set can0 txqueuelen 1000 up type can bitrate 1000000 dbitrate 8000000 restart-ms 1000 berr-reporting on fd on
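
To check whether the setting actually took effect, the kernel exposes the current value in sysfs as /sys/class/net/can0/tx_queue_len. A minimal sketch that reads it; `read_tx_queue_len` is a hypothetical helper name, and the `sysfs_root` parameter exists only so the function can be pointed at a test directory:

```python
from pathlib import Path


def read_tx_queue_len(ifname: str, sysfs_root: str = "/sys/class/net") -> int:
    """Return the TX queue length the kernel currently uses for `ifname`.

    This is the value ip reports as "qlen" and ifconfig as "txqueuelen".
    """
    return int((Path(sysfs_root) / ifname / "tx_queue_len").read_text().strip())
```

Once the command above has been applied, read_tx_queue_len("can0") should return 1000.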

Are you actually using CAN-FD with a data bitrate of 8 Mbit/s?

The TX errors will not "disappear", as they are cumulative counters. Only completely removing the device driver or rebooting resets them.
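
Because the counters only grow until the driver is reloaded, comparing two snapshots (for example before and after running cangen) is more informative than reading the absolute values. A minimal sketch reading the standard per-interface counters from /sys/class/net/<ifname>/statistics/; the helper names are hypothetical:

```python
from pathlib import Path
from typing import Dict, Iterable

# Standard per-interface counters (ifconfig's "TX errors" is tx_errors).
DEFAULT_COUNTERS = ("tx_packets", "tx_errors", "tx_dropped", "rx_errors")


def read_counters(
    ifname: str,
    names: Iterable[str] = DEFAULT_COUNTERS,
    sysfs_root: str = "/sys/class/net",
) -> Dict[str, int]:
    """Snapshot cumulative counters from /sys/class/net/<ifname>/statistics/."""
    stats_dir = Path(sysfs_root) / ifname / "statistics"
    return {name: int((stats_dir / name).read_text()) for name in names}


def counter_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """How much each counter grew between two snapshots."""
    return {name: after[name] - before[name] for name in before}
```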

You have enabled bus-error reporting (berr-reporting on); use candump any,0~0,#FFFFFFFF -cexdtA to show and decode any bus errors.
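
candump -e already decodes error frames, so the following only shows where the information lives. In an error frame the CAN_ERR_FLAG bit (0x20000000) is set in can_id, and the low bits give the error class as defined in include/uapi/linux/can/error.h. A minimal sketch decoding the class bits only (the per-class details in the data bytes are not decoded); `decode_error_class` is a hypothetical helper:

```python
from typing import List

CAN_ERR_FLAG = 0x20000000  # marks an error frame (include/uapi/linux/can.h)
CAN_ERR_MASK = 0x1FFFFFFF  # strips the EFF/RTR/ERR flag bits from can_id

# Error classes in can_id (include/uapi/linux/can/error.h)
ERROR_CLASSES = {
    0x001: "TX timeout",
    0x002: "lost arbitration",
    0x004: "controller problem",
    0x008: "protocol violation",
    0x010: "transceiver status",
    0x020: "no ACK on transmission",
    0x040: "bus off",
    0x080: "bus error",
    0x100: "controller restarted",
}


def decode_error_class(can_id: int) -> List[str]:
    """Names of the error classes set in an error frame's can_id."""
    if not can_id & CAN_ERR_FLAG:
        return []  # not an error frame
    bits = can_id & CAN_ERR_MASK
    return [name for bit, name in ERROR_CLASSES.items() if bits & bit]
```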

If your device goes into ERROR-PASSIVE, there's something wrong on the bus. Can you send me a candump any,0:0,#FFFFFFFF -cexdtA of normal traffic while your Python code is running and communicating with the drives? Which CAN-IDs are you using to communicate with the drives? Which CAN-IDs are they using to answer back?
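
The two candump invocations differ only in the data-frame filter. Per candump's help text, <can_id>:<can_mask> matches when received_can_id & mask == can_id & mask, <can_id>~<can_mask> is the inverted match, and #<error_mask> separately enables error frames. So 0:0 lets every data frame through, while 0~0 lets none through, leaving only the error frames. A minimal sketch of that matching rule, ignoring the EFF/RTR flag bits; `candump_filter_matches` is a hypothetical helper:

```python
def candump_filter_matches(spec: str, can_id: int) -> bool:
    """Evaluate one candump data-frame filter ("<id>:<mask>" or "<id>~<mask>").

    Values are hexadecimal, as on the candump command line.
    """
    if ":" in spec:
        inverted, sep = False, ":"
    elif "~" in spec:
        inverted, sep = True, "~"
    else:
        raise ValueError("only <id>:<mask> and <id>~<mask> are handled here")
    filter_id, mask = (int(part, 16) for part in spec.split(sep))
    match = (can_id & mask) == (filter_id & mask)
    return match != inverted  # "~" flips the result
```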

MattJmt commented 7 months ago

I modified the /etc/network/interfaces as you suggested:

auto can0
iface can0 inet manual
        pre-up /sbin/ip link set can0 down
        pre-up /sbin/ip link set can0 txqueuelen 1000 up type can bitrate 1000000 dbitrate 8000000 restart-ms 1000 berr-rep>
        up /sbin/ip link set can0 up
        down /sbin/ip link set can0 down

Running ifconfig at boot now correctly shows:

[Screenshot: ifconfig output]

Running cangen and candump leads to the same "write: No buffer space available" error. This was with the motors' power supply already switched on. However, after rebooting and switching on the motors' power supply only once the RPi was up, cangen and candump work (despite the TX errors).


Was a txqueuelen = 65536 too high?

For setting the bitrate to 8 Mbit/s, I followed what was indicated on their setup page.

This is my candump can0 output while the Python script is running (when it works):

[Screenshot: candump can0 output]

These are the special CAN data codes I use to communicate with the motors:

class TMOTOR_SPECIAL_CAN_CODES:
    ENABLE_CAN_DATA  = [0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFC] # Enables the motor
    DISABLE_CAN_DATA = [0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFD] # Disables the motor
    ZERO_CAN_DATA    = [0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFE] # Sets current position to zero
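
For illustration, these payloads can also be sent without python-can, through a raw SocketCAN socket from the Python standard library. A minimal sketch, assuming classic (non-FD) frames and that the motor listens on its own CAN_ID (3 in the test values); the struct format is the classic struct can_frame layout used in the Python socket documentation, and `build_can_frame` is a hypothetical helper:

```python
import struct

# Classic SocketCAN struct can_frame, as in the Python socket docs:
# 32-bit can_id, 8-bit length, 3 padding bytes, 8 data bytes (16 bytes total).
CAN_FRAME_FMT = "=IB3x8s"

# Same payload as TMOTOR_SPECIAL_CAN_CODES.ENABLE_CAN_DATA above.
ENABLE_CAN_DATA = bytes([0xFF] * 7 + [0xFC])


def build_can_frame(can_id: int, data: bytes) -> bytes:
    """Pack a classic (non-FD) CAN frame for a raw AF_CAN socket."""
    if len(data) > 8:
        raise ValueError("classic CAN carries at most 8 data bytes")
    return struct.pack(CAN_FRAME_FMT, can_id, len(data), data.ljust(8, b"\x00"))
```

On Linux, with can0 up, the frame could be sent with sock = socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW), sock.bind(("can0",)) and sock.send(build_can_frame(0x003, ENABLE_CAN_DATA)).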

The CAN_IDs I use for the motors are 3 and 4. For the bus communication, I use:

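from can import Bus

# Motor with CAN_ID = 4: only frames with CAN-ID 0x000 pass this filter
filters = [{"can_id": 0x000, "can_mask": 0x7FF, "extended": False}]
self.bus = Bus(channel="can0", interface="socketcan", can_filters=filters, bitrate=1000000)

# Motor with CAN_ID = 3: only frames with CAN-ID 0x003 pass this filter
filters = [{"can_id": 0x003, "can_mask": 0x7FF, "extended": False}]
self.bus = Bus(channel="can0", interface="socketcan", can_filters=filters, bitrate=1000000)
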
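
With can_mask = 0x7FF, each filter accepts exactly one 11-bit CAN-ID, so each Bus instance only sees frames with that ID, and frames with any other ID (for example 0x004) reach neither instance. A minimal sketch of the matching rule for these filter dictionaries (a frame passes if any filter matches; an "extended" key additionally requires the frame type to agree). `passes_filters` is a hypothetical helper that mirrors, but is not, python-can's own code:

```python
from typing import Any, Dict, Sequence


def passes_filters(can_id: int, is_extended: bool, filters: Sequence[Dict[str, Any]]) -> bool:
    """Return True if a received frame would pass a python-can style filter list."""
    if not filters:
        return True  # no filters: everything is received
    for f in filters:
        if "extended" in f and f["extended"] != is_extended:
            continue  # frame type (11-bit vs 29-bit ID) does not match
        if (can_id & f["can_mask"]) == (f["can_id"] & f["can_mask"]):
            return True
    return False


# The two filter lists from the snippet above
MOTOR_4_FILTERS = [{"can_id": 0x000, "can_mask": 0x7FF, "extended": False}]
MOTOR_3_FILTERS = [{"can_id": 0x003, "can_mask": 0x7FF, "extended": False}]
```
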
marckleinebudde commented 7 months ago

> Was a txqueuelen = 65536 too high?

No. I don't know why this doesn't work for you.

> For setting the bitrate to 8 MBit/s I followed what was indicated on their setting up page.

You should check your drive's datasheet to see whether it uses classical CAN or CAN FD, and configure your Raspberry Pi accordingly.

Can you send me a candump any,0:0,#FFFFFFFF -cexdtA of normal traffic while your Python code is running and communicating with the drives?

There are CAN-IDs 000, 003 and 004 on this bus. Please tell me which device sends each of these CAN-IDs.

MattJmt commented 7 months ago

This is what I get from candump any,0:0,#FFFFFFFF -cexdtA:

[Screenshot: candump any,0:0,#FFFFFFFF -cexdtA output]

Because the motors are different versions, one has CAN_ID = 3 (0x003 in the filters), whilst the other has CAN_ID = 4 (0x000 in the filters).

marckleinebudde commented 7 months ago

I see CAN-ID 003 as both RX and TX. This doesn't look good. You cannot send the same CAN-ID from different CAN controllers.
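
The reason is CAN's bitwise arbitration: the bus acts as a wired-AND in which a dominant 0 overrides a recessive 1, and a node that sends a 1 but reads back a 0 stops transmitting. Two nodes sending the same ID therefore both "win" arbitration and keep transmitting into the data field, where any differing bit causes a bit error and raises the error counters. A minimal simulation of arbitration over the 11-bit identifier only (RTR/IDE bits and everything after the ID are ignored); `arbitrate` is a hypothetical helper:

```python
from typing import List, Sequence


def arbitrate(ids: Sequence[int], id_bits: int = 11) -> List[int]:
    """Indices of the nodes still transmitting after the identifier field.

    Bits are sent MSB first. The bus is a wired-AND (dominant 0 beats
    recessive 1); a node that sends 1 but reads back 0 backs off.
    """
    active = list(range(len(ids)))
    for bit in reversed(range(id_bits)):
        if not active:
            break
        sent = {i: (ids[i] >> bit) & 1 for i in active}
        bus_level = min(sent.values())  # any dominant 0 pulls the bus low
        active = [i for i in active if sent[i] == bus_level]
    return active
```

With IDs 0x004 and 0x003 only the 0x003 node is left; with two 0x003 nodes, both are.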

MattJmt commented 7 months ago

That is simply the way this motor is set up. I will see if I can change it, but I'm not sure.

grvstick commented 4 months ago

This may not be related, but I had a similar experience with the TX buffer being full. The workaround was to reload the CAN driver module with modprobe, or to bring the interface down and up again with ip link set can0 down / ip link set can0 up to reset SocketCAN.

This was reproducible when the target devices were left disconnected for a while, during which the SocketCAN device periodically tried to reach them by sending a frame.
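
The down/up workaround can be scripted. A minimal sketch, assuming it runs as root and that the interface is can0; `restart_can_interface` is a hypothetical helper, and with dry_run=True it only returns the commands. Reloading the driver module with modprobe is not included, because the module name depends on the CAN hardware.

```python
import subprocess
from typing import List


def restart_can_interface(ifname: str = "can0", dry_run: bool = False) -> List[List[str]]:
    """Bring a SocketCAN interface down and up again (requires root).

    With dry_run=True the commands are only returned, not executed.
    """
    commands = [
        ["ip", "link", "set", ifname, "down"],
        ["ip", "link", "set", ifname, "up"],
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```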

nefethael commented 1 month ago

Hello,

In our case, we get the "Transmit buffer full" error with USB-to-CAN adapters using the gs_usb driver (Innomaker and UCCB). The problem disappears when using a PEAK cable (peak_usb).

Note that we didn't have this issue with any cable on the 5.15 kernel, only on 6.6. I don't know whether it's related to the candleLight firmware or something else; hope it helps someone ;)

Regards, Vincent
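
The comparison above hinges on two details, the driver bound to the interface and the kernel release. A minimal sketch of collecting both from Python: the driver name is the last path component of the /sys/class/net/<ifname>/device/driver symlink, and platform.release() returns the kernel release string. The helper names are hypothetical:

```python
import os
import platform
from pathlib import Path


def can_driver_name(ifname: str, sysfs_root: str = "/sys/class/net") -> str:
    """Name of the kernel driver bound to `ifname`, e.g. gs_usb or peak_usb."""
    link = Path(sysfs_root) / ifname / "device" / "driver"
    return Path(os.readlink(link)).name  # symlink points at .../drivers/<name>


def kernel_release() -> str:
    """Kernel release string, as printed by `uname -r`."""
    return platform.release()
```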

marckleinebudde commented 1 month ago

@nefethael If you still have this issue and want help to resolve it, please open a new issue. The OP's problem is unrelated to a kernel update.

marckleinebudde commented 1 month ago

@grvstick If you try to send on a CAN bus without a second device, the sender will retry continuously, because no other node acknowledges the frame. Eventually all TX buffers are full. It's a known limitation of the CAN implementation in Linux and the candleLight firmware that you cannot abort a single CAN message "stuck" in the TX queue.
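
A toy model (not the kernel's or the firmware's actual code) of why the queue fills up: only the frame at the head of the queue is on the wire, it leaves the queue only once it is acknowledged, and nothing else can remove it, so every further write adds to the queue until the write fails with ENOBUFS. `ToyTxQueue` is hypothetical:

```python
import errno
from collections import deque
from typing import Deque


class ToyTxQueue:
    """Toy model of a fixed-length TX queue with a frame stuck at its head.

    Only the head frame is on the wire. It is removed only when another
    node acknowledges it; nothing else can remove it.
    """

    def __init__(self, qlen: int) -> None:
        self.qlen = qlen
        self.frames: Deque[bytes] = deque()

    def write(self, frame: bytes) -> None:
        """Queue a frame, failing like write() does when the queue is full."""
        if len(self.frames) >= self.qlen:
            raise OSError(errno.ENOBUFS, "No buffer space available")
        self.frames.append(frame)

    def bus_tick(self, acked: bool) -> None:
        """One transmission attempt of the head frame."""
        if self.frames and acked:
            self.frames.popleft()
```

In this model a longer queue (a larger txqueuelen) only delays the ENOBUFS error; it does not prevent it.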