hardbyte / python-can

The can package provides controller area network support for Python developers
https://python-can.readthedocs.io
GNU Lesser General Public License v3.0

socketcan Exchanging data between 2 bus - Time limited #377

Closed Bobet21 closed 4 years ago

Bobet21 commented 6 years ago

I am trying to exchange data between 2 buses - sending data from bus 1 to bus 2 - quite simple, would you say? It all starts fine, sending all messages from bus 1 to bus 2 and the other way around, but after about 500 ms and 150 frames sent, the exchange stops. Is there any reason why the code below would do so?

from multiprocessing import Process
import can

bus1 = can.interface.Bus(bustype='socketcan', channel='can0', bitrate=500000)
bus2 = can.interface.Bus(bustype='socketcan', channel='can2', bitrate=500000)
#

def loop_a():
    for msg in bus2:
        bus1.send(msg, timeout=None)

def loop_b():
    for msg in bus1:
        bus2.send(msg, timeout=None)

if __name__ == '__main__':
    Process(target=loop_a).start()
    Process(target=loop_b).start()
hardbyte commented 6 years ago

You want to keep your main Python process alive. Either by waiting until the sub-process is finished with join, or with an (interruptible) busy loop:

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        pass
    finally:
        bus.shutdown()
Bobet21 commented 6 years ago

Thanks! so now the following code runs just fine with the vcan - but when connecting to real hardware it stops again

Could it be related to buffering not being managed properly? I am using this hardware: http://linklayer.github.io/cantact/

from multiprocessing import Process
import can
import time

bus0 = can.interface.Bus(bustype='socketcan', channel='can0', bitrate=500000)
bus1 = can.interface.Bus(bustype='socketcan', channel='can1', bitrate=500000)
#

def loop_a():
    for msg in bus0:
        bus1.send(msg, timeout=None)

def loop_b():
    for msg in bus1:
        bus0.send(msg, timeout=None)

if __name__ == '__main__':
    Process(target=loop_a).start()
    Process(target=loop_b).start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        pass
    finally:
        bus0.shutdown()
        bus1.shutdown()
christiansandberg commented 6 years ago

You should only start the processes once, i.e. outside the loop. Also, you could use threads instead as the code is I/O bound and doesn’t require much CPU resources.
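A minimal sketch of the thread-based variant, assuming only that each bus object offers `recv(timeout=...)` (returning a message, or `None` when nothing arrived) and `send(msg, timeout=...)`, as python-can bus objects do. The names `forward`, `run_bridge` and `stop` are made up for this illustration and are not part of python-can.

```python
import threading


def forward(src, dst, stop, recv_timeout=0.1, send_timeout=0.1):
    """Copy frames from src to dst until the stop event is set."""
    while not stop.is_set():
        # A short receive timeout lets the loop notice the stop event.
        msg = src.recv(timeout=recv_timeout)
        if msg is not None:
            dst.send(msg, timeout=send_timeout)


def run_bridge(bus_a, bus_b, stop):
    """Start one forwarding thread per direction and return both threads."""
    threads = [
        threading.Thread(target=forward, args=(bus_a, bus_b, stop), daemon=True),
        threading.Thread(target=forward, args=(bus_b, bus_a, stop), daemon=True),
    ]
    for thread in threads:
        thread.start()
    return threads
```

The main thread would then wait (for example in a `time.sleep` loop), call `stop.set()` on Ctrl+C, join both threads, and only afterwards shut the buses down.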

Bobet21 commented 6 years ago

Thanks Christian, I have updated the code above (except for using threads, which is just an optimization, right?). The console returns the following messages. Could the timeout on self.recv() be causing this? (My bus2 is actually not running.)

Reloaded modules: can.listener, can.util, can.io.asc, can.io.stdout, can.interfaces, can.io.blf, can.io.player, can.io.csv, can.interface, can.interfaces.socketcan.socketcan_common, can, can.io.log, can.message, can.bus, can.interfaces.socketcan.socketcan_ctypes, can.io.logger, can.io, can.interfaces.socketcan, can.io.sqlite, can.broadcastmanager, can.interfaces.socketcan.socketcan_native, can.interfaces.socketcan.socketcan_constants, can.notifier
Process Process-3:
Process Process-4:
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
  File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
    self._target(*self._args, **self._kwargs)
  File "/home/louis/Dropbox/ProjetX/PROJECT-PROCESS/Python/9_CANx2BUS_V1_ALL.py", line 19, in loop_a
  File "/home/louis/Dropbox/ProjetX/PROJECT-PROCESS/Python/9_CANx2BUS_V1_ALL.py", line 25, in loop_b
    for msg in bus2:
  File "/home/louis/Coding/python-can-develop/can/bus.py", line 123, in __iter__
    bus2.send(msg,timeout=None)
    msg = self.recv(timeout=1.0)
  File "/home/louis/Coding/python-can-develop/can/interfaces/socketcan/socketcan_ctypes.py", line 147, in send
  File "/home/louis/Coding/python-can-develop/can/interfaces/socketcan/socketcan_ctypes.py", line 107, in recv
    bytes_sent = libc.write(self.socket, ctypes.byref(frame, total_sent), remaining)
    [], [], timeout)[0]) > 0:
KeyboardInterrupt
KeyboardInterrupt
christiansandberg commented 6 years ago

Processes are also much more complicated and if you don’t understand them fully you can get weird bugs like the one you have now. ;)

Bobet21 commented 6 years ago

OK, so I made things simpler... it does the same: it runs for a while and returns a "Transmit buffer full" error from socketcan.py

PS: I also tried threads @christiansandberg but it doesn't change anything


import can
busEV = can.interface.Bus(bustype='socketcan', channel='can2', bitrate=500000, )
busEM = can.interface.Bus(bustype='socketcan', channel='can1', bitrate=500000, )

while True:
    busEM.send(busEV.recv())
christiansandberg commented 6 years ago

Do you have any way of seeing if the messages are actually sent on the CAN bus? Have you set the correct bitrate when you brought up the interface? Are there other CAN nodes on the bus?

Bobet21 commented 6 years ago

1. Yes - using socketcan ifconfig we do see the Rx packets and Tx packets on each bus/network
2. We double-checked and the bitrate setup is OK
3. There is another node on one bus to monitor the bus - a connection at that point actually sees the few sent packets before the "Transmit buffer full" error
christiansandberg commented 6 years ago

Can you see the bus load on the two can buses?

Bobet21 commented 6 years ago

We can only see it on one of the buses - the one we are extracting data from

christiansandberg commented 6 years ago

You could also try and set the timeout argument to the send method and see what happens.

Bobet21 commented 6 years ago

We tried it - it has no effect - still getting the error

christiansandberg commented 6 years ago

What’s the bus load on the source bus? Can you update to the latest version just to make sure we are on the same page. What timeout did you try?

Bobet21 commented 6 years ago
  1. on the source bus the load is: 1% (4 ID)
  2. we tried to send(,timeout=None) see below

from threading import Thread
import can
import time

busEV = can.interface.Bus(bustype='socketcan', channel='can2', bitrate=500000, )
busEM = can.interface.Bus(bustype='socketcan', channel='can1', bitrate=500000, )

while True:
    busEM.send(busEV.recv(),timeout=None)
christiansandberg commented 6 years ago

Oh right, you have to specify a time in seconds for send(). Put a few seconds to see if it can send anything at all.

Bobet21 commented 6 years ago

In those conditions it's definitely sending but after about 150 messages sent, it stops.

christiansandberg commented 6 years ago

Hmm, might check if the interface goes into some kind of error state or something. I don’t know how to do that with socketcan though.
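One possible way to look at the controller state on Linux is the output of `ip -details link show <channel>` from iproute2. The sketch below parses that text for the controller state, the bitrate and the TX queue length. The function names and the sample text are illustrative assumptions (the sample follows the usual iproute2 layout and was not captured from the hardware in this thread), and the exact output may differ between iproute2 versions.

```python
import re
import subprocess

# Illustrative text in the usual iproute2 layout; not real output from this thread.
EXAMPLE_OUTPUT = """\
3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
    link/can  promiscuity 0
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
          bitrate 500000 sample-point 0.875
"""


def _first_int(pattern, text):
    """Return the first captured integer for pattern, or None if it is absent."""
    match = re.search(pattern, text)
    return int(match.group(1)) if match else None


def parse_can_link(details):
    """Extract controller state, bitrate and TX queue length from the text."""
    # The controller state sits on the line that starts with "can",
    # not on the first line (which reports the link state, e.g. UP).
    state = re.search(r"^[ \t]*can\b.*?\bstate (\S+)", details, re.MULTILINE)
    return {
        "state": state.group(1) if state else None,
        "bitrate": _first_int(r"\bbitrate (\d+)", details),
        "qlen": _first_int(r"\bqlen (\d+)", details),
    }


def read_can_link(channel):
    """Run `ip -details link show <channel>` and parse its output (Linux only)."""
    result = subprocess.run(
        ["ip", "-details", "link", "show", channel],
        check=True, capture_output=True, text=True,
    )
    return parse_can_link(result.stdout)
```

Calling `read_can_link("can0")` before and after the transfer stops would show whether the state moved away from `ERROR-ACTIVE`, for example to `ERROR-PASSIVE` or `BUS-OFF`.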

Bobet21 commented 6 years ago

We may just be doing things the wrong way around - what would be the preferred method to exchange data between 2 buses? I am sure python-can is able to do this. Rather than just debugging, could we start from a best practice?

christiansandberg commented 6 years ago

I don’t think your script is the problem here. It seems to be on a lower level.

I would probably use the Notifier and RedirectReader classes to set up the transmission, but first you should try to get basic transmission to work. You could make it even more basic by just sending a message in a loop with a small sleep between each transmission.
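A rough sketch of such a basic transmit test, assuming only that the bus object has a `send(msg, timeout=...)` method like python-can buses do. `send_repeatedly` and its `send_errors` parameter are hypothetical names for this illustration; with python-can one would probably pass `can.CanError` as the error type.

```python
import time


def send_repeatedly(bus, msg, count, interval=0.01, timeout=1.0,
                    send_errors=(Exception,)):
    """Send msg up to count times and return how many sends succeeded.

    Stops at the first send that raises one of send_errors.
    """
    sent = 0
    for _ in range(count):
        try:
            bus.send(msg, timeout=timeout)
        except send_errors:
            break
        sent += 1
        # Small pause between transmissions, as suggested above.
        time.sleep(interval)
    return sent
```

If the returned number is consistently lower than `count`, the problem shows up without any receive loop or second bus involved.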

Bobet21 commented 6 years ago

The 394 resolution definitely helps in exchanging data between 2 buses - I'll give it another try

felixdivo commented 6 years ago

Is this still a problem or can this be closed?

Bobet21 commented 6 years ago

Yep, the latest version does not fix this. I hope I can have a deeper look in the future, but I believe we should keep this open. I would use python-can as the main interface between two CAN buses, with my laptop converting data between the buses so that both "believe" they are working with the native ECUs.

tigershadowclaw commented 5 years ago

At the hardware level, "Transmit Buffer Full" means that nothing on the bus is acking the messages that the device is trying to send. The device keeps trying to send the first message, and eventually the buffer that holds the other messages waiting to be sent fills up, causing the error. The CAN adapter hardware will then usually go into a "bus off" state if an error occurs. This state is cleared when you open the CAN bus, because part of the hardware init sequence is to clear the TX and RX buffers.

So any real can bus that you are trying to send messages on must have at least one other device connected to it in order to listen to/ack the messages. The reason that it was working previously with the vcan interface is because the socketcan layer (which is what the vcan interface lives in) doesn't do any of the message acking stuff that the real can hardware adapters do.
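To illustrate the mechanism described above, here is a toy model (not socketcan and not any real driver): a frame only leaves the transmit queue once it has been ACKed, so without an acking node the queue fills up after a fixed number of sends. The class names and the queue length of 10 in the usage note are arbitrary assumptions.

```python
from collections import deque


class TransmitBufferFull(Exception):
    """Raised by the toy model when the TX queue has no free slot."""


class ToyController:
    """Toy CAN controller: a frame leaves the queue only once it is ACKed."""

    def __init__(self, queue_len, ack_present):
        self.queue = deque()
        self.queue_len = queue_len
        # True if another node on the bus acknowledges frames.
        self.ack_present = ack_present

    def send(self, frame):
        if len(self.queue) >= self.queue_len:
            raise TransmitBufferFull(f"queue of {self.queue_len} frames is full")
        self.queue.append(frame)
        self.transmit()

    def transmit(self):
        # With an ACK the head frame is done; without one it stays queued
        # for retransmission and blocks everything behind it.
        while self.queue and self.ack_present:
            self.queue.popleft()


def frames_accepted(controller, attempts):
    """Return how many frames the controller accepts before its queue is full."""
    for n in range(attempts):
        try:
            controller.send(n)
        except TransmitBufferFull:
            return n
    return attempts
```

In this model `frames_accepted(ToyController(10, ack_present=False), 100)` stops at the queue length, while the same call with `ack_present=True` accepts every frame.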

Bobet21 commented 5 years ago

@shadowclaw , thanks, that seems to make sense from what i saw, however we tried to include a command busEV.flush_tx_buffer() but that didn't help either. Any idea how you would handle that ?

see our source code below

from threading import Thread
import can
import time

busEV = can.interface.Bus(bustype='socketcan', channel='can1', bitrate=500000)
busEM = can.interface.Bus(bustype='socketcan', channel='can0', bitrate=500000)
#busEV = can.interfaces.serial.serial_can.SerialBus(channel='/dev/ttyACm0', baudrate=500000)
#busEM = can.interfaces.serial.serial_can.SerialBus(channel='/dev/ttyACm1', baudrate=500000)

def loop_a():
    for msg_a in busEM:
        busEV.send(msg_a, timeout=None)
        busEV.flush_tx_buffer()

def loop_b():
    for msg_b in busEV:
        busEM.send(msg_b, timeout=None)
        busEM.flush_tx_buffer()

if __name__ == '__main__':
    t1 = Thread(target=loop_a)
    t2 = Thread(target=loop_b)
    t1.setDaemon(True)
    t2.setDaemon(True)
    t1.start()
    t2.start()
#    t1.join()
#    t2.join()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        pass
    finally:
        busEM.shutdown()
        busEV.shutdown()
tigershadowclaw commented 5 years ago

flush_tx_buffer sounds like it should work, but it depends on how it is implemented. I know that with a PCAN device the buffer is on the hardware itself, so it is up to the PEAK driver whether or not that buffer is cleared when the socketCAN TX buffer is cleared. Unfortunately I haven't ever looked too deeply into either of these pieces, so I don't know any more than what I think it should be doing.

christiansandberg commented 5 years ago

Socketcan does not support flush_tx_buffer (that would require root privileges).

s1618 commented 5 years ago

I got the exact same problem: TX buffer full after a couple of messages. Every time it happens I close the network and open it again. @shadowclaw says it has something to do with ACK messages, but having a listening node is supposed to set the ACK bit automatically. Is there an extra step?

My setup is a RPI sending messages and a PLC listening to it.

Bobet21 commented 5 years ago

Well, my setup is my PC (Ubuntu 18.04) receiving and sending messages. Given the frequency at which the ECU/PLC sends messages, stopping and restarting the network is not an option. That sounds strange to me, as in my case I have ECUs actively working on both buses, though they may not be listening to all the messages. Would there be a way to ACK the messages when we pass them from one bus to another?

tigershadowclaw commented 5 years ago

@s1618 assuming that the other device(s) are at the same baud rate, are not in listen-only mode, and support the same CAN protocol (CAN 1.0, CAN 2.0A, CAN 2.0B, OpenCAN), then it should happen automatically at a hardware level.

For both @s1618 and @Bobet21 have you tried to send messages using the cangen utility? This would remove python-can from the question and help with debugging. There is also the candump utility which allows you to listen to messages on the bus.

s1618 commented 5 years ago

@shadowclaw I will try that next week. My RPI is connected to a certified controller for the mobile field industry.

Is there a tool I can use to verify the ack bit? I previously had my setup connected to a Teensy and I never had problems. The messages were going way faster than what I'm doing with the RPI now.

tigershadowclaw commented 5 years ago

The only thing I know of is something like this: https://www.peak-system.com/PCAN-Diag-2.231.0.html?&L=1. (or possibly this https://www.peak-system.com/PCAN-MiniDiag-FD.490.0.html?&L=1 but I haven't used one of these cheaper ones before)

Also you might want to verify that your can adapter on the RPi is being setup at the correct bitrate. At my office we use the /etc/network/interfaces file to define the bitrate of the bus when it is brought up by the linux network layer. Depending on what adapter you are using it might require different setup steps.

Example of initializing the bus 'can0' at 250K bitrate in the /etc/network/interfaces file

allow-hotplug can0
iface can0 can static
bitrate 250000
s1618 commented 5 years ago

the /etc/network/interfaces is exactly the same.

After some testing with can-utils, I can confirm that the bug is not from python-can as I get the exact same behavior.

Looking at candump I can see, if I understand how ACK works, that messages are never sent twice. I wrote an incrementing program to verify it. My guess is that the ACK bit is not the problem.

Also, My txqueuelen is 1000 and I even tried more than that.

I'm using an MCP2515 with an MCP2551. I don't think these are the problem either, because I never had that problem with other devices (Teensy/STM32/ATmega328P).

s1618 commented 5 years ago

@Bobet21

I was able to increase the reliability of the Canbus on the RPI.

My /boot/config.txt file before was :

dtparam=spi=on
dtoverlay=mcp2515-can0-overlay,oscillator=16000000,interrupt=25
dtoverlay=spi-bcm2835-overlay

but it was for an older version of the RPI

now my file is :

dtparam=spi=on
dtoverlay=mcp2515-can0,oscillator=16000000,interrupt=25
dtoverlay=spi1-1cs

It still crashes after a period of time, but it crashes way less.

My next test will be to change the MCP2515 hat I made for a chinese premade one to see if it's better.

s1618 commented 5 years ago

So after multiple tests on various setups, the problem really seems to be the ACK bit.

I know the MCP2515 has a one-shot mode to ignore ACK bit. But I can't seem to make it work.

the command sudo ip link set can0 type can one-shot on

returns RTNETLINK answers: Operation not supported

Any ways to make it work?

Also, is there a way to set this "one-shot" mode with python-can?

karlding commented 5 years ago

@s1618 are you sure the MCP2515 Linux driver you are using supports enabling One-Shot Mode as described in the MCP2515 datasheet? Looking at the in-tree kernel sources, it does not seem like an exposed feature that can be modified via netlink sockets.

Taken from https://elixir.bootlin.com/linux/latest/source/drivers/net/can/spi/mcp251x.c#L1068

    priv->can.ctrlmode_supported = CAN_CTRLMODE_3_SAMPLES |
        CAN_CTRLMODE_LOOPBACK | CAN_CTRLMODE_LISTENONLY;

And the struct definition for can_priv is taken from https://elixir.bootlin.com/linux/latest/source/include/linux/can/dev.h#L56

struct can_priv {
    // ...
    /* CAN controller features - see include/uapi/linux/can/netlink.h */
    u32 ctrlmode;       /* current options setting */
    u32 ctrlmode_supported; /* options that can be modified by netlink */
    u32 ctrlmode_static;    /* static enabled options for driver/hardware */
    // ..
};

This is probably because the same SPI driver is used to support the MCP2510 and the MCP2515, and according to Microchip's Upgrade Notes, the MCP2510 doesn't support One-Shot Mode.

Perhaps you're building your own kernel with your own cherry-picks, in which case maybe this isn't relevant.
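As a rough model of why the request fails, the sketch below checks a requested control mode against the `ctrlmode_supported` mask quoted above. The bit values are the ones I believe are defined in include/uapi/linux/can/netlink.h, and the assumption that the kernel rejects unadvertised modes with "Operation not supported" is inferred from the struct comment rather than verified here. `CtrlMode` and `unsupported_modes` are names made up for this illustration.

```python
import enum


class CtrlMode(enum.IntFlag):
    """Control-mode bits (values assumed from linux/can/netlink.h)."""
    LOOPBACK = 0x01
    LISTENONLY = 0x02
    THREE_SAMPLES = 0x04
    ONE_SHOT = 0x08


# The mask quoted above from mcp251x.c.
MCP251X_SUPPORTED = (
    CtrlMode.THREE_SAMPLES | CtrlMode.LOOPBACK | CtrlMode.LISTENONLY
)


def unsupported_modes(requested, supported):
    """Return the requested mode bits that the driver does not advertise."""
    return CtrlMode(int(requested) & ~int(supported))
```

With that mask, asking for `CtrlMode.ONE_SHOT` leaves the one-shot bit over as unsupported, which would match the `Operation not supported` answer from `ip link set ... one-shot on`, while `CtrlMode.LOOPBACK` leaves nothing over.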

s1618 commented 5 years ago

@karlding You may be right on that. I didn't check the file, my bad (I don't have a lot of experience with linux in general). I may add the simple function to write the one-shot byte to the register.

Thank you!

karlding commented 4 years ago

Closing this out as there doesn't seem to be anything actionable here. Please reopen if necessary.