linux-can / can-utils

Linux-CAN / SocketCAN user space applications

CAN not working with no error message or "No buffer space available" #262

Open VictorBarth opened 3 years ago

VictorBarth commented 3 years ago

Hi everyone, I've been using CAN for over a year, but now it has stopped working. My setup is a Raspberry Pi 3B+ with a PiCAN2 and an EPOS2 70/10 for a maxon motor.

I haven't changed anything in the code, and I tried old code that was working before. I also bought a second PiCAN2 in case the first one was broken, but nothing changed. When I run candump with can0 I receive no error message, but it works for vcan0. I also tried different SD cards (including my backup that was working) and a new Raspberry Pi, and still nothing.

I checked /boot/config.txt:

dtoverlay=mcp2515-can0,oscillator=16000000,interrupt=25
dtoverlay=spi-bcm2835-overlay

and /etc/network/interfaces:

auto can0
iface can0 inet manual
        pre-up /sbin/ip link set can0 type can bitrate 500000 triple sampling$
        up /sbin/ifconfig can0 up
        down /sbin/ifconfig can0 down

My final test was making a new SD card with a different OS (I was using RaspEX, and the new one uses Raspberry Pi OS), but now I receive two different error messages:

write: No buffer space available

but after a few minutes it changes to

socket: Too many open files

In both cases the result for ifconfig is:

can0: flags=129<UP,NOARP>  mtu 16
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 10  (UNSPEC)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

I don't know where else I can check to find out what is wrong here. I would be grateful if someone could help me.

marckleinebudde commented 3 years ago

write: No buffer space available

This means your CAN controller fails to send CAN frames, which is consistent with the ifconfig output. Make sure you have a working CAN bus: proper termination, the same bitrate on all nodes, and at least two CAN stations.

socket: Too many open files

Have a look at the program you're using to send. It probably opens a new socket after write() returns an error, but doesn't close the open one.
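"Too many open files" is the message for errno EMFILE, which socket() returns once the process has reached its limit on open file descriptors. The following is a minimal sketch, not taken from the program discussed here, that reproduces the effect of a leaking send routine. It uses AF_UNIX sockets so it runs without CAN hardware; the function names lower_fd_limit and leak_sockets_until_error are hypothetical.

```c
#include <errno.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_TRIES 256

/* Lower the soft limit on open file descriptors for this process. */
int lower_fd_limit(rlim_t limit)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
        return -1;
    if (limit < rl.rlim_cur)
        rl.rlim_cur = limit;
    return setrlimit(RLIMIT_NOFILE, &rl);
}

/* Open sockets without closing them, as a leaking send routine would.
 * Stores the number of sockets opened in *opened and returns the errno
 * of the first failing socket() call, or 0 if none failed. */
int leak_sockets_until_error(int *opened)
{
    int fds[MAX_TRIES];
    int n = 0;
    int err = 0;

    while (n < MAX_TRIES) {
        int s = socket(AF_UNIX, SOCK_DGRAM, 0);
        if (s < 0) {
            err = errno;
            break;
        }
        fds[n++] = s;
    }
    *opened = n;

    /* Clean up so the caller can keep running. */
    while (n > 0)
        close(fds[--n]);
    return err;
}
```

With the limit lowered to 32, the loop fails after a few dozen sockets. A real program with the default limit takes longer to hit it, which would match the error only appearing after a few minutes.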

VictorBarth commented 3 years ago

nvm, The problem was easier than I thought. I just changed directly at ifconfig:

ifconfig can0 txqueuelen 1000

When I did it even the socket: Too many open files message stopped

But the old SD card still has a problem and I have no idea why. The txqueuelen is OK, my code is OK, my Raspberry Pi and CAN bus are working, and there are no error messages. At least I can keep working with this new SD card.

Thx for your reply

marckleinebudde commented 3 years ago

As I said before, my code was working properly, and I'm sure I'm closing every socket I open. This is my code:

  /* send frame */
  if ((nbytes = write(s, &frame, sizeof(frame))) != sizeof(frame)) {
      perror("write");

Who is closing the socket here?

      return 1;
  }

sudo ip link set can0 up type can bitrate 1000000

pre-up /sbin/ip link set can0 type can bitrate 500000 triple sampling$

Please paste the output of ip -details -statistic link show can0

marckleinebudde commented 3 years ago

nvm, The problem was easier than I thought. I just changed directly at ifconfig:

ifconfig can0 txqueuelen 1000

This way you have a longer queue on the CAN interface, making your send() block instead of returning -ENOBUFS.

When I did it even the socket: Too many open files message stopped

...as you don't do a return 1; without closing the socket.

VictorBarth commented 3 years ago

I open a CAN raw socket and close it at the end of the function

close(s)

I was sending a block with a length of 16 bytes, and as ifconfig showed 10 bits it was impossible.

About triple-sampling I'm really not sure. I saw it suggested in a forum, and I'm not sure how it works.

Which bitrate do you plan to use?

It was an error from copying and pasting after some tests, sorry about that. I'm using 1 Mbit. The $ was just there because the line didn't fit in my terminal window, so what I sent was incomplete.

marckleinebudde commented 3 years ago

I open a CAN raw socket and close it at the end of the function

close(s)

Please look at your code again: in case of an error during write() you return without closing the socket.

I was sending a block with a length of 16 bytes, and as ifconfig showed 10 bits it was impossible.

You always have to send a complete struct can_frame.

Don't use triple sampling for now.

I suggest looking at the output of ip -details -statistic link show can0 to spot the difference between your working and non-working versions.
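For reference, the mtu 16 shown by ifconfig is the size in bytes of one classic struct can_frame, while txqueuelen counts queued frames, not bits. A minimal sketch of filling a complete frame follows. The helper name fill_can_frame is hypothetical; the struct and constants come from linux/can.h.

```c
#include <linux/can.h>
#include <string.h>

/* Fill a classic CAN frame with up to 8 data bytes.
 * Returns 0 on success, -1 if len exceeds CAN_MAX_DLEN. */
int fill_can_frame(struct can_frame *f, canid_t id,
                   const unsigned char *data, unsigned char len)
{
    if (len > CAN_MAX_DLEN)
        return -1;

    memset(f, 0, sizeof(*f));   /* also clears padding and unused data */
    f->can_id = id;
    f->can_dlc = len;           /* number of valid bytes in data[] */
    if (len > 0)
        memcpy(f->data, data, len);
    return 0;
}
```

Whatever the payload length, the write() call passes sizeof(struct can_frame), which is 16 bytes (CAN_MTU). Only the can_dlc field tells the controller how many data bytes go on the wire.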

VictorBarth commented 3 years ago

The output of ip -details -statistic link show can0 is:

4: can0: <NO-CARRIER,NOARP,UP,ECHO> mtu 16 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
    link/can  promiscuity 0 minmtu 0 maxmtu 0
    can state BUS-OFF restart-ms 0
          bitrate 1000000 sample-point 0.750
          tq 125 prop-seg 2 phase-seg1 3 phase-seg2 2 sjw 1
          mcp251x: tseg1 3..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
          clock 8000000
          re-started bus-errors arbit-lost error-warn error-pass bus-off
          0          0          0          1          1          1         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    RX: bytes  packets  errors  dropped overrun mcast
    0          0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0

I also included close(s) even in case of an error during write()
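The timing values in the ip output above can be cross-checked by hand: one bit consists of 1 sync quantum plus prop-seg, phase-seg1 and phase-seg2, and the sample point lies just before phase-seg2. A small sketch of that arithmetic is shown below. The struct and function names are hypothetical, and the prescaler value of 1 is not printed in the output but follows from tq 125 ns at a clock of 8 MHz.

```c
#include <stdint.h>

/* Bit-timing parameters as reported by "ip -details link show". */
struct bit_timing {
    uint32_t clock_hz;    /* CAN controller clock */
    uint32_t brp;         /* bit-rate prescaler */
    uint32_t prop_seg;
    uint32_t phase_seg1;
    uint32_t phase_seg2;
};

/* Duration of one time quantum in nanoseconds. */
uint32_t tq_ns(const struct bit_timing *bt)
{
    return (uint32_t)((uint64_t)bt->brp * 1000000000ULL / bt->clock_hz);
}

/* Time quanta per bit: 1 sync quantum plus the three segments. */
uint32_t tq_per_bit(const struct bit_timing *bt)
{
    return 1 + bt->prop_seg + bt->phase_seg1 + bt->phase_seg2;
}

/* Resulting bitrate in bit/s. */
uint32_t bitrate(const struct bit_timing *bt)
{
    return bt->clock_hz / (bt->brp * tq_per_bit(bt));
}

/* Sample point in per mille (750 means 0.750). */
uint32_t sample_point_permille(const struct bit_timing *bt)
{
    return 1000 * (1 + bt->prop_seg + bt->phase_seg1) / tq_per_bit(bt);
}
```

With clock 8000000, prescaler 1, prop-seg 2, phase-seg1 3 and phase-seg2 2, this gives 8 quanta of 125 ns per bit, so 1000000 bit/s with a sample point of 0.750, matching the reported configuration. The timing itself is therefore consistent, and the BUS-OFF state has to come from somewhere else.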

marckleinebudde commented 3 years ago

This looks like the non-working version.

Can you post the same output for the working version?

VictorBarth commented 3 years ago

The working version stopped working now when I disconnected and reconnected the CAN bus. I tried with 3 different PiCAN2 boards and two MCP2515 modules and it's not working anymore.

marckleinebudde commented 3 years ago

The working version stopped working now when I disconnected and reconnected the can bus.

You mean disconnect and reconnect while the CAN bus is active and you are sending data? First try to send and receive a single frame over a proper CAN bus. If this is working you can play more advanced games like disconnecting from the bus, etc :smile:

VictorBarth commented 3 years ago

This is the send code:

int canSend(char* code){    
    int s; /* can raw socket */ 
    int nbytes;
    struct sockaddr_can addr;
    struct can_frame frame;
    struct ifreq ifr;

    if (parse_canframe(code, &frame)){
        printf("nope\n");       
        return 1;
    }

    /* open socket */
    if ((s = socket(PF_CAN, SOCK_RAW, CAN_RAW)) < 0) {
        perror("socket");
        return 1;
    }   

    addr.can_family = AF_CAN;

    strcpy(ifr.ifr_name, "can0");
    if (ioctl(s, SIOCGIFINDEX, &ifr) < 0) {
        perror("SIOCGIFINDEX");
        return 1;
    }
    addr.can_ifindex = ifr.ifr_ifindex;

    /* disable default receive filter on this RAW socket */
    /* This is obsolete as we do not read from the socket at all, but for */
    /* this reason we can remove the receive list in the Kernel to save a */
    /* little (really a very little!) CPU usage.                          */
    setsockopt(s, SOL_CAN_RAW, CAN_RAW_FILTER, NULL, 0);

    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }

    /* send frame */
    if ((nbytes = write(s, &frame, sizeof(frame))) != sizeof(frame)) {
        perror("write");
        close(s);
        return 1;
    }

    //fprint_long_canframe(stdout, &frame, "\n", 0);

    close(s);
    return 0;
}

void buildMsg(unsigned char **data, unsigned char **msg){
    //unsigned char *slave = "123#";
    *msg = malloc(strlen(slaveAdress)+1+strlen(*data)); 
    strcpy(*msg, slaveAdress); 
    strcat(*msg, *data);    
}

And my main:

int main(int argc, char **argv){ 
    fd_set rdfs;
    int s;
    struct ifreq ifr;
    struct sockaddr_can addr;
    struct can_frame frame;

    memset(&ifr, 0x0, sizeof(ifr));
    memset(&addr, 0x0, sizeof(addr));
    memset(&frame, 0x0, sizeof(frame));

    //CAN init
    s = socket(PF_CAN, SOCK_RAW, CAN_RAW);// open CAN_RAW socket 
    strcpy(ifr.ifr_name, "can0");// convert interface sting "can0" into interface index 
    ioctl(s, SIOCGIFINDEX, &ifr);
    addr.can_ifindex = ifr.ifr_ifindex; // setup address for bind 
    addr.can_family = AF_CAN;
    bind(s, (struct sockaddr *)&addr, sizeof(addr));    // bind socket to the can0 interface 

    struct timeval tv;
    int rc;

    //int flag = 0;

    while (running) {
        FD_ZERO(&rdfs);     
        FD_SET(s, &rdfs);

        tv.tv_sec = 0;
        tv.tv_usec = 1000; // 1000 microseconds -> 1kHz 

        rc = select(s+1, &rdfs, NULL, NULL, &tv); //rc == 0 - timeout

        if (!rc) {
            //do some things
        }
        if (FD_ISSET(s, &rdfs)) {
            read(s, &frame, sizeof(frame));
            //fprint_long_canframe(stdout, &frame, NULL, 0);
            //printf("\n");
            if(frame.can_id == 0x701) deCode(frame, 0);
        }
    out_fflush:
        fflush(stdout);
    }
    close(s);
    return 0;
}

I deleted some parts of my code just ignoring what is not about CAN.

VictorBarth commented 3 years ago

No, I was testing all my CAN shields. The first one was working properly, as I told you. Then I ran sudo poweroff on my device, plugged my SD card (with the OS and this code installed) into another device, and tried again. When I came back to the first device it was not working anymore.

VictorBarth commented 3 years ago

Now it is working again, and the output for ip -details -statistic link show can0 is:

4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/can  promiscuity 0 minmtu 0 maxmtu 0
    can state ERROR-ACTIVE restart-ms 0
          bitrate 1000000 sample-point 0.750
          tq 125 prop-seg 2 phase-seg1 3 phase-seg2 2 sjw 1
          mcp251x: tseg1 3..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
          clock 8000000
          re-started bus-errors arbit-lost error-warn error-pass bus-off
          0          0          0          1          2          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    RX: bytes  packets  errors  dropped overrun mcast
    3881       486      0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    3872       487      0       0       0       0

The only thing I did was switch from Windows PowerShell to a Linux terminal for the ssh connection.

marckleinebudde commented 3 years ago

I'm a bit lost now. Can you sum up what's working and what's not?

VictorBarth commented 3 years ago

It seems that the order in which I turn on the devices determines whether it works or not. If I turn on the Raspberry Pi first, it works. But if I turn on the EPOS first, it doesn't work. The strange thing is that this didn't happen before. I'll try again with the SD card that stopped working.

Rockstein2 commented 2 years ago

Hello guys, I am running three tcan4550 chips on spi0, spi4 and spi6 of an RPi 4 and I am experiencing similar behaviour if I send the messages too fast. It runs for a while, then suddenly write to the socket returns Errno 105. This state remains until I ifdown / ifup the CAN interface. If I try with cansend, cansend says the same "write: No buffer space available". Is it possible to debug this behaviour?
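Errno 105 is ENOBUFS on common Linux architectures, which is the same condition as the "No buffer space available" message from cansend. A sender can treat it as a temporary condition and retry a bounded number of times. The sketch below assumes a 1 ms pause between attempts, and the helper name write_retry_enobufs is hypothetical. It only helps when the queue drains again; in the stuck state described here, where nothing recovers until ifdown / ifup, the retries would simply run out.

```c
#include <errno.h>
#include <time.h>
#include <unistd.h>

/* Write len bytes to fd, retrying up to max_retries times when the
 * kernel reports ENOBUFS. Returns 0 on success, -1 with errno set
 * on failure. */
int write_retry_enobufs(int fd, const void *buf, size_t len, int max_retries)
{
    const struct timespec delay = { 0, 1000000 };   /* 1 ms */
    int attempt;

    for (attempt = 0; ; attempt++) {
        ssize_t n = write(fd, buf, len);

        if (n == (ssize_t)len)
            return 0;
        if (n >= 0) {
            /* A partial frame must never be treated as sent. */
            errno = EIO;
            return -1;
        }
        if (errno != ENOBUFS || attempt >= max_retries)
            return -1;

        nanosleep(&delay, NULL);
    }
}
```

For a CAN raw socket, buf would point to a struct can_frame and len would be sizeof(struct can_frame).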

nico0481 commented 2 years ago

Could you try

ifconfig canX txqueuelen 2000

or

ip link set canX txqueuelen 2000

where canX is your actual CAN interface.

Rockstein2 commented 2 years ago

Yes, I did it. It delays the error for 1-2 minutes. Now I am trying to ftrace it. I see that there is no traffic on SPI in this state. No functions from m_can and tcan4x5x.c are called. From /drivers/net/can/raw.c: raw_sendmsg -> can_send in af_can.c but

marckleinebudde commented 2 years ago

Let's discuss this on the linux-can mailing list: linux-can@vger.kernel.org. Make sure you're using the latest kernel. Can you send me the DT overlay you're using?

cvetaevvitaliy commented 2 years ago

In /etc/network/interfaces, add this line after pre-up:

post-up /sbin/ip link set canX txqueuelen 1000

where canX is your CAN interface

Rockstein2 commented 2 years ago

is it not the same as with ifconfig canX txqueuelen 2000?

marckleinebudde commented 2 years ago

is it not the same as with ifconfig canX txqueuelen 2000?

Yes, but ifconfig is a deprecated tool, and 2000 is not the same as 1000. :smile:

cvetaevvitaliy commented 2 years ago

is it not the same as with ifconfig canX txqueuelen 2000?

It is the same, but you do not need to type it again after every reboot :)

Rockstein2 commented 2 years ago

:))) That does not matter here. At that time, I was concerned with the handling of ISR requests in the drivers' m_can_isr routine. Only the flags for m_can ver3.0 were reset, also in case of v3.1 (tcan4x5x)

https://lore.kernel.org/all/b5066414-fb63-71af-997c-07c1c531a218@photo-meter.com/

VictorBarth commented 1 year ago

Hi everyone, I'm experiencing almost the same issue again, and I'm a bit confused again. I was using the maxon EPOS2 70/10 driver to drive many motors and it was working well. Now I have an EPOS4 70/15, which has exactly the same registers and works with an Arduino but not with the Raspberry Pi. I receive the error "CAN overrun error (object lost)" with the description "CAN mailbox experienced an overrun due to high communication rate", and "CAN Rx queue overflow" with the description "CAN receive queue overflow due to high communication rate". The problem is that I can check the "automatic bit rate" and it's the same as configured on the Raspberry Pi. Another thing that confuses me is that the setup configures the driver correctly and the problem only occurs in the loop.