pierremolinaro / acan2517

Arduino CAN driver for MCP2517FD CAN Controller (in CAN 2.0B mode)
MIT License

CAN-transmission errors #17

Open tomtom0707 opened 2 years ago

tomtom0707 commented 2 years ago

Hello Pierre, I'm coming back once more to my problem with the transmission errors, because they may not be caused (only) by the SPI transmission rate. To check this, I examined my "good" circuit, which did not generate any errors, i.e. the one with the 20 MHz crystal on the MCP and the 10 MHz SPI clock derived from it. The peculiarity here is that one message is specifically sent in a time interval

But that is just a deceptive "good condition": as soon as a second device sends data on the bus and the bus load rises again, to around 60%, the errors reappear.

Do you have any idea? Whenever the MCP is stressed, errors occur on the bus. Which settings could I adjust? Thanks.

tomtom0707 commented 2 years ago

Some more information on this: I now have the entire system running with your ACAN_ESP32 library. So I use the ESP32's internal CAN controller, while the CAN transceiver (MCP2558) on the board and the test bench stay the same. The system runs perfectly in all variants; even at 100% bus load, no errors occur.

setup acanESP32:

 ACAN_ESP32_Settings settings (canbaudini) ;
 settings.mRequestedCANMode = ACAN_ESP32_Settings::NormalMode ;
 settings.mRxPin = GPIO_NUM_27 ; // Optional, default Tx pin is GPIO_NUM_4
 settings.mTxPin = GPIO_NUM_26 ; // Optional, default Rx pin is GPIO_NUM_5
 const ACAN_ESP32_Filter filter = ACAN_ESP32_Filter::singleStandardFilter (ACAN_ESP32_Filter::data, 0x07f, 0x07f) ;
 const uint32_t errorCode = ACAN_ESP32::can.begin (settings, filter) ;

setup acan2517:

 ACAN2517 can (MCP2517_CS, vspi, 255) ; // no interrupt pin
 ACAN2517Settings settings (ACAN2517Settings::OSC_20MHz, canbaudini) ; // 20 MHz crystal
 settings.mRequestedMode = ACAN2517Settings::Normal20B ; // normal 2.0B mode
 ACAN2517Filters filters ;
 filters.appendFilter (kStandard, 0x780, 0x000, NULL) ; // pass all IDs < 0x80
 if (filters.filterStatus () != ACAN2517Filters::kFiltersOk)
   Serial.println ("!! CAN-Filter error !!") ;
 else
   Serial.println ("CAN-Rec.Filter: < 0x80") ;
 const uint32_t errorCode = can.begin (settings, NULL, filters) ; // GO! no interrupt, with filter
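Both receive filters above are meant to pass standard IDs below 0x80. A minimal host-side sketch to check that they accept the same set, assuming ACAN2517's `appendFilter (format, mask, acceptance, callback)` treats set mask bits as "must match" and ACAN_ESP32's `singleStandardFilter (type, identifier, mask)` treats its mask bits as "don't care". The helper functions are hypothetical and not part of either library:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not a library API. Assumed ACAN2517 semantics:
// every bit set in `mask` must be equal in the received ID and `acceptance`.
bool acceptsMustMatch(uint16_t id, uint16_t mask, uint16_t acceptance) {
  assert(id <= 0x7FF);  // 11-bit standard identifier
  return (id & mask) == (acceptance & mask);
}

// Hypothetical helper, not a library API. Assumed ACAN_ESP32 semantics:
// bits set in `dontCare` are ignored, all other bits must match `identifier`.
bool acceptsDontCare(uint16_t id, uint16_t identifier, uint16_t dontCare) {
  assert(id <= 0x7FF);
  const uint16_t care = static_cast<uint16_t>(~dontCare & 0x7FF);
  return (id & care) == (identifier & care);
}

// Returns true when the two filter setups from this post accept exactly
// the same standard IDs (0x000 to 0x7FF).
bool filtersAgree() {
  for (uint16_t id = 0; id <= 0x7FF; id++) {
    const bool mcp = acceptsMustMatch(id, 0x780, 0x000);
    const bool esp = acceptsDontCare(id, 0x07F, 0x07F);
    if (mcp != esp) return false;
  }
  return true;
}
```

Under these assumptions both setups accept exactly IDs 0x000 to 0x07F, so the receive filtering alone should not make the two drivers behave differently.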

I tried enlarging the TX transmit buffer and also tried using the TXQ buffer, but neither has helped so far.

pierremolinaro commented 2 years ago

This behavior is surprising, can you send me your sketch for testing?

Pierre

On 8 Jan 2022, at 02:21, tomtom0707 wrote:


Experimental setup 1: ESP32 and MCP2518 with a 20 MHz crystal, SPI, no interrupt pin. The ESP32 receives no data; it only sends. I send one message with 6 data bytes at a rate of 4 kHz. The CAN bus runs at 1 Mbit/s, giving a bus load of approx. 41%. No other node sends on the bus. Everything works fine: over 10 million messages sent, no errors.

Experimental setup 2: Exactly as setup 1, with one change: the CAN bus now runs at 500 kbit/s, which raises the bus load to approx. 82%. Now errors occur: remote frames are received, as well as messages that were never sent (approx. 1 error per 10,000 transmissions). The extra stress from the higher bus load falls only on the MCP; the SPI traffic does not change. The MCP is the only transmitter, so it never has to arbitrate for priority. I can't explain why...

Experimental setup 3: Exactly as setup 1, with one change: the program sends 4 different messages from the ESP to the MCP in direct succession. These 4 messages are sent at 1 kHz, so the CAN bus load is again about 40%. Now errors appear. Sent: CAN IDs (hex) 320, 321, 322, 323, 45,000 each. Received: 320 and 321, 45,000 each; 322: 44,992; 323: 44,110; additionally the following IDs (hex) with counts: 302: 8; 303: 890. It is noticeable that the last 2 messages are badly affected. I worked around the problem by waiting 70 µs after sending each message before sending the next one; then it runs without errors. I can't really explain that, since I'm only writing into the buffer, but that's how it is.
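The bus-load figures above can be cross-checked with the usual CAN 2.0 frame-length formula for standard data frames: 47 + 8n bits without stuff bits, plus at most floor((34 + 8n - 1) / 4) stuff bits. A small sketch; the function names are illustrative only:

```cpp
#include <cassert>
#include <cstdint>

// Bits of a CAN 2.0 standard (11-bit ID) data frame with `dataBytes` bytes,
// without stuff bits: SOF, ID, RTR, IDE, r0, DLC (19) + data (8n) +
// CRC and delimiter (16) + ACK (2) + EOF (7) + intermission (3) = 47 + 8n.
uint32_t nominalFrameBits(uint32_t dataBytes) {
  assert(dataBytes <= 8);
  return 47 + 8 * dataBytes;
}

// Worst-case stuff bits: stuffing applies from SOF to the end of the CRC
// sequence (34 + 8n bits) and adds at most one bit per 4 bits after the first.
uint32_t maxStuffBits(uint32_t dataBytes) {
  assert(dataBytes <= 8);
  return (34 + 8 * dataBytes - 1) / 4;
}

// Bus load in percent caused by one periodic stream of identical frames.
double busLoadPercent(uint32_t frameBits, double framesPerSecond,
                      double bitRate) {
  return 100.0 * frameBits * framesPerSecond / bitRate;
}
```

For setup 1 (6 data bytes, 4 kHz, 1 Mbit/s) a frame is 95 to 115 bits, giving 38% to 46% bus load, which brackets the approx. 41% reported; at 500 kbit/s (setup 2) the range doubles to 76% to 92%, bracketing the approx. 82%.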


tomtom0707 commented 2 years ago

Hello Pierre, I have written a small program in which I can switch between the two CAN controllers with the "ESP_CAN" switch. The MCP is clocked from a 4 MHz oscillator with the 10x PLL. The clocking was also a suspected cause of the earlier "stress errors" in the MCP.

 ACAN2517Settings settings (ACAN2517Settings::OSC_4MHz10xPLL,canbaudini) ;  // 4 MHz clock, 40 MHz system clock
 settings.mRequestedMode = ACAN2517Settings::Normal20B ;   // normal 2.0B mode
 settings.mDriverReceiveFIFOSize = 16;  // default 32
 settings.mDriverTransmitFIFOSize = 80; // default 32
 ACAN2517Filters filters;
 filters.appendFilter (kStandard, 0x780, 0x000, NULL) ;  // pass all IDs < 0x80

These settings run without errors. Here is the loop; maybe I am using it incorrectly, you will see...

void loop()
{
  if (can_go) {
    // Send 4 frames (IDs id_m .. id_m+3) back to back
    frameTX.id = id_m;
    frameTX.len = 8;
    frameTX.idx = 0;
    frameTX.ext = false;
    frameTX.data64 = 0x1122112211221122;
#ifdef ESP_CAN
    ACAN_ESP32::can.tryToSend(frameTX);
#else
    can.tryToSend(frameTX);
#endif

    frameTX.id = id_m+1;
    frameTX.data64 = 0x3344334433443344;
#ifdef ESP_CAN
    ACAN_ESP32::can.tryToSend(frameTX);
#else
    can.tryToSend(frameTX);
#endif

    frameTX.id = id_m+2;
    frameTX.data64 = 0x5566556655665566;
#ifdef ESP_CAN
    ACAN_ESP32::can.tryToSend(frameTX);
#else
    can.tryToSend(frameTX);
#endif

    frameTX.id = id_m+3;
    frameTX.data64 = 0x7788778877887788;
#ifdef ESP_CAN
    ACAN_ESP32::can.tryToSend(frameTX);
#else
    can.tryToSend(frameTX);
#endif
    can_go = 0;
  }
  // Re-arm the burst roughly every 1000 µs
  can_us = micros() - can_usold;
  if (can_us > 1000) {
    can_go = 1;
    can_usold = micros();
  }
}

With the MCP I cannot detect any errors with this loop; all 4 messages arrive cleanly at a bus load of around 45%. Even tightening it to "can_us > 500", i.e. approx. 90% bus load, runs smoothly. That means I have to re-check my experimental setup... I cannot reproduce the error with this minimal loop. But it also means that the SPI transmission and the MCP clocking work correctly in principle. [Image: MCP_1D]

Now the test with bus load from a second node. One device generates an additional bus load of approx. 33% with 3 messages. In the loop the period is set to "can_us > 1000". First the test with ESP_CAN; everything is fine: [Image: ESP_2D] Now the test with the MCP: after a short run time I see the same error pattern I described here before: [Image: MCP_2D] The same pattern appeared earlier in my experiments, when the MCP was transmitting alone on the bus and was being stressed via SPI.

tomtom0707 commented 2 years ago

Hello, I took the time to revisit the topic. In my opinion, things don't run smoothly when the CAN bus is stressed. Hardware: MCP2518 with a 4 MHz oscillator, interrupt, and a 20 MHz system clock (other hardware variants, without interrupt, 40 MHz, ..., showed the same error in principle in earlier attempts). It is a very simple test, a bit harsh, but it reproduces exactly the errors that otherwise occur sporadically (see the first posts here). I used the demo program to send a message: [Image: screenshot]

[Image: MCP2518_00.JPG, Test 0]

[Image: MCP2518_11.JPG, Test 1]

[Image: MCP2518_22.JPG, Test 2]

[Image: MCP2518_33.JPG, Test 3]

tomtom0707 commented 2 years ago

Too bad nobody cares... :(

pierremolinaro commented 1 year ago

Hello,

Thank you for your detailed report.

What is your microcontroller?

I have just run my TestWithACAN2515 and LoopBackIntensiveTestTeensy3x sketches; they run without any error.

You can try the ACAN2517Settings::OSC_4MHz10xPLL setting, but I don't think it makes any difference.

A detail: micros() returns a uint32_t, so gBlinkDate should have this type too.
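To illustrate why the type matters, a small host-side sketch, assuming a 32-bit micros()-like counter that wraps after 2^32 µs (about 71.6 minutes). The int64_t variant is a hypothetical wrong choice used only for contrast:

```cpp
#include <cassert>
#include <cstdint>

// Elapsed time between two readings of a uint32_t, micros()-like counter.
// Unsigned subtraction is modulo 2^32, so the result stays correct even if
// the counter wrapped between the two readings.
uint32_t elapsedUs(uint32_t now, uint32_t then) {
  return now - then;
}

// Same computation with the date stored in a hypothetical int64_t variable:
// after the wrap, `now` is small and `then` is large, so the difference is
// negative and a test such as "elapsed >= 100" stays false until the counter
// climbs back past `then`, about 71.6 minutes later.
int64_t elapsedUsInt64(uint32_t now, int64_t then) {
  return static_cast<int64_t>(now) - then;
}
```

For example, with `then = 0xFFFFFF00` and `now = 16` (just after the wrap), the uint32_t version correctly reports 272 µs, while the int64_t version reports a large negative value.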

Another detail: a standard data frame with 8 bytes takes between 111 and 135 bits, that is 111 µs to 135 µs at 1 Mbit/s. So sending a frame every 100 µs is not possible, and you should therefore observe the "Send Failure" message. Do you see this message?
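The rate argument can be illustrated with a rough host-side model, assuming a fixed 111 µs per frame (the shortest 8-byte frame at 1 Mbit/s) and a single FIFO. The real driver combines a software buffer with the controller's hardware FIFO, so treat this only as a sketch of the arithmetic, not of the ACAN2517 implementation:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not the ACAN2517 implementation: the sketch offers
// one frame every `sendPeriodUs`, the bus carries one frame every
// `frameTimeUs`, and waiting frames sit in a FIFO of `capacity` entries.
// Returns how many offered frames found the FIFO full, i.e. how often a
// tryToSend-style call would have to report failure.
uint32_t rejectedFrames(uint32_t sendPeriodUs, uint32_t frameTimeUs,
                        uint32_t capacity, uint32_t durationUs) {
  assert(sendPeriodUs > 0 && frameTimeUs > 0);
  uint32_t queued = 0;     // frames waiting in the FIFO
  uint32_t rejected = 0;   // frames refused because the FIFO was full
  uint32_t busFreeAt = 0;  // instant at which the frame on the bus ends
  bool busy = false;
  for (uint32_t t = 0; t < durationUs; t++) {
    // The current frame ends; the next waiting frame (if any) starts.
    if (busy && t >= busFreeAt) busy = false;
    if (!busy && queued > 0) {
      queued--;
      busy = true;
      busFreeAt = t + frameTimeUs;
    }
    // The sketch offers a new frame at every period boundary.
    if (t % sendPeriodUs == 0) {
      if (queued < capacity) {
        queued++;
      } else {
        rejected++;
      }
    }
  }
  return rejected;
}
```

With a 100 µs period and 111 µs per frame, frames are refused whatever the FIFO size; a larger capacity (such as the mDriverTransmitFIFOSize = 80 tried earlier in this thread) only delays the first refusal. With a 200 µs period nothing is refused.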

Note that the code

 if ((micros() - gBlinkLedDate) > 100) {
   gBlinkLedDate = micros() ;
   ...
 }

does not ensure periodic execution (the '>' gives a period greater than or equal to 101 µs). It is better to write:

 if ((micros() - gBlinkLedDate) >= 100) {
   gBlinkLedDate += 100 ;
   ...
 }

You also have to set the initial value of gBlinkLedDate to a given value (for example gBlinkLedDate = 1000000 to start 1 s after boot); otherwise you will get a burst at startup.
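A host-side sketch comparing the two forms, with a simulated clock polled at arbitrary instants. One caveat worth checking: with the unsigned-difference test, a start date that is still in the future (later than the current micros() value) makes `micros() - gBlinkLedDate` wrap to a very large number, so the test fires at once. This sketch therefore assumes the date is initialised from a clock reading (for example `gBlinkLedDate = micros() ;` at the end of setup()). `fireTimes` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Host-side simulation (hypothetical helper): loop() polls a uint32_t,
// micros()-like clock at the instants in `pollTimes`; returns the instants
// at which the periodic action fires. `startDate` is assumed to be a clock
// reading, not a date in the future.
std::vector<uint32_t> fireTimes(const std::vector<uint32_t>& pollTimes,
                                uint32_t periodUs, uint32_t startDate,
                                bool fixedPeriod) {
  std::vector<uint32_t> fired;
  uint32_t date = startDate;
  for (const uint32_t now : pollTimes) {
    if (fixedPeriod) {
      // ">=" and advance by exactly one period: average period == periodUs
      if ((now - date) >= periodUs) {
        date += periodUs;
        fired.push_back(now);
      }
    } else {
      // ">" and restart from the current reading: lateness accumulates
      if ((now - date) > periodUs) {
        date = now;
        fired.push_back(now);
      }
    }
  }
  return fired;
}
```

With the clock polled every 30 µs over 600 µs and a 100 µs period, the recommended form fires 6 times (at 120, 210, 300, 420, 510 and 600 µs), while the original form fires only 5 times, every 120 µs. Because all arithmetic is modulo 2^32, the same pattern holds when the polls straddle the counter wrap.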

Are all the frames shown by your CAN logger valid frames, or does it also print invalid ones?

Kind regards,

Pierre

On 3 Apr 2022, at 19:27, tomtom0707 wrote:

I used the demo program to send a message: https://user-images.githubusercontent.com/59939765/161439502-fc3394ca-82e9-4642-a272-8300b97db8d7.png

Test 0: ID 0, data 0-8, send interval 200 µs (60% bus load). Result: everything OK.
Test 1: ID 0, data 0-8, send interval 100 µs (100% bus load). Result: everything OK. (Not all messages can be sent when the bus is full, but every message that is sent is correct.)
Test 2: ID 120, data 0-8, send interval 100 µs (100% bus load). Result: error! (If the ID is > 0, ones are shifted in and wrong IDs are generated.)
Test 3: ID 120, data f0-f8, send interval 100 µs (100% bus load). Result: error! (In addition, incorrect remote frames are generated when the data contains larger values.)

Test 0: https://user-images.githubusercontent.com/59939765/161439403-3496803e-9f55-4b35-8b2a-1bfb13e2814f.jpg
Test 1: https://user-images.githubusercontent.com/59939765/161440185-c7f79ca3-b8d5-4ed2-af58-7a2467b143b0.jpg
Test 2: https://user-images.githubusercontent.com/59939765/161440194-f472706e-fbd6-493b-aeff-2d22a294a1b4.jpg
Test 3: https://user-images.githubusercontent.com/59939765/161440205-6c98729b-9dee-4cb2-a650-800fcca36dd6.jpg


tomtom0707 commented 1 year ago

Hello Pierre, I invested a lot of time and effort in troubleshooting, but got no result. For six months now I've been using Seeed's Arduino driver for their CAN bus shield. It's not as convenient and required some customization for my hardware, but I have never encountered this error with it, on the same hardware and with the same application program. Some information that may help: ESP32, MCP2518FD (various clock sources, now a 4 MHz oscillator), SPI at 10 MHz. I would therefore rule out the hardware 100%, since I had tested different variants there as well. Your hints about the timing are correct, but the point here is to be able to run the loop without delay() without sending wrong messages. Not all messages can be sent when the bus is full, correct, but that must not produce wrong messages. Your ACAN for the ESP32 CAN controller works without errors. I have never seen an error like this either; I've been using CAN for 20 years, and either it runs, or there are bus errors or the like... but never wrong messages. In my opinion, there is bit shifting in the transmission between the ESP32 and the MCP2518 under certain conditions (see above); it is not always directly reproducible, but it occurs, it's only a matter of time. Many applications will not even notice this error, because it works 99.9% of the time. Why? I don't know, maybe it's my fault. Kind regards, Thomas