ttlappalainen / NMEA2000

NMEA2000 library for Arduino
542 stars 228 forks source link

NMEA2000 interfering with reading Serial input. #69

Closed tmcadam closed 7 years ago

tmcadam commented 7 years ago

I have wind sensor (Nano) sending data via RS485 to a Mega running NMEA2000 library and transferring the sensor values over NMEA200 bus onto my Garmin echoMap. The system is working fine when everything is connected however when the Mega is physically disconnected from the NMEA200 bus or the Garmin is switched off, the RS485 connection is failing. It seems that NMEA2000 is somehow interfering with reading the serial on the Mega, when it can't send messages (disconnected). I have tried connecting the RS485 to the different RX/TX ports on the Mega and still find the same behaviour. I have all serial output disabled in NMEA2000. Is SPI using the same buffers as the RS485 using and MCP2515 is sending something back to the Mega if it can't send? I thought they would be separate. Is there a delay in NMEA2000 or mcp_can if it can't send a message? But this shouldn't really matter as my buffer is never getting above (12 bytes).

It's not blocking my project, I don't really need RS485 when the device isn't connected to the NMEA2000 bus, but I would like to understand where the problem is coming from. Thanks

ttlappalainen commented 7 years ago

Do you have latest library for NMEA2000 and mcp_can (from my GitHub)? Do you have interrupts enabled for Mega mcp_can (#define N2k_CAN_INT_PIN xx)?

If bus is not connected, MCP2515 chips blocks sending. It should not block other tasks, but if there is some feature...

tmcadam commented 7 years ago

Thanks for the fast reply, yeah I am pretty sure I have the correct versions and everything up to date.

I have interrupt defined although when I just tried commenting out #define N2k_CAN_INT_PIN 21, the problem disappears! So I tried re-enabling interrupt and swapping around around interrupt pins, swapping the mode select pin for rs485 and even taking out the interrupt lead but that does help. Just having it enabled in the code seems to cause it.

tmcadam commented 7 years ago

Correction: even taking out the interrupt lead but that does not help.

ttlappalainen commented 7 years ago

Is you MCP2551 interrupt pin connected to Mega pin 21? That depends how you have connected it.

But if your system works with bus connected and with #define N2k_CAN_INT_PIN 21, it should have connected to right pin. Otherwise it should not communicate at all, if you have interrupt enabled and wrong pin definition. You can test that e.g. with #define N2k_CAN_INT_PIN 22

And if you disbale interrupts by commenting #define N2k_CAN_INT_PIN 21 and then it works even not connected to the bus, I have to check mcp_can code interrupt handling.

tmcadam commented 7 years ago

I have made some progress and think I have identified the problem, but not the solution I'm afraid. I found quite quite a few issues on the Arduino boards about interrupt and serial issues. It seems serial uses interrupts to trigger when a byte has been received or is ready to send. So I had a look and found that in _NMEA2000mcp.cpp, function tNMEA2000_mcp::CANSendFrame(.....), the interrupts are being globally disabled. I think if the message is sent successfully they are then re-enabled very quickly, but if the sending fails(i.e. bus disconnected), the interrupts are disabled for long enough to miss a few bytes of serial input. If I comment line 85, I don't get the problem, so that is definitely the cause of the issue I am having.

Also some of the problems I read about with interrupts and serial were caused by having too much going on in the interrupt handler function, and recommended simply setting a flag in the interrupt handler and then having the flag picked up and acted on in the main loop. I see there is quite a bit going on in void CanInterrupt(), which could potentially cause problems with things like the Actisense Sender which depends on serial (it can impact both serial send and recieve).

Apologies to bring you just problems without solutions.

ttlappalainen commented 7 years ago

Thanks, any information is valuable. I try to reread the code. I quicly did not understood, why interrupts were disabled during sending, so I have to read history too. CanInterrupt handles only reading buffer filled by receive interrupt. I tried to modify mcp_can to also send by interrupt, but that failed. Since I have been more concentrated with Teensy, mcp_can has got less time.

ttlappalainen commented 7 years ago

Hi, Now I know. Since if interrupt is used for receiving, it must be disabled during transmit due to mcp_can library - it is not written well. I'll retry to modify mcp_can to handle also write interrupts. Then it will not stuck anymore to delay as it does now on sendMsg, if it can not send messages.

ttlappalainen commented 7 years ago

There is now new version of NMEA2000_mcp and mcp_can under my GitHub. Download them and try. According to my measurements it takes about 100 us with Arduino Mega to do NMEA2000.SendMsg, if cable is not connected. Normal single frame message will be sent in about 220 us. This is also improved, since it uses interrupt also for transmitting.

So now it should not stuck anymore.

tmcadam commented 7 years ago

Hi. Thanks for the update. I have tested the changes this morning. It has solved the problem with holding up the serial input, however it is not sending the NMEA2000 messages to the bus consistently. The values on the Garmin appear for a few seconds and then stop. I looked at the network with Actisense listener on another device and it is showing messages, but they are intermittent and always in pairs, I guess one of these factors is causing the Garmin to reject them. I reduced my program until it was identical to the Temperature Monitor example, but still have the same problem. It sends single message fine with interrupt disabled, or if I reset Git to before yesterdays changes.

ttlappalainen commented 7 years ago

Can you provide me online session with TeamViewer (use version 9 from http://kave.fi/tuki/TeamViewerQS_Kave_fi.exe)? Email me ID and pw.

I did not completely understood, what you mean with "intermittent and always in pairs". On my test I got steady 1 sec update with MessageSender example.

tmcadam commented 7 years ago

Sorry, my last message was unclear, and I was running it my own program, but I have repeated it now with the Temperature monitor example and having the same issue. I have attached 2 files, they are both running the TemperatureMonitor.ino example, just with these lines added. The only code difference is the N2k_CAN_INT_PIN 20 line being commented.

#define N2k_SPI_CS_PIN 53
#define USE_MCP_CAN_CLOCK_SET 8
#define N2k_CAN_INT_PIN 20

With interrupt enabled the output to the Garmin is intermittent, works sometimes, but mainly displays values for a few seconds and then stops displaying (and doesn't show up in available devices). That seems like a problem with the ISO Request & Product Information Response, but they look fine in NMEA Reader.

With interrupt disabled the output to the Garmin is always working fine.

The output to NMEA Reader (from a Due running Actisense Listener on same bus) with interrupt on and off looks the same, except 130312 is being sent twice. It isn't being sent once every 1.25 seconds as it looks, but twice (close together) every 2.5 seconds, I think NMEA Reader is presuming they have a regular spacing and averaging them out.

temperaturemonitor_interrupt temperaturemonitor_no_interrupt

tmcadam commented 7 years ago

1st screen grab - With interrupt 2nd screen grab - Without interrupt

ttlappalainen commented 7 years ago

Please change on NMEA2000_mcp.cpp line 90: if ( result!=CAN_OK && !pTxBuf->AddFrame(id,len,buf) ) result=CAN_FAILTX;

tmcadam commented 7 years ago

I changed the || to &&.

130310 seems to have strange timing. Also no product information was sent in response to the ISO request.

temperaturemonitor code change with interrupt

ttlappalainen commented 7 years ago

Sorry for problems. Works right on my system and Garmin GMI 20 can see it and TemperatureMonitor responds to ISO Request. I have to ask to check again that do you have terminal resistor on bus. Just few weeks ago I had that problem that my resistor was dropped.

You could also modify TemperatureMonitor:

 Serial.begin(115200);
  NMEA2000.SetForwardStream(&Serial);
  // If you want to use simple ascii monitor like Arduino Serial Monitor, uncomment next line
  NMEA2000.SetForwardType(tNMEA2000::fwdt_Text); // Show in clear text. Leave uncommented for default Actisense format.

  // If you also want to see all traffic on the bus use N2km_ListenAndNode instead of N2km_NodeOnly below
  NMEA2000.SetMode(tNMEA2000::N2km_ListenAndNode,22);
  //NMEA2000.SetDebugMode(tNMEA2000::dm_Actisense); // Uncomment this, so you can test code without CAN bus chips on Arduino Mega
  //NMEA2000.EnableForward(false); // Disable all msg forwarding to USB (=Serial)

Then before sending code to device, open SerialMonitor (on Arduino IDE). Then send code, so it opens SerialMonitor immediately code has been sent and records all message. Let it run 10-15 sec and copy all to e.g. notepad. Then you can search "Dest:22 Len:3 Data:14,F0,1", which is ISO request (for PGN 126996) pointed to TerminalMonitor. After that should be the respond.

Other option is to enable NMEA2000.SetForwardType(tNMEA2000::fwdt_Text); on you other device and listen with SerialMonitor that. Unfortunately there is no time stamps, but for testing you could add: port->print(millis()); port->print(F(" : ")); on N2kMsg.cpp before line 653 (port->print(F("Pri:"));...) This may be better, since if you plot data sent from TemperatureMonitor, it does not show, when it was actually sent due to buffering. On the other hand on Temperature monitor it will show sending errors on test mode.

I hope you could do that test, since I can not produce the problem on my system.

tmcadam commented 7 years ago

This is the log of startup with interrupt enabled and the code change in NMEA2000_mcp suggested previously.

startup_with_code_change.txt

tmcadam commented 7 years ago

This is the log of startup with interrupt enabled and the code change in NMEA2000_mcp reverted, so current master.

startup_no_code_change.txt

tmcadam commented 7 years ago

This is the log of startup with interrupt disabled and NMEA2000_mcp as master.

startup_no_interrupt.txt

tmcadam commented 7 years ago

.....and don't worry about the problems, thank you for your help and the work you have done on this library. I guess this is the challenge with an embedded project, that there are many variables in the hardware as well as software.

ttlappalainen commented 7 years ago

Some more questions:

There is an other bug in the code on NMEA2000_mcp.cpp line 105 should be: if (MaxCANReceiveFrames<2 ) MaxCANReceiveFrames=2;

But still I do not understand why TemperatureMonitor example works with my board Mega+own MCP2515. I noticed some strange effect, which I have to investigate: 130310 will be shown 3 times. Also other test device seem to have last message doubled.

tmcadam commented 7 years ago

I just checked the resistance on the bus and it looks fine. I am using a Mega and little CANBUS shields off Ebay, MCP2515 Controller, TJA1050 transceiver and 8MHz speed. The logs came from Listener on a Due with an MCP2562 chip which I made up following your circuit diagram. I have just tested with a similar board with 16Mhz chip and still seeing the problem.

I experimented a while back using the library with Atmega 328P boards, but strangely had the same issue with interrupt, although I thought that it was due to the ram being so close to the limit, and so it couldn't send the product information when the Garmin was requesting. When I switched to the Mega the problem was solved.

As my device is just a simple sender I could probably get away without using interrupt at all and use mask and filter to just receive the PGN that are important for the network (address claim and iso request). Do you see a problem with this approach?

I will add the other code change and test.

ttlappalainen commented 7 years ago

It is not good to just leave obvious bug to the code, if it is possible to find.

C/C++ allows to calculate PI with single line code, but it is not very readable. Also the line 90 on NMEA2000_mcp.cpp was not clear enough even it is simple.

I finally could generate the error you had and so found error in logic on line 90. It is now updated on GitHub. And why I did not first had same error - since I used test version of TemperatureMonitor with interrupts disabled.

tmcadam commented 7 years ago

Wow, that's great, it's working really nice now. The Garmin is displaying reliably and I am not seeing the original issue of the RS485 being blocked when the bus is disconnected.

I am still getting some missed RS485 packets (1 in 100, which isn't effecting me too badly as my application can handle them), but it is definitely being caused by the NMEA2000 interrupt interfering slightly with the serial communication interrupts. I have a fix already and am seeing no serial interference at all now (0 in 15,000+ packets).

I can submit as a PR, but I know it will break if compiling with the other (non-MCP) hardware types, so maybe that isn't best? I still don't fully understand how all the pieces come together in the library.

Thanks for your help with resolving this.

ttlappalainen commented 7 years ago

Thanks for you - without your files, I would have not noticed the problem.

You can also just post changed line so I can rewiew them. We can not break compiling for any hw.

tmcadam commented 7 years ago

Basically it is just moving the message handling out of the interrupt handler and processing it in the main program loop. This way the interrupt is not being held up during the processing. The only thing happening in the interrupt is setting a flag NewInterrupt, which is obviously very quick. Then something on the main loop needs to check for this flag and process it ProcessInterrupt. The only place I could see to put it on the main loop is parseMessages, which shows it works, but is almost certainly not the best place for it. Maybe doing it this way will have some other negative impact?

NMEA2000_map.cpp Line 163 onward, change to:

bool NewInterrupt = false;
//*****************************************************************************
// I am still note sure am I handling volatile right here since mcp_can has not
// been defined volatile. see. http://blog.regehr.org/archives/28
// In my tests I have used only to receive data or transmit data but not both.
void tNMEA2000_mcp::ProcessInterrupt() {
    if (NewInterrupt) {
        NewInterrupt = false;
        INT8U RxTxStatus;
        // Iterate over all pending messages.
        // If either the bus is saturated or the MCU is busy, both RX buffers may be in use and
        // reading a single message does not clear the IRQ conditon.
        // Also we need to check and clear all transmit flags to clear IRQ condition.
        // Note that this handler expects that Wakeup and Error interrupts has not been enabled.
        do {
        uint32_t id;
        unsigned char len;
        unsigned char buf[8];

        RxTxStatus=N2kCAN.readRxTxStatus();  // One single read on every loop
        INT8U tempRxTxStatus=RxTxStatus;      // Use local status inside loop

        while ( N2kCAN.checkClearRxStatus(&tempRxTxStatus)!=0 ) {           // check if data is coming
          N2kCAN.readMsgBuf(&len,buf);
          id=N2kCAN.getCanId();
        //      asm volatile ("" : : : "memory");
          pRxBuffer->AddFrame(id,len,buf);
        }

        if ( !pTxBuffer->IsEmpty() ) { // Do we have something to send on single frame buffer
          while ( N2kCAN.checkClearTxStatus(&tempRxTxStatus)!=0 && pTxBuffer->GetFrame(id,len,buf) ) {
            N2kCAN.trySendMsgBuf(id, 1, len, buf);
          }
        } else { // Nothing to send, so clear flags
          N2kCAN.clearBufferTransmitIfFlags();
        }

        if ( !pTxBufferFastPacket->IsEmpty() ) { // Do we have something to send on fast packet frame buffer
          // CanIntChk=tempRxTxStatus;
          if ( N2kCAN.checkClearTxStatus(&tempRxTxStatus,N2kCAN.getLastTxBuffer())!=0 ) {
            pTxBufferFastPacket->GetFrame(id,len,buf);
            N2kCAN.trySendMsgBuf(id, 1, len, buf,N2kCAN.getLastTxBuffer());
          }
        } else { // Nothing to send, so clear flag for this buffer
          N2kCAN.clearBufferTransmitIfFlags(N2kCAN.getLastTxBuffer());
        }

        } while ( RxTxStatus!=0 );
  }
}

void tNMEA2000_mcp::InterruptHandler() {
    NewInterrupt = true;
}

//*****************************************************************************
void Can1Interrupt() {
  pNMEA2000_mcp1->InterruptHandler();
}

NMEA2000_mcp.h line 70 add: void ProcessInterrupt();

NMEA2000.h line 366 add virtual void ProcessInterrupt();

NMEA2000.cpp lline 1580 add: ProcessInterrupt();

ttlappalainen commented 7 years ago

Unfortunately that is not acceptable. And actually there is no sense in that way. If you change it in that way, then it is same as you just disable using interrupt. In you fix when interrupt arrives, it will not be processed before you call ParseMessages and if you do not call it, not eny received messages will be buffered. On the other hand if you call ParseMessages fast enough, it will anyway read messages. The point of interrupt routine is that it can "interrupt" all normal code. In better systems higher level int can interrupt lower level int.

ttlappalainen commented 7 years ago

I'll try to improve mcp_can again. Now it is spending >300 us on reading frame, since it rechecks over SPI the right buffer. Or maybe I have to write ISR routine to the mcp_can, so that I do not need to publish all lower level handling on it. I think reading can be squeezed at least to half time or maybe more.

ttlappalainen commented 7 years ago

I managed to squeeze 50% of receive and sending times spent on interrupt. I'll try to clean and update code tomorrow morning.

tmcadam commented 7 years ago

Okay, that sounds great. Once you have updated let me know and I will test it out.

ttlappalainen commented 7 years ago

You can try it now. I did not yet applied all improvements to mcp_can functions that I do not use on NMEA2000_mcp. I'll try to go through differences with Seeed version later.

Some results (old/new in micros): ISR send buffer empty 144/72 ISR buffer arrived msg 332/120 ISR send buffered message 320/172

tmcadam commented 7 years ago

Do you mean it is ready to try now? I'm getting compile errors, with the latest commit.

ttlappalainen commented 7 years ago

Try again. mcp_can was not synced.

tmcadam commented 7 years ago

It's a good result, over 10,000 packets on RS485 and no bad packets so far (including disconnecting and reconnecting the NMEA bus several times).

Thanks for all your help with this issue, I think it is fully resolved for me now.

ttlappalainen commented 7 years ago

Good to hear!

What speed you are using on RS485?

tmcadam commented 7 years ago

Baud 57600, packet interval is 100 milliseconds.

ttlappalainen commented 7 years ago

Receiving one byte (10 bits) with 57.6k takes 173 us.

Sending one full frame (128 bits) took 512 microseconds. So with old code it send a frames and spent 320 us on IRS and so leaving some times only <200 us between interrupts. So you could have totally one byte within CAN ISR and so it could completely loose it (only 1 byte fifo), since it spent 320 us within ISR. Now ISR time is 172 us, so probability is much smaller or 0, since 172<173.

This shows how important it is squeeze every us away on ISR code.

ttlappalainen commented 7 years ago

Hi,

Just for extra testing please load latest NMEA2000_mcp and CAN_BUS_shield from my repo. I applied changes to latest Seeed version and manually merged them. Works fine on my test bed. I also made PR for Seeed - hopefully they will apply it.

tmcadam commented 7 years ago

Hi, sorry for the delay. I got a chance to test the updates to the 2 mcp libraries and it seems fine, still very stable and not interfering with serial..

ttlappalainen commented 7 years ago

Thanks!

olesenrl commented 7 years ago

Folks,

Could you please advise what a test bed would consist of for this. Also has anyone considered using WiFi with this library.

Regards,

Bob

On Sep 18, 2017, at 1:26 AM, Timo Lappalainen notifications@github.com wrote:

Thanks!

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

ttlappalainen commented 7 years ago

I did not completely understood your question. This issue is closed and point is in topic.

If you are starting with NMEA2000 devices, I prefere to use Teensy boards, since they have more memory and are more powerfull. E.g. KBox (https://hackaday.io/project/11055-kbox) has WiFi build in the board, so it can be used. But again I do not prefere Arduino Mega or any smaller board with WiFi.