mrrwa / NmraDcc

NMRA Digital Command Control (DCC) Library
GNU Lesser General Public License v2.1

Performance of NmraDcc on Arduino Nano Every (and other new ATMega processors) #57

Closed aikopras closed 3 years ago

aikopras commented 3 years ago

Hello

I'm experimenting with some new ATMega processors (4808, 4809, 128DA), originally for my own DCC decoding libraries (still following the OpenDecoder approach of using a timer) but (as comparison and check) also with the NmraDcc library.

The good news is that the NmraDcc library compiles and runs on the Nano Every. However, like my own decoding library, performance on these new processors turns out to be considerably slower than on the 328. There are several reasons for that, the most important one being the overhead created by attachInterrupt(). On the UNO it takes roughly 3.5 microseconds after the pin change until the "user code" within the ISR starts. On the Nano Every this time is twice as long, around 7 microseconds (depending on whether the standard Arduino board package or MegaCoreX is used). See also my earlier post in the Arduino forum: https://forum.arduino.cc/t/difference-in-time-between-interrupt-start-on-mega-2560-and-nano-every-4809/897591/15 Unfortunately, not only attachInterrupt() is considerably slower, but also micros() (you can find several discussions of that in the various fora).

My question is therefore whether others observe the same performance problems with NmraDcc on these new ATMega processors. I understand that the NmraDcc library will run on lightly loaded Nano Every boards, but I'm running into problems if I include (my own) RS-Bus feedback library. The RS-Bus also raises an interrupt roughly every 100 microseconds, and the two libraries together don't work reliably.

For my own DCC decoder library I decided to make a drastic change, and add some of the new features that these new 480X and 128DA processors offer. Basically I connect the DCC input signal to the Event system, which triggers a timer as output. This saves me the (overhead of the) DCC input ISR, and the timer gives me a very stable reading of the DCC signal.

I would be interested in what others think of this.

Aiko

kiwi64ajs commented 3 years ago

Yeah, this is the trade-off of providing a generic library that is widely supported, but not optimised for any specific architecture. However, for many people this seems to be OK, and with newer CPUs having higher clock speeds it kinda doesn’t matter.

I faced the same situation with the LocoNet library: it started out life as an AVR-only library, and over time other architectures have been added, but the code ends up looking pretty messy.

So I then took the approach of refactoring it into the LocoNet2 library, which moves all the core LocoNet logic into base classes, with the expectation that users then create subclasses that implement the architecture-specific Init(), Read() and Write() type methods. I think that is a better approach, but we’re not there yet as my time is pretty limited…

We could do a similar thing for this library as well, if people are keen to do the refactoring.

We could start by refactoring it into a base class and the derived class being the current implementation.
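To make the idea a bit more concrete, here is a minimal sketch of what such a split could look like. It is not taken from NmraDcc or LocoNet2: the class and method names (`DccCore`, `GenericArduinoDcc`, `init()`, `readHalfBitMicros()`, `packetIsValid()`) are hypothetical and only mirror the Init()/Read() style of hooks mentioned above. The one piece of shared logic shown is the standard DCC error check, where the XOR of all packet bytes must be zero.

```cpp
#include <cstddef>
#include <cstdint>

// All names here are hypothetical; NmraDcc does not currently have these classes.

// Board-independent core: everything that does not touch hardware lives here.
class DccCore {
public:
    virtual ~DccCore() = default;

    // Board-specific hooks, implemented once per architecture.
    virtual void init() = 0;                   // set up pin, interrupt or timer
    virtual uint16_t readHalfBitMicros() = 0;  // duration of the latest half-bit

    // Shared logic: a DCC packet is valid when the XOR of all its bytes,
    // including the trailing error-detection byte, is zero.
    bool packetIsValid(const uint8_t* bytes, size_t length) const {
        if (length < 3) return false;  // address + instruction + error byte
        uint8_t x = 0;
        for (size_t i = 0; i < length; ++i) x ^= bytes[i];
        return x == 0;
    }
};

// Stand-in for "the current implementation". On a real board this class
// would wrap attachInterrupt() and micros(); here it returns a fixed value
// so that the sketch also compiles with a desktop compiler.
class GenericArduinoDcc : public DccCore {
public:
    void init() override { initialised = true; }
    uint16_t readHalfBitMicros() override { return 58; }  // nominal "1" half-bit
    bool initialised = false;
};
```

A board-specific variant (for example one using a hardware timer instead of micros()) would then be just another subclass, and the packet handling in the base class would stay untouched.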

Another potential solution that actually might be less effort is to use the new RPI Pico or one of the other RP2040 based boards and use one core for NmraDcc and the other core for your other interfaces, i.e. use a bigger hammer… ;)

HTH

Alex

(In reply to Aiko Pras's post of 24/08/2021, 4:29 AM.)

aikopras commented 3 years ago

Hi Alex

Just for reference I attach a screenshot showing Test Point 3 for the Arduino UNO and the Nano Every. It shows that the UNO takes roughly 15 microseconds to capture and analyse the DCC input signal, whereas the Nano Every takes around 22 microseconds. The overhead of attachInterrupt() is 4.1 (UNO) versus 7.4 (Every) microseconds. The "micros() part" is also relatively expensive: 3.5 (UNO) versus 7.2 (Every) microseconds. The digitalRead() takes roughly 2 (UNO) versus 2.4 (Every) microseconds. The remaining code takes, as can be expected, roughly the same time on UNO and Every (4.9 microseconds).

[Screenshot: DCCNMRA-Timing-UNO-NanoEvery-TP3]
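As a quick cross-check of these numbers, the four measured parts can simply be added up: 14.5 microseconds for the UNO and 21.9 for the Every, which matches the "roughly 15" and "around 22" totals. The small sketch below does that sum and also expresses it as a share of a time window. The struct and function names are made up for this illustration, and the window length is an assumption you pass in yourself (for example 58 microseconds, the nominal half-period of a DCC "1" bit).

```cpp
// Per-interrupt cost in microseconds, copied from the measurements above.
// Names are hypothetical and only serve this illustration.
struct IsrCost {
    double attachInterruptOverhead;
    double microsCall;
    double digitalReadCall;
    double remainingCode;

    constexpr double total() const {
        return attachInterruptOverhead + microsCall + digitalReadCall + remainingCode;
    }
};

constexpr IsrCost kUno{4.1, 3.5, 2.0, 4.9};
constexpr IsrCost kNanoEvery{7.4, 7.2, 2.4, 4.9};

// Share of a time window (in microseconds) spent handling one interrupt.
// The window is an assumption supplied by the caller.
constexpr double isrLoadPercent(const IsrCost& cost, double windowMicros) {
    return 100.0 * cost.total() / windowMicros;
}
```

Under the assumption that one such interrupt has to be handled within every 58 microsecond window, `isrLoadPercent(kUno, 58.0)` comes out at about 25 percent and `isrLoadPercent(kNanoEvery, 58.0)` at about 38 percent, before any other time-critical library gets a turn.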

> Yeah, this is the trade-off of providing a generic library that is widely supported, but not optimised for any specific architecture.

I completely agree, and also that for most users this will be OK. For users who want to choose between existing boards, such as Arduino Uno, Mega, Nano (Every), ESP, RPI, this is the right approach. If ATMega processor boards become too slow, they just buy another board with a faster processor. However, for users like me, who have fun designing and manufacturing their own boards (I know, this is a small minority), this approach may not be ideal. Building your own boards based on ATMega processors is quite easy and cheap; building your own RPI (for example) is (at least for me) far more challenging.

On a nano Every (or better: megaCoreX or DxCore) it might be possible to have the entire DCC signal processing done within 5 microseconds, including ISR overhead. The DCC input signal goes to the event system, which triggers a TCB timer, which reads (in a clever way) the input signal.
Decoding in 5 microseconds can be great if the board also has some other time-critical functions, such as an RS-Bus (or any other) feedback bus.
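To illustrate why a hardware-captured time needs so little code, here is a rough sketch of the software side only. It is not the approach used in my library and it leaves out the Event System and TCB configuration entirely. It assumes that the timer hands over the length of the last half-bit in timer ticks, and it classifies that value using the decoder acceptance windows from NMRA S-9.1 (52 to 64 microseconds for a "1", 90 to 10000 microseconds for a "0"). The names `HalfBit` and `classifyHalfBit` and the `ticksPerMicrosecond` parameter are hypothetical.

```cpp
#include <cstdint>

enum class HalfBit : uint8_t { One, Zero, Invalid };

// Classify one captured half-bit duration.
// captureTicks:        timer count for the half-bit (assumed to come from a
//                      hardware capture, so no micros() call is needed)
// ticksPerMicrosecond: timer clock in MHz, e.g. 16 for a 16 MHz timer clock
HalfBit classifyHalfBit(uint16_t captureTicks, uint8_t ticksPerMicrosecond) {
    if (ticksPerMicrosecond == 0) return HalfBit::Invalid;

    // Compare in ticks rather than dividing, to avoid rounding at the edges.
    const uint32_t ticks = captureTicks;
    const uint32_t scale = ticksPerMicrosecond;

    if (ticks >= 52u * scale && ticks <= 64u * scale) return HalfBit::One;
    if (ticks >= 90u * scale && ticks <= 10000u * scale) return HalfBit::Zero;
    return HalfBit::Invalid;  // glitch, or outside the accepted windows
}
```

Note that a 16-bit capture value at 16 ticks per microsecond overflows after about 4 milliseconds, so very long stretched "0" bits would need extra handling that this sketch does not attempt.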

> So I then took the approach of refactoring it

Yes, that would make adaptation for different boards much easier. At this moment the entire NmraDcc library is in one big cpp file, without any structure (to be honest, I hate that). The first step would be, just like the original OpenDCC software, to separate it into receiver (board-specific) and decoder (same for all boards) files. An additional step (which I took for my decoder library) is to also move the hardware-specific parts (the first part of the receiver ISR) into separate #include files.

This would indeed be a lot of work, and maybe not worth the effort. For "engineers" (like me) it will be great, but normal users probably just buy another (non ATMega?) board.

Bye

Aiko

kiwi64ajs commented 3 years ago

Thanks for the timing information you've posted - always good to see stuff like this.

At this point I'm just going to close this issue as "won't fix" (for now at least).