scottalford75 / Remora-RP2040-W5500

Remora firmware for RP2040 with W5500 Ethernet

Branch picobob pr2 #4

Closed andrewmarles closed 1 year ago

andrewmarles commented 1 year ago

This is the best performance I have been able to get out of the RP2040 PRU. I now have a much better understanding of Remora and how it interacts with LinuxCNC. The main point of all of these changes is to reduce jitter in the base thread running on the MCU, and so eliminate timing jitter on the step pulses. There were three major sources of timing jitter: 1) data-copy delays; 2) interrupt contention between the base thread and the servo thread; 3) inconsistent execution times in the base thread/step generator.

To address the data-copy delays (having to pause the base thread while data is copied in and out with the host), I made a simple double buffer, so the base thread now only needs to be interrupted long enough to switch the pointers over. This required some small changes to the stepgen code. I am not blocking the servo thread at the moment, as I don't think it needs it, but this may be worth revisiting if you start doing more than basic I/O and Blink in the servo thread.
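The pointer switch described above can be sketched roughly as follows. This is a minimal, hardware-agnostic illustration and not the code from this PR: the `RxData` layout, the `DoubleBuffer` class, and its method names are all hypothetical. It assumes a single writer (the comms code) and a reader (the base thread) that never holds the front pointer across a swap.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical payload; the real Remora packet layout is not shown here.
struct RxData {
    int32_t jointFreqCmd[4];
    uint8_t jointEnable;
};

class DoubleBuffer {
public:
    // Buffer the base thread reads from. It is never written between swaps.
    const RxData* front() const {
        return front_.load(std::memory_order_acquire);
    }

    // Buffer the comms code fills while the base thread keeps running.
    RxData* back() { return back_; }

    // Publish the freshly written back buffer. Only this pointer switch
    // has to be kept short; no payload bytes are copied here.
    // Uses plain load/store (no read-modify-write) because only the
    // writer ever changes front_.
    void swap() {
        RxData* old = front_.load(std::memory_order_relaxed);
        assert(old != back_);  // the two roles must never alias
        front_.store(back_, std::memory_order_release);
        back_ = old;
    }

private:
    RxData buffers_[2] = {};
    std::atomic<RxData*> front_{&buffers_[0]};
    RxData* back_ = &buffers_[1];
};
```

In this arrangement the slow part (filling `back()`) happens with the base thread running freely, and only `swap()` would need to be shielded from the base interrupt on real hardware.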

To address number 2, I made the following changes: 1) I swapped the timers over to the main microsecond timer and used two different timer alarms, one for the base thread and one for the servo thread. This allows the two threads to have different IRQ priority levels and to nest, and it makes change 2 below a bit easier to accomplish. 2) I moved the execution of the servo thread out of interrupt context. This allows the base thread to interrupt the servo thread and pulse the step pins. Along with this, a 2 kHz servo thread keeps latency minimal on updates to and from the LinuxCNC host for the slower servo-thread I/O.
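The second change (running the servo thread outside interrupt context) follows a common pattern: the servo alarm handler only raises a flag, and the main loop does the actual work, so the base alarm can preempt it at any time. The sketch below shows that pattern on a host machine, with the two hardware alarms replaced by ordinary function calls. The `Scheduler` struct, its members, and `simulate` are hypothetical names for illustration, and the ratio of 20 base ticks per servo period used in testing is an arbitrary assumption, not a value from this PR.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct Scheduler {
    std::atomic<bool> servoDue{false};
    uint32_t baseTicks = 0;  // only written by the base alarm handler
    uint32_t servoRuns = 0;  // only written by the main loop

    // Would be called from the high-priority base alarm interrupt:
    // the time-critical work (step generation) runs right here.
    void onBaseAlarm() { ++baseTicks; }

    // Would be called from the lower-priority servo alarm interrupt:
    // it only records that a servo period has elapsed.
    void onServoAlarm() { servoDue.store(true, std::memory_order_release); }

    // Called from the main loop, outside any interrupt context.
    // Returns true if a servo update was executed.
    // Note: on real hardware the check-and-clear below would need a brief
    // interrupt mask to avoid losing an alarm that lands between the two.
    bool pollServo() {
        if (!servoDue.load(std::memory_order_acquire)) return false;
        servoDue.store(false, std::memory_order_relaxed);
        ++servoRuns;  // slow I/O updates would go here
        return true;
    }
};

// Host-side stand-in for the two hardware alarms plus the main loop.
inline void simulate(Scheduler& s, uint32_t baseTicksPerServo,
                     uint32_t totalBaseTicks) {
    assert(baseTicksPerServo > 0);  // guards the modulo below
    for (uint32_t t = 1; t <= totalBaseTicks; ++t) {
        s.onBaseAlarm();
        if (t % baseTicksPerServo == 0) s.onServoAlarm();
        s.pollServo();
    }
}
```

On the RP2040 the two `on...Alarm` functions would be attached to separate timer alarms with different IRQ priorities; that wiring is deliberately left out here because it depends on the SDK setup.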

The above changes are fine and good, but there are still some gotchas (issue 3), because the base thread runs in interrupt context and does floating-point math. Since the RP2040 doesn't have an FPU, this math requires calls to the software floating-point libraries. Those libraries are stored in the SPI flash, which is not really fast to begin with, and there is contention on that bus (mainly from the networking code on the other core). As a result, the base thread can still be held up by the rest of the system even if you use critical sections. And because of the library access, it is non-trivial to get just specific functions (stepgen) loaded into RAM. So I load the entirety of Remora into RAM with "set(PICO_COPY_TO_RAM 1)"; it all fits, and this eliminates the jitter from the FP instructions.

Might need to keep an eye on this as the networking packet buffers are dynamic, but there is a decent amount of memory left as Remora is fairly compact and the config file is still stored in flash:

Running from flash:
[build] Memory region    Used Size   Region Size   %age Used
[build] FLASH:            168008 B          2 MB       8.01%
[build] RAM:               45788 B        256 KB      17.47%
[build] SCRATCH_X:            2 KB          4 KB      50.00%
[build] SCRATCH_Y:            0 GB          4 KB       0.00%

Running from RAM:
[build] Memory region    Used Size   Region Size   %age Used
[build] FLASH:            167452 B          2 MB       7.98%
[build] RAM:              202908 B        256 KB      77.40%
[build] SCRATCH_X:            2 KB          4 KB      50.00%
[build] SCRATCH_Y:            0 GB          4 KB       0.00%

So the Remora application is taking up about 157K of RAM (202908 B - 45788 B = 157120 B).

A fixed-point implementation of the stepgen would be pretty helpful here, as it would avoid software FP on the Cortex-M0+ cores, but that's outside my scope for now. I am pretty happy with the performance of the little RP2040 PRU with the above optimizations. I think there is still work to be done on the component side to tune the gains, but I am getting pretty good results with a 1 ms servo thread on a Raspberry Pi 4.
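For reference, one way a fixed-point step generator could look is a phase accumulator: the commanded step rate is converted once to a Q0.32 increment, and each base tick is then just an integer add and a carry check. This is only a sketch of that general technique, not Remora's stepgen and not something implemented in this PR; `FixedStepgen` and all of its members are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct FixedStepgen {
    uint32_t accumulator = 0;  // fractional step position, Q0.32
    uint32_t increment = 0;    // steps per base tick, Q0.32
    bool forward = true;
    int32_t position = 0;      // signed step count

    // Convert a step rate in Hz to a Q0.32 increment. Intended to run
    // outside the base thread, since it needs a 64-bit division.
    // The result is rounded down, so the realized rate can be very
    // slightly below the requested one.
    void setFrequency(int32_t stepsPerSecond, uint32_t baseFreqHz) {
        assert(baseFreqHz > 0);
        forward = stepsPerSecond >= 0;
        int64_t s = stepsPerSecond;
        uint64_t magnitude = static_cast<uint64_t>(s < 0 ? -s : s);
        assert(magnitude < baseFreqHz);  // at most one step per base tick
        increment = static_cast<uint32_t>((magnitude << 32) / baseFreqHz);
    }

    // One base-thread tick: integer add and compare only, no floating point.
    // Returns true when a step pulse should be emitted on this tick.
    bool tick() {
        uint32_t before = accumulator;
        accumulator += increment;          // wraps modulo 2^32
        bool step = accumulator < before;  // unsigned wrap means carry out
        if (step) position += forward ? 1 : -1;
        return step;
    }
};
```

With a base frequency of 40000 Hz (an arbitrary example value) and a commanded rate of 10000 steps/s, the increment works out to exactly 2^30, so `tick()` reports a step on every fourth call.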