Closed HuubBuis closed 2 years ago
I have analyzed the code a bit more. The stepper interrupt service (
The signal on PORT_PA19 (Coolant) is used for measuring.
// Step pulse handler
static void STEPPULSE_IRQHandler (void)
{
if(STEP_TIMER->COUNT16.INTFLAG.bit.MC1) {
STEP_TIMER->COUNT16.INTFLAG.bit.MC1 = 1;
set_step_outputs(next_step_outbits); // Begin step pulse.
} else {
STEP_TIMER->COUNT16.INTFLAG.bit.MC0 = 1;
set_step_outputs((axes_signals_t){0}); // End step pulse.
}
}
// Set stepper pulse output pins
inline static void set_step_outputs (axes_signals_t step_outbits)
{
PORT_IOBUS->Group[0].OUTSET.reg=PORT_PA19; // start measuring
step_outbits.value ^= settings.steppers.step_invert.mask;
pinOut_IOBUS(X_STEP_PIN, step_outbits.x);
pinOut_IOBUS(Y_STEP_PIN, step_outbits.y);
pinOut_IOBUS(Z_STEP_PIN, step_outbits.z);
PORT_IOBUS->Group[0].OUTCLR.reg=PORT_PA19; // end measuring
}
Interesting obeservations, thanks for that.
For some drivers I have added the option to choose from three different output modes, this is how it is done in the MSP432 driver.
I wonder what speed can be achieved by using bitbanding (if available), this should be the fastest when the pins are on different ports.
Another test would be to split the set_step_outputs function into two different versions, one to set bits and one to clear bits.
But then I do not understand where/why the clock cycles are wasted, the reduced overhead by combining calls does not square with the raw overhead from your initial test. A peek at the assembler code generated by the compiler may be revealing?
As a reference here is the bitband version assembly from the MSP432 driver, with no optimizations and pins on the same port:
set_step_outputs:
;* --------------------------------------------------------------------------*
.dwcfi cfa_offset, 0
SUB SP, SP, #8 ; [DPU_V7M3_PIPE]
.dwcfi cfa_offset, 8
$C$DW$167 .dwtag DW_TAG_variable
.dwattr $C$DW$167, DW_AT_name("step_outbits")
.dwattr $C$DW$167, DW_AT_TI_symbol_name("step_outbits")
.dwattr $C$DW$167, DW_AT_type(*$C$DW$T$58)
.dwattr $C$DW$167, DW_AT_location[DW_OP_breg13 0]
STRB A1, [SP, #0] ; [DPU_V7M3_PIPE] |245|
.dwpsn file "../driver.c",line 247,column 5,is_stmt,isa 1
LDR A1, $C$CON10 ; [DPU_V7M3_PIPE] |247|
LDRB A2, [SP, #0] ; [DPU_V7M3_PIPE] |247|
LDRB A1, [A1, #0] ; [DPU_V7M3_PIPE] |247|
EORS A1, A1, A2 ; [DPU_V7M3_PIPE] |247|
STRB A1, [SP, #0] ; [DPU_V7M3_PIPE] |247|
.dwpsn file "../driver.c",line 248,column 5,is_stmt,isa 1
LDR A2, $C$CON11 ; [DPU_V7M3_PIPE] |248|
LDRB A1, [SP, #0] ; [DPU_V7M3_PIPE] |248|
AND A1, A1, #1 ; [DPU_V7M3_PIPE] |248|
STRB A1, [A2, #0] ; [DPU_V7M3_PIPE] |248|
.dwpsn file "../driver.c",line 249,column 5,is_stmt,isa 1
LDR A2, $C$CON12 ; [DPU_V7M3_PIPE] |249|
LDRB A1, [SP, #0] ; [DPU_V7M3_PIPE] |249|
UBFX A1, A1, #1, #1 ; [DPU_V7M3_PIPE] |249|
STRB A1, [A2, #0] ; [DPU_V7M3_PIPE] |249|
.dwpsn file "../driver.c",line 250,column 5,is_stmt,isa 1
LDR A2, $C$CON13 ; [DPU_V7M3_PIPE] |250|
LDRB A1, [SP, #0] ; [DPU_V7M3_PIPE] |250|
UBFX A1, A1, #2, #1 ; [DPU_V7M3_PIPE] |250|
STRB A1, [A2, #0] ; [DPU_V7M3_PIPE] |250|
.dwpsn file "../driver.c",line 256,column 1,is_stmt,isa 1
ADD SP, SP, #8 ; [DPU_V7M3_PIPE]
.dwcfi cfa_offset, 0
$C$DW$168 .dwtag DW_TAG_TI_branch
.dwattr $C$DW$168, DW_AT_low_pc(0x00)
.dwattr $C$DW$168, DW_AT_TI_return
BX LR ; [DPU_V7M3_PIPE]
; BRANCH OCCURS ; []
.dwattr $C$DW$165, DW_AT_TI_end_file("../driver.c")
.dwattr $C$DW$165, DW_AT_TI_end_line(0x100)
.dwattr $C$DW$165, DW_AT_TI_end_column(0x01)
.dwendentry
.dwendtag $C$DW$165
.sect ".text"
.clink
.thumbfunc set_dir_outputs
.thumb
The requirement for all step axis on the same PORT is not a big deal. The person who changes the port mappings, can also change the code for setting the step outputs.
Another test would be to split the set_step_outputs function into two different versions, one to set bits and one to clear bits. When on the same port , the mask to clear the bits can be calculated at startup (and after changing settings), that would help.
I have tested using this optimize option, that reduced the processing time to 1.56 us.
inline static void set_step_outputs (axes_signals_t step_outbits) __attribute__((optimize("-O3"))) ; //Optimize this code to get the fastest processing resulting in fastest speed, may reduce the speed of other code
I have looked at the MSP432 assembly output. There are 21 lines of code. I wonder how many clock tics it takes to execute. I will have a look at the SAMD21 assembly output to compare the amount of code that has to be executed. After that, I will modify the code for the requirement of all axis on the same port and no step invert. That will give an indication of what is feasible.
Next week I have an overloaded agenda. After that, there will be time to experiment.
I'll do some timing tests with the MSP432 later, maybe sunday, and post the results here. This could be useful as a baseline reference for comparing output modes and interrupt latency across processors.
Outputting a bit on the 48 MHz MSP432 takes a whopping 350ns, it uses far more clock cycles than is obvious from the assembler listing. The reason behind this is how the instruction pipeline works - and perhaps also the peripheral bus speed. Optimization settings has no effect whatsoever, except for the interrupt latency involved.
For the iMXRT1062 driver I have used structs to store GPIO data for improved speed and to be able to use the Arduino pin numbers for mappings. This approach could possibly be used for the SAMD21 as well? Be aware that the iMXRT1062 is a M7 processor and does not, AFAICT, support bit-banding.
A side note: iMXRT1062 is in a class of its own with only 200 nS subtracted from the pulse timer period to achieve step pulses pretty close to the $0 setting value over the 2 - 10 µs range I have tested.
I have done some test (bit toggling) on the SAMD21 and STM32 using an Arduino sketch for checking the pin maps. On the SAMD21 it results in a 12 MHz pulse. On my SAMD21 mini (10$ ebay) pin D2 and D4 are swapped!
I guess (not checked yet) the code before the actual bit setting takes far more CPU time than the actual bit setting itself. A solution would be to use a timer output for this task. That makes it also possible to stop the stepper.
I am not sure about the impact of the latency. Most of my external stepper drivers are limited to 100 kHz. If the latency is stable, I do not see a problem
I have just commited an update to iMXRT1062 style digital output to the test branch and I cannot discern any significant overhead per bit anymore. I hope you don't mind I did so.
Old code:
#define pinOut(p, e) { if(e) PORT->Group[g_APinDescription[p].ulPort].OUTSET.reg = (1 << g_APinDescription[p].ulPin); else PORT->Group[g_APinDescription[p].ulPort].OUTCLR.reg = (1 << g_APinDescription[p].ulPin); }
New code:
#define DIGITAL_OUT(gpio, on) { if(on) gpio.port->OUTSET.reg = gpio.bit; else gpio.port->OUTCLR.reg = gpio.bit; }
No more array lookups that I suspect triggered pipeline overhead.
What I do not like is that there is some random long pulses that I have not noticed before on the scope but can see when I connect leds to the step outputs. Could it be the USB driver? Perhaps I should try UART serial to see it that helps.
A solution would be to use a timer output for this task.
A timer per step pin then and use compare outputs to set/clear the pins? Interesting idea.
What I do not like is that there is some random long pulses that I have not noticed before on the scope but can see when I connect leds to the step outputs. Could it be the USB driver? Perhaps I should try UART serial to see it that helps. That long Jitter can maybe reduced by giving the stepper irq the highest priority and all other irq's a lower. Synchronizing the peripheral bus speed to the CPU bus speed, would also help a bit. I hope it is not caused by missing a pin clear request!
I have to do some test to get the IO speed when uploaded using Arduino the same as when uploaded using Atmel studio. It probably has something to do with the optimize settings. At the moment, the Arduino environment is the fastest!
A timer per step pin then and use compare outputs to set/clear the pins? Interesting idea.
The requirement for a timer ouput for each stepper, will make it difficult to add more steppers.
The step timer already has the highest priority (0), even with UART serial there is random jitter - some quite long. This also when issuing motion commands via a terminal so no status requests going on.
I am using the Arduino framework a bit more for this driver compared to the other Arduino based ones. Perhaps there is something going on in that? Something disabling IRQs for extended periods of time? The step timer period is set on boot, and the counter reset to 0 on start of a step, and the period is not changed later. So IMO this randomness is a bit weird, not seen anything like it before.
The requirement for a timer ouput for each stepper, will make it difficult to add more steppers.
Yes, but boards like the MKRZERO does not have many pins to start with. I have added an I2C expander on my CNC BoosterPack to get some extra GPIO pins.
I did wrong... I had step delay set to 5 µs - it seems that is the reason for the random jitter. Time to set a pin is comparable to MSP430 - down to around 350 ns. What is not so good is that latency is around 3.5 µs, MSP432 is around 1.6 µs. This is not only the interrupt latency itself but also the time to set two pins, reset and start the timer. It could be that waiting for the SYNCBUSY flag when resetting/starting the timer is quite a bit of this latency time?
I have limited the minimum step time to a few clock cycles, resulting in pulses of around 3.7 µs. Good enough.
Sorry for bothering you with the previous comments. I'll commit an update to the test branch in a few minutes.
waiting for the SYNCBUSY flag
For the SAMD21G according to the datasheet: If I understood it right , you don't need to wait for the sync bus if you write to an IO_BUS_PORT register. Chapter 22.5.3, 22.5.10, 22.6.1, 22.6.2.1 For peripherals, i am not sure, will have to read that chapter again (tomorrow)
Sorry for bothering you with the previous comments. I'll commit an update to the test branch in a few minutes.
This is not a bother, just a learning curve on both sides.
waiting for the SYNCBUSY flag
The timer registers TC COUNT and TC CTRLBSET that are used when starting a step pulse are both flagged as Write-Synchronized. I am not 100% sure this means one should always wait on the SYNCBUSY flag before continuing...
By reorganizing the code a bit I am now down to 2.3 µs latency when delay is not involved. I have forced the set_step_outputs() function to always inline and added a simpler interrupt handler that does not check for the MC1 timer interrupt for that.
Enabling delay brings out the gremlins...
If an interrupt somewhere causes the delay interrupt (MC1) to arrive late the step pulse reset interrupt (MC0) may already be set. To avoid this the step pulse count should be added to the current count after setting the output pins. More overhead...
Since the delay is really only needed on direction changes a better solution could be to add the delay on changes and switch the interrupt handler dynamically. But as a delay is really not needed when AMASS is enabled, this since there is no step pulse generated after a direction change, perhaps a better solution is to get rid of the delay setting altogether?
The timer registers TC COUNT and TC CTRLBSET that are used when starting a step pulse are both flagged as Write-Synchronized. I am not 100% sure this means one should always wait on the SYNCBUSY flag before continuing...
I started rereading the datasheet but have not finished yet. Preliminary conclusions:
Since the delay is really only needed on direction changes
The stepper pulse has to be long enough so that it is detected by the driver
But as a delay is really not needed when AMASS is enabled
Haven't a clue what you mean by this?
will be continued
It is safe to write the stepper registers without checking after the initial setup.
After "inital setup" is that after a cold start of the MCU only?
This MCU is an oddball in many respects - I have to admit I do not like it much. It shares the same documentation problems as some of the others I am working with, hard to understand and not clear at all. But maybe this is because I am not young anymore...
AMASS is "overclocking" the stepper ISR and is enabled by default. There are three levels.
// Adaptive Multi-Axis Step Smoothing (AMASS) is an advanced feature that does what its name implies, // smoothing the stepping of multi-axis motions. This feature smooths motion particularly at low step // frequencies below 10kHz, where the aliasing between axes of multi-axis motions can cause audible // noise and shake your machine. At even lower step frequencies, AMASS adapts and provides even better // step smoothing. See stepper.c for more details on the AMASS system works.
When this is enabled, as it is by default, I have never seen a step pulse been output when a new block is started and hal.stepper_pulse_start() is called to output step pulses. Direction changes will never happen within a block so from that one can deduct that when AMASS is enabled there will always be a delay between a direction change and a new step pulse beeing output. Adding a delay for every step pulse output is then a waste of processor cycles. If AMASS is not enabled a delay is, IMO, only required on a direction change and a pulse is to be output at the moment direction changes - not before every pulse after that.
Below is the code that I have decide to use, hal.stepper_pulse_start -> stepperPulseStartDelayed when step delay is set > 0. On a new block I check if there are step pulses to be output, if so I call stepperPulseAddDelay() which sets up the timer for delayed steps and redirects its IRQs to a handler which outputs the steps after the delay has passed. In this handler I reset operation back to "normal". I should have added a check for a direction change on a new block but see no point in that for now...
I hope this helps.
My latest code wil be committed to the test branch soon.
// Sets stepper direction and pulse pins and starts a step pulse with an initial delay
static void stepperPulseAddDelay (stepper_t *stepper)
{
if(stepper->new_block) {
stepper->new_block = false;
set_dir_outputs(stepper->dir_outbits);
IRQRegister(STEP_TIMER_IRQn, STEPPULSE_Delayed_IRQHandler);
STEP_TIMER->COUNT16.CC[0].reg = 50000;
while(STEP_TIMER->COUNT16.STATUS.bit.SYNCBUSY);
STEP_TIMER->COUNT16.INTENSET.bit.MC1 = 1; // Enable CC1 interrupt
}
if(stepper->step_outbits.value) {
STEP_TIMER->COUNT16.COUNT.reg = 0;
while(STEP_TIMER->COUNT16.STATUS.bit.SYNCBUSY);
next_step_outbits = stepper->step_outbits; // Store out_bits
STEP_TIMER->COUNT16.CTRLBSET.reg = TC_CTRLBCLR_CMD_RETRIGGER;
while(STEP_TIMER->COUNT16.STATUS.bit.SYNCBUSY);
}
}
// Sets stepper direction and pulse pins and starts a step pulse with
// an initial delay if new bock and step is to be output
static void stepperPulseStartDelayed (stepper_t *stepper)
{
if(stepper->new_block) {
if(stepper->step_outbits.value) {
stepperPulseAddDelay(stepper);
return;
}
stepper->new_block = false;
set_dir_outputs(stepper->dir_outbits);
}
if(stepper->step_outbits.value) {
STEP_TIMER->COUNT16.COUNT.reg = 0;
while(STEP_TIMER->COUNT16.STATUS.bit.SYNCBUSY);
set_step_outputs(stepper->step_outbits);
STEP_TIMER->COUNT16.CTRLBSET.reg = TC_CTRLBCLR_CMD_RETRIGGER;
while(STEP_TIMER->COUNT16.STATUS.bit.SYNCBUSY);
}
}
It is safe to write the stepper registers without checking after the initial setup. After "inital setup" is that after a cold start of the MCU only?
Setting the timer count register is safe, enabling the timer interrupt is safe, but the next combination, I am not sure, maybe the CC1 interrupt is fired immediately, depending on the actual value of count. Better to sync to be sure!!!
STEP_TIMER->COUNT16.CC[0].reg = 50000;
while(STEP_TIMER->COUNT16.STATUS.bit.SYNCBUSY);
STEP_TIMER->COUNT16.INTENSET.bit.MC1 = 1; // Enable CC1 interrupt
In the next code cync can be omittted because set_step_outputs doesn't depend on setting the count register
STEP_TIMER->COUNT16.COUNT.reg = 0;
while(STEP_TIMER->COUNT16.STATUS.bit.SYNCBUSY);
set_step_outputs(stepper->step_outbits);
In the next code, sync can be omitted. The return from the call will sure be more than a few clock tics. Before the next access to this same register, it will be in sync
set_step_outputs(stepper->step_outbits);
STEP_TIMER->COUNT16.CTRLBSET.reg = TC_CTRLBCLR_CMD_RETRIGGER;
while(STEP_TIMER->COUNT16.STATUS.bit.SYNCBUSY);
AMASS is "overclocking" the stepper ISR and is enabled by default. There are three levels.
Now I remember, I am getting old to.
I have found the sync error on page 453 serial com description
If an operation that requires synchronization is executed while the corresponding SYNCBUSY bit is one, a peripheral bus error is generated.
I didn't find a flag to check, have read the manual twice now.
I have found the sync error on page 453 serial com description
If an operation that requires synchronization is executed while the corresponding SYNCBUSY bit is one, a peripheral bus
error is generated.
I didn't find a flag to check, have read the manual twice now.
I have read it for the third time, no "error flag found". Maybe it is shown in one of the cpu registers during debugging! Will look into it this again...
I will check the actual time it takes to set the step ouputs later this week to see if I can match the 500ns latency you found.
Code and results: