Testing G33 on STM32F411 nucleo fails

HuubBuis commented 3 years ago

As a preparation to ad spindle sync to the ESP32 and SAMD21, I tested grblHAL on a STM32F411 nucleo board. To get it working, I basically only had to change the nopullup on the encoder pin to pullup, same like index pin.

When running G91 G33 Z5 K1 at 80 RPM, independent of PID settings, grblHAL stops responding to g33 commands after some commands (2 to 3) and the reported RPM shows 0. The faster the spindle in running, the more commands I can sent before the problem arises.

I had a look at the code and found that the RPM timer and counts are not processed ATOMIC. I changed it but it had no effect. Nevertheless, this should be changed (also in the MSP432 version).

I have added a settings file containing the settings I used for testing.

It seems that the local commit I have made created an unintended pull request on your repository. You should ignore this pull request. Seems I can't get used to git.

My grblHAL STM32F411 fork

HuubBuis commented 3 years ago

I moved the spindle sync calculation to the main thread (quick and dirty) and tested it on the bench. Runs stable now. I suspect, the problems are caused by FPU access in interrupt time, have checked the STM documentation but couldn't find a confirmation for this. I have uploaded the code.

terjeio commented 3 years ago

Thanks, I'll get back to this soon.

I have been busy lately with updating the settings subsystem and Trinamic driver plugins - and your previous notification dropped off my radar due to this and a large amount of messages lately. Sorry about that.

FPU access in interrupt time, have checked the STM documentation but couldn't find a confirmation for this.

Hmm, I wonder if lazy stacking of FPU registers is enabled by default? I have this code in the MSP432 driver:

 /* enable lazy stacking of FPU registers */

    FPU->FPCCR = (FPU->FPCCR & ~FPU_FPCCR_LSPEN_Msk) | FPU_FPCCR_ASPEN_Msk;

While skimming the linked document I see now the above code disables lazy stacking and that it is enabled by default...

HuubBuis commented 3 years ago

While skimming the linked document I see now the above code disables lazy stacking and that it is enabled by default...

If I understood is right, if lazy stacking is disabled (default), FPU stack is always saved in interrupts.

If lazy stacking is disabled, the processor always pushes the floating-point registers to the stack 
at exception entry and unstacks them at interrupt return.

So, disabling is always safe

It also states

If the interrupt handler uses the FPU at some stage, the processor is stalled when the first 
floating-point instruction take place,

So if lazy stacking is enabled, it is also safe to use FPU in interrupts.

The most imported part:

In application code where only one exception handler 
uses the FPU. If multiple interrupt handlers use the FPU, 
they must not be permitted tobe nested. This can be done 
by setting them to the same priority level

So if more than one, same priority interrupt routine uses FPU, lazy stacking must be disabled.

I have checked the STMF411 code and found that only the stepperPulseStartSynchronized interrupt code uses the FPU. So FPU use in interrupts shouldn't be an issue regardless of the lazy stacking setting (in theory).
I have run a repeated threading task under real conditions using my rotary table as Z-axes. I have let it run for one hour at different spindle RPM. GrblHAL was always responding and I didn't need to reset the processor. So moving the the calculations out of the interrupt routing has solved this issue (in practice). Weird

Maybe the use of pointers to the actual stepping routine is causing the problems because at compile time, ISR_CODE void stepper_driver_interrupt_handler (void) doesn't know if the code that handles the stepping uses FPU and stepperPulseStartSynchronized doesn't have an ISR_CODE attribute.

I noticed another problem: I am losing steps and the speed changes just after startup is sometimes very rough. This could be caused by my "quick and dirty" not thread safe solution. Tomorrow I will make it thread safe and run a new test.

terjeio commented 3 years ago

he stepping uses FPU and stepperPulseStartSynchronized doesn't have an ISR_CODE attribute.

The ISR_CODE attribute is currently only set for the ESP32 driver. The ESP32 executes code from RAM and needs to page this in from flash as needed since RAM size is limited. This adds overhead to execution and to avoid that for interrupt handlers ISR_CODE is used to flag code that is to be kept in RAM at all times.

ISR_CODE is mapped to IRAM_ATTR for ESP32 and nothing for all the other drivers:

https://github.com/terjeio/grblHAL/blob/2d871b2d0a71193df8015f98da1bc565d12341d0/grbl/grbl.h#L42-L49

HuubBuis commented 3 years ago

I have made the feed rate synchronization thread safe. It now also runs most of the time pretty good and stable at 30 RPM, but not allways. Running at a higher RPM (80) is stable (30 minutes run).

I expect that sudden changes in speed (lead in at start threading) could lead to exceeding the speed & acceleration limits of the hardware and so in losing steps. Setting the stepper speed using the planner could prevent this.

I have uploaded the code.

terjeio commented 3 years ago

I have taken a brief look at your code, I need to wire up my F4xx and spindle simulator to go deeper into it.

Note that protocol_enqueue_rt_command() can be used to post a single callback to be executed by the foreground process, this could be used instead of adding to the HAL/core?

https://github.com/terjeio/grblHAL/blob/2d871b2d0a71193df8015f98da1bc565d12341d0/grbl/protocol.c#L857-L871

HuubBuis commented 3 years ago

Note that protocol_enqueue_rt_command() can be used to post a single callback to be executed by the foreground process, this could be used instead of adding to the HAL/core?

Didn't know this was available. Using this in code called by stepper_driver_interrupt_handler() could lead to "no synchronization" if the buffer is full, but that is a minor.

I checked the PID log. It seems that my last upload doesn't synchronize the speed enough though it does response to speed changes. Tomorrow I will have a look of what is wrong.

I have added the ISR_CODE attribute to all functions called by stepperPulseStartSynchronized and run it in interrupt time. I expected threading to be stable, but it wasn't. I am puzzled why. Maybe the assembler listing reveals something. I will check it tomorrow.

terjeio commented 3 years ago

I have added the ISR_CODE attribute to all functions called by stepperPulseStartSynchronized

This is a waste of energy as it is an empty define for all processors but ESP32.

HuubBuis commented 3 years ago

This is a waste of energy as it is an empty define for all processors but ESP32

I also added the attribute to the implementation, not the definition. So it wouldn't work at all.

If I understood is right, if lazy stacking is disabled (default), FPU stack is always saved in interrupts.
The lazy stacking feature is programmable. By default this is turned ON
I was wrong, Lazy stacking is enable by default. It explains why there is no saving of FPU registers in the assembler listing.

Found that a lot of vars that are accessed in interrupt time have no volatile attribute. Changed that but it made no difference.
Restored your version, disabled the FPU and set floating point calculations to software. No difference, unstable at low rates (< 80 RPM) and erratic stepper feed rates. Had expected a better result.

HuubBuis commented 3 years ago

I have found that the pulse and RPM timers are 16 bit. Running at slow feed rates and using a low resolution encoder, could lead to counter overflow.

#define PULSE_TIMER_N               4 
#define RPM_COUNTER_N            3

terjeio commented 3 years ago

The pulse timer is not involved in spindle sync calculations, the pulse counter is 32 bit and updated every four RPM_COUNTER tics by adding the current count - last count using 16 bit unsigned arithmetic. An this is where it fails for this processor.

It seems to me that the 16 bit timers does not have "clean" 16 bit registers... Adding casts to uint16_t for registers and variables involved in calculations helps? At least the live RPM display in my sender does not seize up randomly anymore. Here are the changes I've made:

static spindle_data_t spindleGetData (spindle_data_request_t request)
{
    bool stopped;
    uint32_t pulse_length = spindle_encoder.timer.pulse_length / spindle_encoder.counter.tics_per_irq;
    uint32_t rpm_timer_delta = (uint16_t)((uint16_t)RPM_TIMER->CNT - (uint16_t)spindle_encoder.timer.last_pulse);

    // If no (4) spindle pulses during last 250 ms assume RPM is 0
    if((stopped = ((pulse_length == 0) || (rpm_timer_delta > spindle_encoder.maximum_tt)))) {
        spindle_data.rpm = 0.0f;
        rpm_timer_delta = (uint16_t)(((uint16_t)RPM_COUNTER->CNT - (uint16_t)spindle_encoder.counter.last_count)) * pulse_length;
    }

    switch(request) {

        case SpindleData_Counters:
            spindle_data.pulse_count += (uint16_t)RPM_COUNTER->CNT - (uint16_t)spindle_encoder.counter.last_count;
            break;

        case SpindleData_RPM:
            if(!stopped)
                spindle_data.rpm = spindle_encoder.rpm_factor / (float)pulse_length;
            break;

        case SpindleData_AngularPosition:
            spindle_data.angular_position = (float)spindle_data.index_count +
                    ((float)((uint16_t)spindle_encoder.counter.last_count - (uint16_t)spindle_encoder.counter.last_index) +
                              (pulse_length == 0 ? 0.0f : (float)rpm_timer_delta / (float)pulse_length)) *
                                spindle_encoder.pulse_distance;
            break;
    }

    return spindle_data;
}

...

void RPM_COUNTER_IRQHandler (void)
{
    uint32_t tval = RPM_TIMER->CNT;
    uint16_t cval = RPM_COUNTER->CNT;

    RPM_COUNTER->SR = ~TIM_SR_CC1IF;
    RPM_COUNTER->CCR1 = (uint16_t)(RPM_COUNTER->CCR1 + spindle_encoder.counter.tics_per_irq);

    spindle_data.pulse_count += (uint16_t)(cval - (uint16_t)spindle_encoder.counter.last_count);

    spindle_encoder.counter.last_count = cval;
    spindle_encoder.timer.pulse_length = tval - spindle_encoder.timer.last_pulse;
    spindle_encoder.timer.last_pulse = tval;
}

Making register reads atomic could be a good idea, especially when using high PPR encoders.

HuubBuis commented 3 years ago

I will test this code to night and let you know the results.

darkfibre-nl commented 3 years ago

(edit: this refering to the stm32f411re)

At the risk of this being over my head I would like to mention that both timer2 and timer5 are 32-bit and capable of reading quadrature encoders directly using hardware and little setup code. In the past I've used this to experiment with a printer carriage/dc motor + encoder strip + some PID code, and it worked like a dream.

Timer 2 defines: pin PA0 for the A channel pin PA1 for the B channel.

Alternative(s) for PA0 are PA5 and PA15 and for PA1 is PB3 (default: SWO).

Timer 5 also uses PA0 and PA1, with no alternatives I could find.

This is the information I used: http://www.micromouseonline.com/2013/02/16/quadrature-encoders-with-the-stm32f4/ and STM32CubeMX program to check possible pin assignments. Also referencing the STM32F411xE datasheet (DocID026289 Rev 7) page 27, for an overview of available timers.

Perhaps this may simplify the use of spindle encoders.

langwadt commented 3 years ago

here's and example with four encoders connected to a 411 nucleo, two of them on 32bit timers and two on 16 bit timers safely extended to 32 bit as long as the position is extended more often than the timer can wrap more than once https://github.com/langwadt/nucleo_quad4

terjeio commented 3 years ago

@darkfibre-nl, @langwadt The current spindle sync code does not support quadrature encoder input. Spindle sync when turning the spindle by hand is thus not possible - adding this will IMO be challenging.

I am using a 16 bit counter to count encoder pulses (the timer clock is connected to the encoder pulse input) and a free running 32 bit counter to "timestamp" encoder events. BTW the way this is identical to the approach I use in the MSP432 driver which seems to work well - so a bit strange that nearly identical code did not work with STM32F4xx.

Which timers and pins to use was quite hard to solve due to the way interrupts are assigned in STM32 processors - the choices made was dictated by the the functionality I have chosen for the Nucleo-64 breakout board I am currently working on. Different ways of handling encoders may of course be added for other boards if called for. And if lucky perhaps the SD-card port on my board can be repurposed as a quadrature encoder input hardwired to a timer?

HuubBuis commented 3 years ago

void RPM_COUNTER_IRQHandler (void)
static spindle_data_t spindleGetData (spindle_data_request_t request)

I have added the proposed code to the current grblHAL master version (FPU enabled) and did some test (4 pulse encoder). It certainly changed the behavior but:

the st morpho controller stops responding to G33 commands when running at lower feedrates (30 RPM) Changed spindle_encoder.maximum_tt to 1500 ms. (same problem as before) The reported RPM is now showing the right value. (this problem is solved)
I get a strange PID behaviour even when PID settings are set to 0, tested at different RPM and PID settings but the pattern doesn't change. (This is new)

Still some mysteries to solve. Tomorrow I will run another test doing speed synchronization in foreground.

Threading at 80 RPM PID set to zero

terjeio commented 3 years ago

A 4 PPR encoder is not much, try with setting _tics_perirq to 1 - the default value of 4 means that you effectively have a 1 PPR encoder. tics_per_irq should be set by algorithm...

https://github.com/terjeio/grblHAL/blob/579abbe27fe0aad5f3be705db6c5a924a8e2ffe8/drivers/STM32F4xx/Src/driver.c#L927

I'll check with your configuration later - the calculated angular position could be off.

HuubBuis commented 3 years ago

I have checked several conditions regarding "controller stops responding to G33 commands when running at lower feedrates".

When I disable the actual setting of the PID calculated feed rate in stepperPulseStartSynchronized(), it runs stable. // stepperCyclesPerTick(stepper->exec_segment->cycles_per_tick);

In stepperPulseStartSynchronized() there is a check on the min timer value stepper->exec_segment->cycles_per_tick = (uint32_t)max(ticks, spindle_tracker.min_cycles_per_tick); In stepperCyclesPerTick() there is a check on the max timer value STEPPER_TIMER->ARR = cycles_per_tick < (1UL << 20) ? cycles_per_tick : 0x000FFFFFUL;

The min_cycles_per_tick value = 20, a bit low for doing PID FPU calculations in interrupt time. I experimented somewhat and found that it runs stable ( 1 hour test at 30 RPM) when setting this value to 300. Moving the calculations to foreground, should solve this problem if min_cycles_per_tick is not shorter than the time to process stepperPulseStartSynchronized() .

The min_cycles_per_tick value should be calculated based on the actual max feed rate Z set in the grbl configuration.

terjeio commented 3 years ago

The min_cycles_per_tick value = 20, a bit low for doing PID FPU calculations in interrupt time. I experimented somewhat and found that it runs stable ( 1 hour test at 30 RPM) when setting this value to 300.

This is an indication of that either the PID loop is oscillating or wildly off and/or steps/mm is set very high. What is your P-value ($90)? A good value to start with is 1. Have you tried to set max allowed error output for the PID loop ($95)?

Could the calculation of the angular position be incorrect when using low PPR encoders at low RPMs? I have a hunch that this might be the reason behind the problems. This can be checked by disabling the feedback loop and logging the angular position at the start of each segment while cruising. The increment between each logged value should be relatively constant depending on how stable the spindle RPM is. I have to modify my simulator before I can check this since it is currently not able to simulate 4 PPR at slow spindle speeds.

The min_cycles_per_tick value should be calculated based on the actual max feed rate Z set in the grbl configuration.

Yes - but there must be headroom for the control loop to increase the feedrate. This could be achieved by reducing max allowed feed rate for spindle synced motions with some percentage. The current check is to avoid step pulses to break down completely.

HuubBuis commented 3 years ago

This is an indication of that either the PID loop is oscillating or wildly off and/or steps/mm is set very high. What is your P-value ($90)? A good value to start with is 1. Have you tried to set max allowed error output for the PID loop ($95)?

I have experimented using PIL (limit) values of 0, 0.001, 0.01, 0.5, 5,10. The result are the same, after one or more cycles the controller stops responding to G33 commands but Jogging is possible. steps/mm is set to 800 (1600 on the small lathe)

The min_cycles_per_tick value should be calculated based on the actual max feed rate Z set in the grbl configuration.
Yes - but there must be headroom for the control loop to increase the feed rate.

In grbl_L-mega I have limited the feed rate to 90% of the maximum set. This gives headroom for increasing the feed rate. Even then, the feed rate and acceleration will not exceed the grbl setting limits.
Losing steps will lead to more load, will lead to losing more steps, will lead to more load and finally lead to a disaster. If it happens when threading steel, even the halve nuts and spindle could get damaged.

I will check the PID calculations. There is something not responding well to low feed rates and low resolution counters. Both result in long time between pulses and high tic counts.

HuubBuis commented 3 years ago

I have reverted your proposed changes. The PID loop seems to be functional again. I have run it for half an hour, runs stable now (30 RPM, 4 pulses/rev). The code change needed to fix this issue min_cycles_per_tick = 20, changed to 300. Link to the code master branch

The I am still missing steps. Seems the pulse counter input is very sensitive or the breadboard is getting old.

terjeio commented 3 years ago

In the latest commit there are also changes for reducing the risk for contention, auto tuning of tics_per_irq (can be improved), limited max feed rate to 90% of Z-max, limited min_cycles_per_tick to Z-max rate etc.

Still work to do, I have to change my simulator to check what happens at low PPRs. $SD command can be used to check if counters and calculated angular position is sane. Elements in this report is index count, pulse count and angular position. This can be used when only the spindle is running. IMO it needs an option to reset the spindle data.

Safety limits and handling needs to be worked out. What to do on abrupt loss of encoder pulses, how quickly to detect spindle stop...

I did a mistake in the proposed changes by limiting the timestamp counter to 16 bit. Silly. Note that max. run time for a spindle synced motion is limited by the timestamp counter. 2^32 microseconds for this driver.

HuubBuis commented 3 years ago

In the latest commit there are also changes for reducing the risk for contention, auto tuning of tics_per_irq (can be improved), limited max feed rate to 90% of Z-max, limited min_cycles_per_tick to Z-max rate etc.

This will make threading more reliable and is a good improvement.

Safety limits and handling needs to be worked out. What to do on abrupt loss of encoder pulses, how quickly to detect spindle stop...

I have seen many discussions about how to handle but there is no consensus on what to do. When using G76, retracting (X) could be a solution because the direction of X movements are known. The definition on "what is a sudden loss of encoder pulses" is also quit difficult to define. For the last 4 years, my greatest problem is stalling the spindle. When using the stepper on the spindle, speeds are very moderate and it results a broken tool and damaged workpiece. When using spindle sync threading, speeds are much higher and so will be the damage (I run at 300 RPM max).

I did a mistake in the proposed changes by limiting the timestamp counter to 16 bit Development is breaking something while fixing it.

Note that max. run time for a spindle synced motion is limited by the timestamp counter. 2^32 microseconds for this driver This limits a single threading pass duration to about one hour.

HuubBuis commented 3 years ago

I have a problem disabling interrupts so that foreground reading of counters is atomic. I have found that the instruction __disable_irq(); results in this assembler listing

142        __ASM volatile ("cpsid i" : : : "memory");
0800d5ea:   cpsid   i

The current state of the interrupt status is not saved. When calling a function that also uses this, the interrupt status gets enabled unintended. Reading the DOC it seems that even the effect of this statement is depending on the "Mode (usermode)" and that it is not supported on all arm versions. How a simple task gets complicated....

langwadt commented 3 years ago

@HuubBuis __disable_irq(); just clears PRIMASK you should store and restore as needed, something like:

int32_t primask = __get_PRIMASK(); __disable_irq(); // do dangerous stuff; __set_PRIMASK(primask);

What counters needs special care to read atomic and why?

HuubBuis commented 3 years ago

What counters needs special care to read atomic and why?

Reading 32 bit Counters and Timers is atomic by design.

For spindle sync threading, on the receive of an encoder pulse, a counter value and a timer value is read and stored for later processing. In the interrupt routine, it is atomic.

When these 2 values are read for processing thy should be read as a pair. This can only be done by disabling interrupts, read them store them for use and then enable interrupts again.

I expected disable_irq(); to save the interrupt status and __enable_irq() to restore the old interrupt status but it doesn't. enable_irq() is used in more places in grlHAL for atomic reading/writing and this can lead to unexpected results,

I have checked CMCIS but there is no function that does a combined store/disable IRQ, so I will define one.

langwadt commented 3 years ago

in that case you can probably get around the need for disabling interrupts by reading the counter value before and after the timer value and rereading the pair if the counter value was changed

terjeio commented 3 years ago

When these 2 values are read for processing thy should be read as a pair. This can only be done by disabling interrupts, read them store them for use and then enable interrupts again.

My latest version does that.

__enable_irq() is used in more places in grlHAL for atomic reading/writing and this can lead to unexpected results

Can you explain how and where this may lead to unexpected results that is a real problem?

The only place I can think of in is the calculation of the angular position.

Except for some unavoidable jitter then the only potential problem I am aware of is the pulse counter and index counter interrupts interfering with each other. Note that if the index count interrupt is fired when the pulse count is not exactly PPR pulses since the last then spindle_encoder.error_count is incremented. If this happens then you have a hardware issue?

The pulse counter is updated on the rising edge, index counter on the falling edge. These edges are normally enough apart to avoid contention?

langwadt commented 3 years ago

if you call a function that disable/enable interrupt after you have disabled interrupts you'll return with interrupts enabled where you might think there are still disabled. But you shouldn't be disabling interrupts for mor than a few lines of code anyways

HuubBuis commented 3 years ago

in that case you can probably get around the need for disabling interrupts by reading the counter value before and after the timer value and rereading the pair if the counter value was changed

I have done it this way in the grl-l-mega version.

Can you explain how and where this may lead to unexpected results that is a real problem?

At the moment I haven't found any use of __disable_irq() that could lead to problems (except in my own code) , but I haven't read all code yet. The problem is that after calling an atomic function, interrupts are enabled even if you have disabled them. These problems are hard to find and could appear very rare. Changing the current code to a safe version, can make life easier for other developers. An explanation for the problem for those who aren't familiar using atom processing. There is also a library line 166 that has implemented a more user friendly way for atomic operations.

mwshakoor154 commented 3 years ago

do you plan to add this feature to stm32f103 boards?

terjeio commented 3 years ago

@HuubBuis There are no functions that disables/enables interrupts and are nested in grblHAL. In the core itself there are only two places where interrupt enabled state is changed, in change_completed() in _toolchange.c and there only to safeguard restore of HAL function pointers that are called from interrupt handlers.

@mwshakoor154 I am not planning to add threading to STM32F103 as it does not have the neccesary resources.

langwadt commented 3 years ago

maybe I'm not looking the right places, but does hal.irq_disable(); and hal.irq_enable(); ever get set to anything but a dummy?

terjeio commented 3 years ago

maybe I'm not looking the right places, but does hal.irq_disable(); and hal.irq_enable(); ever get set to anything but a dummy?

I'll add implementations for all drivers in the next commit. Currently the calls are only used in the tool change code to protect some function pointer swaps.

HuubBuis commented 3 years ago

I have implemented the grbl-l-mega spindle synchronization as foreground process (no FPU use in interrupt time) in the STM32F4xx driver. I bench tested it on my rotary table and it works OK (like the Mega). Implementing this for an ESP32 is feasible.

Next week I will try (again) to get the PID version working using the same approach.

The repository is locate here

terjeio / grblHAL

Testing G33 on STM32F411 nucleo fails #176