Open Candas1 opened 12 months ago
An associated change would be to change the STM32 PWM initialisation to work in the same way as Renesas, RP2040, ESP32 or SAMD: to use the highest value for ARR supported by the timer - in other words, to adjust the pre-scaler according to the chosen frequency to use the highest resolution we can. At the moment we use a fixed 4096 on STM32, when some MCUs could do better :-) Also, it would be intuitive that (like with the other drivers) I can trade off PWM frequency for increased resolution.
Another good change might be to use the registers directly when setting the PWM, rather than using the HAL functions? I haven't checked, but knowing HAL this might save a lot of cycles...
The drivers can report back the PWM range using the driver's data structure, so we can work with ARR+1 like you suggest.
Another point on the PWM topic:
the way the code is structured calculations are interleaved with the setting of the PWM to the different phases. From my point of view this increases the chance of a timer reset event occurring during the process of setting the PWM to the phases. This will result in a duty cycle where some phases are in the new state and some phases are in the previous state regarding the PWM settings. While this isn't critical, it would of course be nicer to avoid it. Changing the code so all calculations happen first, and then all registers are updated in rapid succession doesn't eliminate, but reduces the chance of these events happening.
Thanks for your comments Richard. This is a topic I started to explore, and so far I haven't found anything obviously wrong, and the changes I tried didn't greatly improve the performance. I was looking mainly at hardware 6pwm because that's the most straight forward, low level should work well in this case as all the channels are from the same timer.
the way the code is structured calculations are interleaved with the setting of the PWM to the different phases. From my point of view this increases the chance of a timer reset event occurring during the process of setting the PWM to the phases. This will result in a duty cycle where some phases are in the new state and some phases are in the previous state regarding the PWM settings. While this isn't critical, it would of course be nicer to avoid it. Changing the code so all calculations happen first, and then all registers are updated in rapid succession doesn't eliminate, but reduces the chance of these events happening.
Good point, it could be that the current code is setting the pwm for each channel too far appart.
There is also something I took for granted but might not be enabled with the current code, it's the preload enable bit. You usually don't want the pwm to be applied directly, this function uses shadow registers so that the changes are applied only at the end of the pwm period to have consistent switching. [EDIT] It's ok preload is enabled.
I am in my bed tried to fall asleep, and I think I found one reason why it's slow.
We shouldn't do math in _constrain(), I think it does the math 3 times 😅
Need to test when I wake up.
This is what I meant:
But it takes same time even if I move the math out, it's probably optimized by the compiler. It might be still worth moving the math out for all those, in case it's not optimized with some of the toolchains.
Surprisingly this takes 8us on the stm32f1:
dc_a = _constrain(dc_a, 0.0f , 1.0f );
dc_b = _constrain(dc_b, 0.0f , 1.0f );
dc_c = _constrain(dc_c, 0.0f , 1.0f );
But I am not even sure it's needed, as we do this before:
Ua = _constrain(Ua, 0, voltage_limit);
Ub = _constrain(Ub, 0, voltage_limit);
Uc = _constrain(Uc, 0, voltage_limit);
voltage_limit should be smaller or equal to voltage_power_supply (see init) unless it's wrongly set after the init.
Worst case we could constrain CCR1 CCR2 CCR3 between 0 and ARR+1, it will be faster as it's integer math. But that also impacts all drivers then.
Looking over the code and the different drivers, I think the situation is this:
So internally, we could change the way the driver and the hardware driver interact: the current limit - scale - limit - scale operation could be folded into just a limit - scale operation, if we know the counter range within the driver class already. Doing this would be an internal API change, but affect all the drivers.
Longer term one could remove the driver limit and roll it into the motor limit, so you have only one voltage limit that you have to set for the system. It would be a simplification from some point of view, but it would be an external API change...
Hmmm - actually the driver limit defines the centre-point of the modulation? So we actually do need a separate driver and motor voltage limit?
Antun had shared this in the forum, yes we need both:
Those are equivalent, so it could be done in different places: Voltage_power_supply = 1.0 = ARR +1
Again I am just analyzing, if it turns out changes require a lot of effort but bring little benefits, it's not worth it.
Timing on an stm32f401 using pin toggling, a back to back toggle takes ~500ns. All reported times are from falling edge to rising edge which I assume means it includes 1 period of this base latency (ie if toggles were instant, the reported duration would be shorter by 500ns). Total time for the higher level setPwm call is ~9.5-10.5us.
digitalToggle(PC14);
Ua = _constrain(Ua, 0, voltage_limit);
Ub = _constrain(Ub, 0, voltage_limit);
Uc = _constrain(Uc, 0, voltage_limit);
// calculate duty cycle
// limited in [0,1]
float v_supply_inv = 1/voltage_power_supply;
dc_a = _constrain(Ua * v_supply_inv, 0.0f , 1.0f );
dc_b = _constrain(Ub * v_supply_inv, 0.0f , 1.0f );
dc_c = _constrain(Uc * v_supply_inv, 0.0f , 1.0f );
digitalToggle(PC14);
1.56us
digitalToggle(PC14);
_setSinglePhaseState(phase_state[0], ((STM32DriverParams*)params)->timers[0], ((STM32DriverParams*)params)->channels[0], ((STM32DriverParams*)params)->channels[1]);
if(phase_state[0] == PhaseState::PHASE_OFF) dc_a = 0.0f;
digitalToggle(PC14);
2.1us, this obviously happens 3 times, so it's a big contributor.
digitalToggle(PC14);
_setPwm(((STM32DriverParams*)params)->timers[0], ((STM32DriverParams*)params)->channels[0], _PWM_RANGE*dc_a, _PWM_RESOLUTION);
digitalToggle(PC14);
1.8us, again 3 times
TIM_HandleTypeDef *tim = HT->getHandle();
uint32_t CCR_RegisterValue = ((__HAL_TIM_GET_AUTORELOAD(tim) + 1) * value) / ((1 << resolution) - 1) ;
int chan = HT->getChannel(channel);
__HAL_TIM_SET_COMPARE(tim, chan, CCR_RegisterValue);
Replacing the call to HT->setCaptureCompare in _setPwm with this saves roughly 1us per call, for a total of 3us.
I was checking the UDIS bit @Copper280z found. LL_TIM_EnableUpdateEvent() and LL_TIM_DisableUpdateEvent() are available for all the STM32 families so we could use it. But this bit is set at timer level and not channel level, it could be used when the whole timer is dedicated to a single driver or we have to loop through all the timers of a driver and it makes sense only if the timers are aligned. I would still separate the math for calculating pwm values and setting CCRX registers to reduce the risk of setting the channels at different times, but it should be done in 2PWM/3PWM/4PWM/hardware6PWM/software6PWM.
Something like this for hardware6PWM:
TIM_HandleTypeDef *tim = (STM32DriverParams*)params)->timers[0]->getHandle();
uint32_t ARR = __HAL_TIM_GET_AUTORELOAD(tim) +1;
// directly scaling to ARR instead of _PWM_RESOLUTION
uint32_t CCRA = ARR * dc_a;
uint32_t CCRB = ARR * dc_b;
uint32_t CCRC = ARR * dc_c;
LL_TIM_DisableUpdateEvent(tim.instance)
__HAL_TIM_SET_COMPARE(tim, ((STM32DriverParams*)params)->channels[0], CCRA);
__HAL_TIM_SET_COMPARE(tim, ((STM32DriverParams*)params)->channels[2], CCRB);
__HAL_TIM_SET_COMPARE(tim, ((STM32DriverParams*)params)->channels[4], CCRC);
LL_TIM_EnableUpdateEvent(tim.instance)
Hi,
A topic has been on my mind last few days, thought I would socialize it to see if it's worth exploring it.
A few months back I measured it takes 48us on STM32F1 to run the setpwm function for 6PWM. Which is more than the inverse clarke/park and svpwm combined in my case.
I think we are converting to too many different value domains:
Few ideas:
[UPDATE]