I think I've tracked down why I periodically get SPI timeouts reported. They're due to an arithmetic overflow in the timer update in the udriver firmware.
In dual_motor_torque_ctrl.c, the firmware checks for a timeout by comparing the timestamp of the last packet received from the hall sensors (gSPILastReceivedIqRef_stamp) with the current time (gTimer0_stamp):
    gErrors.bit.spi_recv_timeout = (
            gSPIReceiveIqRefTimeout != 0  // and timeout is enabled
            // check if one of the motors is enabled and has a IqRef != 0
            && ((gMotorVars[HAL_MTR1].Flag_Run_Identify
                    && gMotorVars[HAL_MTR1].IqRef_A != 0)
                || (gMotorVars[HAL_MTR2].Flag_Run_Identify
                    && gMotorVars[HAL_MTR2].IqRef_A != 0))
            // finally check if last message exceeds timeout
            && (gSPILastReceivedIqRef_stamp
                < gTimer0_stamp - gSPIReceiveIqRefTimeout)
        );
So far so good, but the way gTimer0_stamp is calculated makes it wrap around to zero in much less than 2^32 ticks. In timer0_ISR, it does this:
But C evaluates 1000 * gTimer0_cnt before dividing, so the 32-bit intermediate product overflows and gTimer0_stamp rolls over every 2^32/4000/1000 seconds, about 17 minutes.
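To make the rollover concrete, here is a minimal sketch of the arithmetic. It is not the firmware's code: stamp_ms_32bit and TIMER_FREQ_HZ are hypothetical names, and the 4000 Hz tick rate and 32-bit unsigned counter are assumptions read off the 2^32/4000/1000 figure above.

```c
#include <stdint.h>

/* Assumed tick rate, inferred from the 2^32/4000/1000 estimate. */
#define TIMER_FREQ_HZ 4000u

/* Hypothetical stand-in for the stamp computation in the ISR.
 * The product 1000 * cnt is truncated to 32 bits, so it wraps
 * modulo 2^32 before the division ever happens. */
static uint32_t stamp_ms_32bit(uint32_t cnt)
{
    uint32_t product = (uint32_t)(1000u * cnt); /* wraps past 2^32 */
    return product / TIMER_FREQ_HZ;
}
```

Under these assumptions, stamp_ms_32bit(4294967) gives 1073741 ms, and one tick later stamp_ms_32bit(4294968) gives 0, because 1000 * 4294968 exceeds 2^32 by only 704. The counter itself is nowhere near overflowing at that point; only the intermediate product is.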
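When gTimer0_stamp is close to zero (right after a rollover), the unsigned subtraction gTimer0_stamp - gSPIReceiveIqRefTimeout wraps around to a huge number, so the check reports a timeout even though packets are still arriving. So if you happen to be controlling the robot at that moment, it shuts down and you have to power cycle it.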
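The false positive can be reproduced in isolation. This sketch only mirrors the shape of the comparison in the snippet above; the function name is hypothetical and the variables are assumed to be 32-bit unsigned.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the final clause of the firmware check. The subtraction
 * is unsigned, so it wraps to a value near 2^32 whenever
 * now_ms < timeout_ms, and almost any last_ms compares as "too old". */
static bool timed_out_as_written(uint32_t last_ms, uint32_t now_ms,
                                 uint32_t timeout_ms)
{
    return last_ms < now_ms - timeout_ms;
}
```

With a 50 ms timeout, timed_out_as_written(1000, 1020, 50) is false as expected, but timed_out_as_written(0, 3, 50) is true: 3 - 50 wraps to 4294967249, so a packet received 3 ms ago looks stale.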
As a workaround I can disable the timeout check, but I worry that it'll fry the electronics if the hall sensors actually stop reporting.
Thanks a lot for tracking this down! We had already noticed that there is some issue always occurring after ~17 min but couldn't find the cause so far.
It's probably a 1-line fix to https://github.com/open-dynamic-robot-initiative/udriver_firmware/blob/c98d2296aaf409cd8ef2ce0df2e0a0a4a73943c0/firmware/firmware_spi/mw_dual_motor_torque_ctrl/src/dual_motor_torque_ctrl.c#L1351
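One possible shape for that fix, as a sketch rather than a tested patch: widen the intermediate product to 64 bits so it cannot overflow. The names below are hypothetical, the 4000 Hz tick rate is an assumption, and it presumes the toolchain provides uint64_t. A second, optional change compares elapsed time instead of subtracting the timeout from the current stamp, which avoids the underflow near zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed tick rate; the real constant lives in the firmware. */
#define TIMER_FREQ_HZ 4000u

/* Widening before the multiply keeps the product exact, so the stamp
 * no longer resets after roughly 17 minutes. */
static uint32_t stamp_ms_64bit(uint32_t cnt)
{
    return (uint32_t)(((uint64_t)cnt * 1000u) / TIMER_FREQ_HZ);
}

/* Elapsed-time form of the timeout test: equivalent to the original
 * comparison when nothing wraps, but does not underflow when now_ms
 * is smaller than timeout_ms. */
static bool timed_out_elapsed(uint32_t last_ms, uint32_t now_ms,
                              uint32_t timeout_ms)
{
    return (uint32_t)(now_ms - last_ms) > timeout_ms;
}
```

With the wider product, the stamp only resets when the 32-bit tick counter itself wraps, which at 4000 Hz is about 12.4 days. That reset is still not a clean modulo-2^32 wrap of the millisecond value, so a fully robust version would probably compare raw tick counts instead.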