open-dynamic-robot-initiative / udriver_firmware

Firmware of the ODRI udriver for either SPI or CAN control.
https://open-dynamic-robot-initiative.github.io/udriver_firmware

SPI timeouts #13

Open tlbtlbtlb opened 2 years ago

tlbtlbtlb commented 2 years ago

I think I've tracked down why I periodically get SPI timeouts reported. They're due to an arithmetic overflow in the timer update in the udriver firmware.

In dual_motor_torque_ctrl.c, the firmware checks for a timeout by comparing the timestamp of the last packet received from the hall sensors (gSPILastReceivedIqRef_stamp) with the current time (gTimer0_stamp):

  gErrors.bit.spi_recv_timeout = (
      gSPIReceiveIqRefTimeout != 0 // and timeout is enabled
      // check if one of the motors is enabled and has a IqRef != 0
      && ((gMotorVars[HAL_MTR1].Flag_Run_Identify
           && gMotorVars[HAL_MTR1].IqRef_A != 0)
          || (gMotorVars[HAL_MTR2].Flag_Run_Identify
              && gMotorVars[HAL_MTR2].IqRef_A != 0))
      // finally check if last message exceeds timeout
      && (gSPILastReceivedIqRef_stamp
          < gTimer0_stamp - gSPIReceiveIqRefTimeout)
  );

So far so good, but the way gTimer0_stamp is calculated makes it wrap around to zero in much less than 2^32 ticks. In timer0_ISR, it does this:

#define TIMER0_FREQ_Hz 4000
...
uint32_t gTimer0_cnt = 0;
uint32_t gTimer0_stamp = 0;
...
  ++gTimer0_cnt;
  gTimer0_stamp = 1000 * gTimer0_cnt / TIMER0_FREQ_Hz;

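But C calculates 1000 * gTimer0_cnt before dividing, and that 32-bit product overflows once gTimer0_cnt exceeds 2^32/1000. At 4000 ticks per second, gTimer0_stamp therefore rolls over every 2^32/1000/4000 seconds (roughly 1074 seconds, a bit over 17 minutes). While gTimer0_stamp is still smaller than the timeout right after a rollover, the unsigned subtraction gTimer0_stamp - gSPIReceiveIqRefTimeout wraps around to a huge number, so the comparison reports a timeout even though packets are arriving normally. So if you happen to be controlling the robot at that moment, it shuts down and you have to power cycle it.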
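
To make the arithmetic concrete, here is a small host-side sketch that reproduces the wrap. It is not firmware code: the helper names stamp_ms, timed_out and check_wrap are made up for illustration, the 100 ms timeout is an arbitrary example value, and it assumes uint32_t arithmetic behaves the same on the host as on the target.

```c
#include <assert.h>
#include <stdint.h>

#define TIMER0_FREQ_Hz 4000

/* Same expression as in timer0_ISR: the 32-bit product wraps before the division. */
static uint32_t stamp_ms(uint32_t cnt)
{
    return (uint32_t)1000 * cnt / TIMER0_FREQ_Hz;
}

/* Last clause of the timeout check, isolated (hypothetical helper). */
static int timed_out(uint32_t last_stamp, uint32_t now_stamp, uint32_t timeout)
{
    return last_stamp < now_stamp - timeout;
}

/* Steps across the wrap point with an illustrative 100 ms timeout. */
static void check_wrap(void)
{
    const uint32_t last_good = 4294967u;             /* floor((2^32 - 1) / 1000) */
    const uint32_t before = stamp_ms(last_good);     /* 1073741 ms, ~17.9 min */
    const uint32_t after = stamp_ms(last_good + 1u); /* product wrapped: 0 ms */

    assert(before == 1073741u);
    assert(after == 0u);
    /* A packet stamped "now" is not flagged before the wrap... */
    assert(!timed_out(before, before, 100u));
    /* ...but is flagged one tick later, because 0 - 100 wraps to 4294967196. */
    assert(timed_out(before, after, 100u));
}
```

Calling check_wrap() from a test program passes all four assertions, which matches the observed behavior: the stamp falls back to zero after about 1073.7 seconds and the very next comparison trips the timeout.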

As a workaround I can disable the timeout check, but I worry that it'll fry the electronics if the hall sensors actually stop reporting.

It's probably a 1-line fix to https://github.com/open-dynamic-robot-initiative/udriver_firmware/blob/c98d2296aaf409cd8ef2ce0df2e0a0a4a73943c0/firmware/firmware_spi/mw_dual_motor_torque_ctrl/src/dual_motor_torque_ctrl.c#L1351
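
For reference, one possible shape of that fix, as an untested sketch rather than the maintainers' actual change. The names TICKS_PER_MS, stamp_ms_fixed and timed_out_ticks are hypothetical, and the first function assumes TIMER0_FREQ_Hz stays a multiple of 1000.

```c
#include <stdint.h>

#define TIMER0_FREQ_Hz 4000
/* 4 ticks per millisecond; assumes the frequency is a multiple of 1000. */
#define TICKS_PER_MS (TIMER0_FREQ_Hz / 1000)

/* Hypothetical one-line fix: divide directly, so no intermediate product can overflow. */
static uint32_t stamp_ms_fixed(uint32_t cnt)
{
    return cnt / TICKS_PER_MS;
}

/*
 * Hypothetical stricter variant: compare elapsed raw ticks instead of
 * absolute millisecond stamps. The unsigned difference stays correct even
 * when the tick counter itself wraps at 2^32.
 */
static int timed_out_ticks(uint32_t last_cnt, uint32_t now_cnt, uint32_t timeout_ms)
{
    return (uint32_t)(now_cnt - last_cnt) > timeout_ms * TICKS_PER_MS;
}
```

The divide-first version only pushes the problem out: gTimer0_cnt still overflows after 2^32/4000 seconds (about 12.4 days), and the original comparison would still misfire while the stamp is smaller than the timeout. Comparing elapsed ticks avoids both cases, at the cost of storing the raw counter value for the last received packet instead of the millisecond stamp.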

luator commented 2 years ago

Thanks a lot for tracking this down! We had already noticed that there is some issue always occurring after ~17 min but couldn't find the cause so far.

@thomasfla @jviereck FYI.