chaseblock closed this issue 3 years ago
After consulting with some sources:
This is probably because uC/OS does not properly handle the floating point registers during a context switch, and then loses track of where the PC is on the stack when it tries to return from the switch.
Our options, as I see them, are:
@ClarkPoon @JimothyGreene I would like your opinions on these options within the next few days so that we aren't blocked by this for a long while.
@Cam0Cow @ErickCortez98 This possibly affects your system as well, although I haven't really looked yet. It might be good for y'all to take a look at any floating point math that y'all are doing and ask whether or not it needs to be floats.
I looked through the BPS code, and it looks like we are mainly using floating point math for the following:
Do you know how long it takes to execute a floating point division on the stm32f413 without the FPU? I am concerned about point 3 because we are expecting to send over 900 floating point values over CAN each second (10 x 31 voltages + 10 x 62 temperatures). This seems like it could be quite CPU intensive if each typecast + division takes a while. It might be worth it to consider a fourth option:
Remove as much floating point math as possible, but keep the FPU on and treat floating point operations as a critical section (e.g., disable interrupts while we do them).
If we choose to leave the FPU enabled, we would have to make sure we have a handler set up to catch any faults caused by this issue.
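As a rough illustration of the "critical section" option above, here is a minimal sketch. `critical_enter()` and `critical_exit()` are hypothetical stand-ins that only count nesting so the snippet builds on a host PC; on target they would map to whatever interrupt-masking primitive the port provides (uC/CPU offers `CPU_CRITICAL_ENTER()` / `CPU_CRITICAL_EXIT()` for this). `millivolts_to_volts_guarded()` is also a made-up name. This only shows the shape of the idea and has not been verified to avoid the hardfault.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the real interrupt-masking primitives.
 * Here they only track nesting depth so the sketch runs on a host. */
static int irq_mask_depth = 0;

static void critical_enter(void) { irq_mask_depth++; }
static void critical_exit(void)  { irq_mask_depth--; }

/* Convert an integer millivolt reading to volts, keeping the floating
 * point instructions inside the masked region so that no context switch
 * can be triggered while they execute. */
static float millivolts_to_volts_guarded(uint32_t millivolts)
{
    float volts;

    critical_enter();
    volts = (float)millivolts / 1000.0f;
    critical_exit();

    return volts;
}
```

The masked region should stay as short as possible, since every interrupt (including the tick) is delayed while it runs.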
From a BPS perspective, I think the best option would be to remove all floating point and change the CAN spec (which would change the data format that array and sunlight receive from BPS). What do you think, @chaseblock @JimothyGreene @afnanmmir @dimembermatt?
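To make the "no floats on the wire" idea concrete, here is a small sketch that sends a reading as an integer number of millivolts. The scaling (millivolts), the 4-byte little-endian layout, and all function names are assumptions for illustration only; the actual CAN spec would have to define them.

```c
#include <stdint.h>

/* Pack an unsigned 32-bit reading (assumed to be in millivolts) into
 * four payload bytes, least significant byte first. */
static void pack_u32_le(uint32_t value, uint8_t out[4])
{
    out[0] = (uint8_t)(value & 0xFFu);
    out[1] = (uint8_t)((value >> 8) & 0xFFu);
    out[2] = (uint8_t)((value >> 16) & 0xFFu);
    out[3] = (uint8_t)((value >> 24) & 0xFFu);
}

/* Receiver side: rebuild the integer reading from the payload bytes. */
static uint32_t unpack_u32_le(const uint8_t in[4])
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}

/* Split a millivolt reading into whole volts and the millivolt remainder,
 * so it can be displayed as "V.mmm" using integer math only. */
static void split_millivolts(uint32_t millivolts,
                             uint32_t *whole_volts,
                             uint32_t *remainder_mv)
{
    *whole_volts  = millivolts / 1000u;
    *remainder_mv = millivolts % 1000u;
}
```

With this kind of format the sender never touches the FPU, and a receiver that really needs a float can do the single division on its own side.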
Discovered while investigating #378.
Introducing floating point operations in certain places within the code seems to cause uC/OS to enter a hardfault somewhere around `OSSched()`.
For a reproduction case, see Test_Tasks.c on the RTOS branch. This executes fine. However, when Task2 is modified as below, it hardfaults as described. This behavior has also been seen in the CAN_Task test (see the linked issue). When Task1 is modified in a similar way, everything still appears to work.
The change:
For the hardfault debugging registers: see these dumps from GDB.
This puts us squarely in `IBUSERR` territory. I unfortunately haven't been able to find many reports of people experiencing these issues, and haven't been able to work around it so far.
@ClarkPoon @JimothyGreene I'm wondering if anyone has any ideas on this. I'm going to keep experimenting.
@SijWoo have you ever seen anything like this?
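For anyone reading fault register dumps like the ones mentioned above, here is a small helper sketch that names the bus fault cause bits in a Cortex-M Configurable Fault Status Register (CFSR) value. The bit positions follow the ARMv7-M layout, where the BusFault status byte occupies CFSR bits 8 to 15 and `IBUSERR` is bit 8. The function names are made up, and the sketch takes the register value as a parameter so it can run on a host; on target the value would be read from address 0xE000ED28.

```c
#include <stdint.h>

/* BusFault status bits within CFSR (ARMv7-M). */
#define CFSR_IBUSERR      (1u << 8)   /* instruction bus error      */
#define CFSR_PRECISERR    (1u << 9)   /* precise data bus error     */
#define CFSR_IMPRECISERR  (1u << 10)  /* imprecise data bus error   */
#define CFSR_UNSTKERR     (1u << 11)  /* fault on exception return  */
#define CFSR_STKERR       (1u << 12)  /* fault on exception entry   */
#define CFSR_LSPERR       (1u << 13)  /* fault on lazy FP stacking  */

/* Returns nonzero if the CFSR value reports an instruction bus error. */
static int cfsr_has_ibuserr(uint32_t cfsr)
{
    return (cfsr & CFSR_IBUSERR) != 0u;
}

/* Returns a short name for the first bus fault cause bit that is set,
 * or "none" if no bus fault cause is flagged. */
static const char *cfsr_busfault_name(uint32_t cfsr)
{
    if (cfsr & CFSR_IBUSERR)     return "IBUSERR";
    if (cfsr & CFSR_PRECISERR)   return "PRECISERR";
    if (cfsr & CFSR_IMPRECISERR) return "IMPRECISERR";
    if (cfsr & CFSR_UNSTKERR)    return "UNSTKERR";
    if (cfsr & CFSR_STKERR)      return "STKERR";
    if (cfsr & CFSR_LSPERR)      return "LSPERR";
    return "none";
}
```

An `IBUSERR` means the fault happened on an instruction fetch, which fits the picture of the PC being restored from the wrong place on the stack.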