rosencranz / cortex-ap

Automatically exported from code.google.com/p/cortex-ap

Application halts #9

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Steps to reproduce the problem.
1. connect cortex-ap PPM input to 40 MHz Robbe RC receiver output
2. make sure RC transmitter is off
3. power up cortex-ap

Expected behaviour.
- normal working, with blue LED toggling every second

Actual behaviour.
- blue LED freezes (either lit or off) after 3 to 15 seconds
- halting the debugger always finds the code inside some I2C routine
- I2C is in an error condition
- task switch doesn't take place.

Additional information.
- problem doesn't arise with a home-made RC receiver
- PPM signal of the Robbe receiver is much noisier
- problem remains with the transmitter turned on, but only shows up after a longer time
- code doesn't get stuck in the hard fault handler 

Hypotheses.
1. Stack overflow
2. Timer interrupt retriggers (case 1). A pipeline race could delay the 
clearing of the interrupt flag until after the interrupt handler has exited, 
so the interrupt routine is re-entered immediately. According to the FreeRTOS 
website, adding an instruction that reads the interrupt register should force 
the CPU to wait until the write completes, so the handler does not return 
before the flag has been cleared.
3. Timer interrupt retriggers (case 2). Since only the noisy signal causes the 
problem, the interrupt could be re-entered before the handler has finished.
4. Wrong interrupt priorities, a known issue with FreeRTOS ports for Cortex-M 
processors.
5. Stack gets corrupted by an automatic variable allocated inside the interrupt 
routine.

Actions.
1. Apply stack overflow detection methods #1 and #2 from the FreeRTOS web site.
2. Add an instruction that reads the interrupt register before exiting the routine.
3. Verify that interrupts are disabled when entering the timer interrupt. Verify 
that the same interrupt, or interrupts with the same priority, can't interrupt it.
4. Verify the priorities assigned to interrupts and compare them to those of the kernel.
5. Make the variable a global one.
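
Action 2 could look roughly like the sketch below. It is a host-runnable illustration, not the project's actual handler: `timer_regs_t`, `CAPTURE_FLAG`, `capture_isr` and `capture_count` are hypothetical names, and the status register is mocked as plain memory. On real hardware the exact idiom for clearing a flag depends on the peripheral's register type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a timer peripheral. On the target this would be
 * the memory-mapped register block declared in the vendor header. */
typedef struct {
    volatile uint32_t SR;       /* status register holding the pending flags */
} timer_regs_t;

#define CAPTURE_FLAG (1u << 1)  /* assumed bit position, for illustration only */

uint32_t capture_count;         /* number of capture events handled */

/* Sketch of a capture interrupt handler. */
void capture_isr(timer_regs_t *tim)
{
    if (tim->SR & CAPTURE_FLAG) {
        /* Clear the flag as early as possible. */
        tim->SR &= ~CAPTURE_FLAG;

        capture_count++;        /* the real capture processing would go here */

        /* Read the register back before returning. The volatile read is
         * intended to make the CPU wait for the preceding write, so the
         * handler does not exit while the flag still appears set. */
        (void)tim->SR;
    }
}
```

Clearing the flag first and reading it back last gives the write the longest possible time to take effect before the handler returns.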

Results
1. No stack overflow caught.
2. Solution seems only to increase the delay before the problem manifests.
3. TBD
4. TBD
5. TBD

Original issue reported on code.google.com by rosenkr...@email.it on 8 Jan 2013 at 2:22

GoogleCodeExporter commented 9 years ago

Results.
3. Not sure, but it looks like interrupts can be nested.

Actions.
3. Move the clearing of the interrupt flag to the end of the interrupt routine.
   Find a way to disable interrupt nesting.

Original comment by rosenkr...@email.it on 8 Jan 2013 at 2:54

GoogleCodeExporter commented 9 years ago
Results.
5. No effect

Original comment by rosenkr...@email.it on 8 Jan 2013 at 4:59

GoogleCodeExporter commented 9 years ago
Results.
3. Clearing interrupt flags before exiting has no effect

Original comment by rosenkr...@email.it on 8 Jan 2013 at 5:06

GoogleCodeExporter commented 9 years ago
Results summary.
1. No stack overflow caught.
2. Solution seems only to increase the delay before the problem manifests.
3. At first it looked like interrupts could be nested, but no evidence of 
   nesting was found, at least for the TIM2 interrupt nesting itself.
   Clearing interrupt flags before exiting the interrupt routine has no effect.
4. Verified the priorities assigned to interrupts: they are consistent with 
   those of the kernel.
   Raising the priorities of interrupts above those of the kernel has no effect.
5. Changing the scope of the interrupt variable to global has no effect.
   Increasing the stack size of the running task from 64 to 256 has no effect.

Hypotheses.
6. System tick (timer) interrupt gets somehow disabled.

Actions.
6. Toggle the red LED inside the system tick interrupt and check whether it keeps 
toggling even when the blue LED stops.

Original comment by rosenkr...@email.it on 9 Jan 2013 at 12:42

GoogleCodeExporter commented 9 years ago
Results
6. Red LED keeps toggling, sys tick keeps working.

Hypotheses.
7. Overflow of interrupt stack. 
   Methods #1 and #2 detect only overflows of task stacks, see:
   http://www.freertos.org/FreeRTOS_Support_Forum_Archive/July_2012/freertos_Stack_and_Heap_Explanation_5406242.html 
   "On the STM32 that stack is used by main(), and then re-used by interrupts. 
    0x200 could be too small. 
    The interrupt stack is not checked for overflow like the tasks stacks are."

Actions.
7. Increase system stack from current 0x200 to ... (0x400 ?)
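
Since the interrupt stack is not checked by the kernel, one common way to see how much of it is actually used is to fill it with a known pattern and later count how many words are still untouched. The sketch below is only an illustration of that idea on an ordinary array; `STACK_FILL`, `stack_paint` and `stack_unused_words` are hypothetical names, and on the target the region would be the main stack defined by the linker script (only the part not yet in use may be painted).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u  /* arbitrary, easily recognised pattern */

/* Fill a stack region with the pattern; done once, before the region is used. */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_FILL;
    }
}

/* Count the words, starting from the lowest address, that still hold the
 * pattern. This assumes a stack that grows downward, so the lowest
 * addresses are the last ones to be overwritten. */
size_t stack_unused_words(const uint32_t *base, size_t words)
{
    size_t n = 0;
    while (n < words && base[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

A result of zero unused words would suggest that the stack has been filled completely and has probably overflowed.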

Original comment by rosenkr...@email.it on 10 Jan 2013 at 11:01

GoogleCodeExporter commented 9 years ago
Hypotheses.
7. Overflow of interrupt stack. 
   Methods #1 and #2 detect only overflows of task stacks, see:
   http://www.freertos.org/FreeRTOS_Support_Forum_Archive/July_2012/freertos_Stack_and_Heap_Explanation_5406242.html 
   "On the STM32 that stack is used by main(), and then re-used by interrupts. 0x200 could be too small. 
    The interrupt stack is not checked for overflow like the tasks stacks are."
8. Race condition when suspended task tries to add the missed ticks to the 
system tick count, see:
   http://www.freertos.org/FreeRTOS_Support_Forum_Archive/May_2009/freertos_What_is_this_Stack_Overflow_3276262.html
   Probably a PENDSV interrupt swaps the context and another task claims these missed ticks. 

Actions.
7. Increase system stack from current 0x200 to ... (0x400 ?)
8. #define configMAX_SYSCALL_INTERRUPT_PRIORITY 0x10
   #define configKERNEL_INTERRUPT_PRIORITY 0xF0
   Make sure the SYSTICK and PENDSV interrupts have a priority lower than 0 (the highest level).
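
The two values above can be read as follows. On Cortex-M a numerically lower priority value means a more urgent interrupt, and only the upper bits of the 8-bit priority field are implemented. The sketch below assumes 4 implemented bits (`PRIO_BITS`), and `prio_level_to_raw` and `isr_may_use_kernel_api` are hypothetical helpers written only to illustrate the rule that an interrupt calling the FreeRTOS `...FromISR()` API must not be more urgent than configMAX_SYSCALL_INTERRUPT_PRIORITY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define configMAX_SYSCALL_INTERRUPT_PRIORITY 0x10
#define configKERNEL_INTERRUPT_PRIORITY      0xF0

#define PRIO_BITS 4u  /* assumption: 4 implemented priority bits */

/* Convert a priority level (0 = most urgent) into the raw 8-bit value,
 * which keeps the level in the most significant bits. */
uint8_t prio_level_to_raw(uint8_t level)
{
    return (uint8_t)(level << (8u - PRIO_BITS));
}

/* True if an interrupt with this raw priority is allowed to call the
 * kernel's ...FromISR() functions, i.e. its value is numerically equal to
 * or greater than configMAX_SYSCALL_INTERRUPT_PRIORITY. */
bool isr_may_use_kernel_api(uint8_t raw)
{
    return raw >= configMAX_SYSCALL_INTERRUPT_PRIORITY;
}
```

With these settings level 0 is the only level that must stay away from the kernel API, and the kernel interrupts themselves run at level 15, the least urgent one.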

Original comment by rosenkr...@email.it on 10 Jan 2013 at 1:05

GoogleCodeExporter commented 9 years ago
Results.
7. No effect
8. No effect

Hypotheses.
9. The attitude task starts calling I2C functions to read the MEMS sensors.
   The high rate of capture interrupts somehow disrupts I2C communication.
   The I2C driver gets stuck in one of its while() loops.
   The scheduler is not able to preempt the attitude task because it has the highest priority.
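
A driver gets stuck in a while() loop when the flag it polls never changes. A minimal sketch of a bounded wait is shown below; `wait_flag_bounded` and `flag_fn` are hypothetical names, not part of the I2C driver used here, and `flag_after_n` and `flag_never` are test doubles standing in for a hardware status flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: returns true once the awaited flag is set. */
typedef bool (*flag_fn)(void *ctx);

/* Poll the flag at most max_polls times instead of looping forever.
 * Returns true if the flag was seen, false if the wait timed out. */
bool wait_flag_bounded(flag_fn ready, void *ctx, uint32_t max_polls)
{
    while (max_polls-- > 0u) {
        if (ready(ctx)) {
            return true;
        }
    }
    return false;   /* caller can now reset the peripheral or report an error */
}

/* Test double: the flag becomes set after *ctx unsuccessful polls. */
bool flag_after_n(void *ctx)
{
    uint32_t *remaining = ctx;
    if (*remaining == 0u) {
        return true;
    }
    (*remaining)--;
    return false;
}

/* Test double: the flag never becomes set, like a hung bus. */
bool flag_never(void *ctx)
{
    (void)ctx;
    return false;
}
```

On a timeout the calling task regains control, so lower priority tasks are no longer starved by an endless poll.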

Actions.
9. Raise the priority of the navigation task above the priority of the attitude task.
   Put I2C read and write operations inside critical sections.
   Modify the I2C drivers so that they can recover from a fault. This may mean shutting off the I2C logic, 
   switching to bit-bang mode and manually clocking out 9 bits to clear the I2C bus if a peripheral is hung.
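
The bus-clearing step could be sketched as below. This is an assumption-laden illustration, not the fix that was eventually adopted: `i2c_pins_t` and `i2c_bus_recover` are hypothetical, the pin access goes through caller-supplied callbacks, and the `mock_*` functions only simulate a slave that holds SDA low until it has received a given number of clock pulses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit-bang access to the two open-drain bus lines. */
typedef struct {
    void (*set_scl)(void *ctx, bool high);
    void (*set_sda)(void *ctx, bool high);
    bool (*get_sda)(void *ctx);
    void (*delay)(void *ctx);   /* half of one clock period */
    void *ctx;
} i2c_pins_t;

/* Clock SCL up to 9 times until the slave releases SDA, then send a STOP.
 * Returns true if SDA is high (bus free) at the end. */
bool i2c_bus_recover(const i2c_pins_t *p)
{
    p->set_sda(p->ctx, true);                   /* release SDA */
    for (int i = 0; i < 9 && !p->get_sda(p->ctx); i++) {
        p->set_scl(p->ctx, false); p->delay(p->ctx);
        p->set_scl(p->ctx, true);  p->delay(p->ctx);
    }
    if (!p->get_sda(p->ctx)) {
        return false;                           /* slave still holds SDA low */
    }
    /* STOP condition: SDA goes from low to high while SCL is high. */
    p->set_scl(p->ctx, false); p->delay(p->ctx);
    p->set_sda(p->ctx, false); p->delay(p->ctx);
    p->set_scl(p->ctx, true);  p->delay(p->ctx);
    p->set_sda(p->ctx, true);  p->delay(p->ctx);
    return p->get_sda(p->ctx);
}

/* Mock bus: the slave pulls SDA low until it has seen clocks_needed pulses. */
typedef struct { int clocks_needed; bool scl; bool sda_master; } mock_bus_t;

void mock_set_scl(void *c, bool high)
{
    mock_bus_t *b = c;
    if (high && !b->scl && b->clocks_needed > 0) {
        b->clocks_needed--;                     /* count rising edges */
    }
    b->scl = high;
}
void mock_set_sda(void *c, bool high) { ((mock_bus_t *)c)->sda_master = high; }
bool mock_get_sda(void *c)
{
    mock_bus_t *b = c;                          /* wired-AND of master and slave */
    return b->sda_master && b->clocks_needed == 0;
}
void mock_delay(void *c) { (void)c; }
```

If the function returns false, the slave did not let go of the bus within 9 clocks and a harder reset (for example cycling the sensor's power) would be needed.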

Original comment by rosenkr...@email.it on 11 Jan 2013 at 8:40

GoogleCodeExporter commented 9 years ago
Results.
9. Changing priorities has no effect.
   Putting reads and writes inside critical sections has no effect.
   Problem solved by adding the CPAL I2C library and modifying I2C_Mems_Driver accordingly.
   System never halted during 5+ hours of continuous operation.

To do.
- Check if MEMS data are read correctly (check attitude).
- Check if telemetry works.
- Possibly simplify code structure.

Original comment by rosenkr...@email.it on 14 Jan 2013 at 5:32