Closed nashif closed 7 years ago
by Inaky Perez-Gonzalez:
Still reproduceable on v1.8-branch as of 170531-1731
by Andrew Boie:
This is a timing sensitive test, I wouldn't be surprised if we have some sort of off-by-one thing going on with the retrieved tick values if we cross a boundary at just the wrong time. I'm pretty sure I have an Arduino Due floating around my cube somewhere and will test.
Inaky Perez-Gonzalez has this been reproduced on any other ARM boards or just Arduino Due?
by Anas Nashif:
I can only see this on the Due and the microbit, not on the frdm-k64f or the NRF boards (which have a different timer).
by Andrew Boie:
I'm having trouble reproducing this specific error on bbc_microbit, Anas Nashif are you sure you didn't see the below error instead?:
***** BOOTING ZEPHYR OS v1.7.99 - BUILD: Jun 7 2017 22:37:48 *****
tc_start() - Test kernel CPU and thread routines
Initializing kernel objects
Testing k_cpu_idle()
===================================================================
FAIL - main.
===================================================================
PROJECT EXECUTION FAILED
I was able to isloate the problem and have added a fix for it to my patch series. However even if I enclose test_kernel_interrupts() in an infinite while loop, it has not failed for me so far on bbc_microbit.
Arduino due is a different problem. It does indeed fail in test_kernel_interrupts. WIth the extra printouts I added in my patch series, it appears that locking interrupts doesn't actually prevent SYSTICK timer interrupts from happening:
***** BOOTING ZEPHYR OS v1.7.99 - BUILD: Jun 7 2017 22:59:38 *****
tc_start() - Test kernel CPU and thread routines
Initializing kernel objects
Testing k_cpu_idle()
Testing interrupt locking and unlocking
FAIL - test_kernel_interrupts@327. tick advanced with interrupts locked
===================================================================
FAIL - main.
===================================================================
PROJECT EXECUTION FAILED
However unlike Inaky Perez-Gonzalez original report, I am seeing this 100% of the time. Investigating further.
by Andrew Boie:
I can confirm that even with interrupts locked with irq_lock(), I am still getting SYSTICK-exception based timer interrupts on my Arduino Due board.
Questions that come to mind:
Maureen Helm Kumar Gala if either of you have access to supported Zephyr ARM boards that are Cortex-M3 based and use SYSTICK timer, can you run tests/kernel/context on it and tell me if the "Testing interrupt locking and unlocking" part fails for you?
My suspicion is that something is not right with how the priority threshold is set when locking interrupts, apparently there is a policy to allow certain kinds of interrupts/exceptions to continue to fire when interrupts are locked, which seems unique to ARM.
by Anas Nashif:
I just did a bisect and found the offending commit:
commit ee66ee6c344f1b009cff050b408229450231a853
Author: Piotr Mienkowski <piotr.mienkowski@gmail.com>
Date: Sun May 7 22:17:54 2017 +0200
arch: sam3x: Use ASF library
This patch converts Atmel sam3x MCU series to use register
header files from Atmel Software Framework (ASF) library.
By using ASF different Atmel SAM MCU series can use common
device drivers.
Signed-off-by: Piotr Mienkowski <piotr.mienkowski@gmail.com>
Piotr Mienkowski Can you please take a look?
by Piotr Mienkowski:
The commit in question introduced ext/hal/atmel/asf/sam/include/sam3x/sam3x8e.h which contains CMSIS compatible SoC configuration from Atmel. This file is indirectly included by include/arch/arm/cortex_m/cmsis.h and consequently drivers/timer/cortex_m_systick.c.
sam3x8e.h states:
#define __NVIC_PRIO_BITS 4
whereas before __NVIC_PRIO_BITS was set to 3. I'm afraid though the sam3x8e.h is correct. According to the SAM3X series datasheet "The NVIC provides up to 16 interrupt priority levels."
by Andrew Boie:
I see the problem but I'm not sure the proper fix. The DTS appears to be wrong!
This is what gets generated_dts_board.h
arduino_due/include/generated/generated_dts_board.h
25-
26-/* interrupt-controller@e000e100 */
27:#define ARM_V7M_NVIC_E000E100_ARM_NUM_IRQ_PRIORITY_BITS 3
28-#define ARM_V7M_NVIC_E000E100_BASE_ADDRESS_0 0xe000e100
29-#define ARM_V7M_NVIC_E000E100_SIZE_0 3072
In the fixup file for Arduino Due we set CONFIG_NUM_IRQ_PRIO_BITS
dts/arm/arduino_due.fixup
6-
7-
8:#define CONFIG_NUM_IRQ_PRIO_BITS ARM_V7M_NVIC_E000E100_ARM_NUM_IRQ_PRIORITY_BITS
9-#define CONFIG_UART_ATMEL_SAM3_NAME ATMEL_SAM3X_UART_400E0800_LABEL
So we have a situation where __NVIC_PRIO_BITS is 4, but CONFIG_NUM_IRQ_PRIO_BITS is 3. These values should match!
by Andrew Boie:
by Piotr Mienkowski:
Thanks Andrew Boie for the fix. I believe however that we probably should do more.
I assumed incorrectly that CONFIG_NUM_IRQ_PRIO_BITS is not used any more apart from the include/arch/arm/cortex_m/cmsis.h file but that's not the case. Currently parts of the code base are using CONFIG_NUM_IRQ_PRIO_BITS and parts __NVIC_PRIO_BITS. Shouldn't we be consistent?
To comply with CMSIS we would need to replace all occurrences of CONFIG_NUM_IRQ_PRIO_BITS with __NVIC_PRIO_BITS and only in ./include/arch/arm/cortex_m/cmsis.h leave
#define __NVIC_PRIO_BITS CONFIG_NUM_IRQ_PRIO_BITS
That would be of course another Jira issue. I'm only asking if we need one?
by Andrew Boie:
Piotr Mienkowski agreed, "Don't Repeat Yourself" is one of my favorite mantras in computer science. This seems worthy of cleanup. I don't think I'll personally have time to work on it but we should have it in the backlog.
by Piotr Mienkowski:
Unless there are other volunteers I can prepare the patch. I have a quick question though: for such a cleanup task shall I create a new Jira issue or a pull request is enough?
by Anas Nashif:
Piotr Mienkowski yes, please create a jira for the sake of traceability.
Related to GH-2097
Reported by Inaky Perez-Gonzalez:
Console reports failure
Note this failure doesn't show constantly on our runs, but fails about 60% of the time.
(Imported from Jira ZEP-2243)