zephyriot / zephyr-issues

0 stars 0 forks source link

irq_lock() does not disable SYSTICK on Arduino Due #2079

Closed nashif closed 7 years ago

nashif commented 7 years ago

Reported by Inaky Perez-Gonzalez:

Console reports failure

E<span>#</span>1      @jfsotc03/arduino2-17  console output: ***** BOOTING ZEPHYR OS v1.7.99 - BUILD: Jun  1 2017 00:38:38 *****
E<span>#</span>1      @jfsotc03/arduino2-17  console output: tc_start() - Test kernel CPU and thread routines
E<span>#</span>1      @jfsotc03/arduino2-17  console output: Initializing kernel objects
E<span>#</span>1      @jfsotc03/arduino2-17  console output: Testing k_cpu_idle()
E<span>#</span>1      @jfsotc03/arduino2-17  console output: Testing interrupt locking and unlocking
E<span>#</span>1      @jfsotc03/arduino2-17  console output: ===================================================================
E<span>#</span>1      @jfsotc03/arduino2-17  console output: FAIL - main.
E<span>#</span>1      @jfsotc03/arduino2-17  console output: ===================================================================
E<span>#</span>1      @jfsotc03/arduino2-17  console output: RunID: ci-170531-1731-379:zpg1
E<span>#</span>1      @jfsotc03/arduino2-17  console output: PROJECT EXECUTION FAILED

Note this failure doesn't show constantly on our runs, but fails about 60% of the time.

Target Name  Remote ID             Type     
-----------  --------------------  -------- 
target       jfsotc03/arduino2-17  arduino2 

Detailed description
====================
Automated test run failed.

Output (the target might have produced none; GDB will be needed at this point):

         @local                 will run on target group 'target=jfsotc03/arduino2-17:arm'
         @local                 configure passed  type:arduino2
Bpeke    @jfsotc03/arduino2-17  build passed: 'mkdir -p /tmp/tcf.runci-170531-1731-379-a_6D3h/y0rg/outdir-y0rg-zpg1-arduino_due; rm -f /tmp/tcf.runci-170531-1731-379-a_6D3h/y0rg/outdir-y0rg-zpg1-arduino_due/.config' @/home/jenkins/workspace/zephyr-master-tcf-v0.11-branch/tcf.git/tcfl/app_zephyr.py:113
Bpeke    @jfsotc03/arduino2-17  build passed: 'make -j -C /home/jenkins/workspace/zephyr-master-tcf-v0.11-branch/target.git/tests/kernel/context/ EXTRA_CFLAGS="-Werror -Wno-error=deprecated-declarations" KCPPFLAGS=-DTC_RUNID=ci-170531-1731-379:zpg1 BOARD=arduino_due ARCH=arm  O=/tmp/tcf.runci-170531-1731-379-a_6D3h/y0rg/outdir-y0rg-zpg1-arduino_due initconfig' @/home/jenkins/workspace/zephyr-master-tcf-v0.11-branch/tcf.git/tcfl/app_zephyr.py:155
Bpeke    @jfsotc03/arduino2-17  build output: Using /home/jenkins/workspace/zephyr-master-tcf-v0.11-branch/target.git/boards/arm/arduino_due/arduino_due_defconfig as base
Bpeke    @jfsotc03/arduino2-17  build output: Merging /home/jenkins/workspace/zephyr-master-tcf-v0.11-branch/target.git/tests/include/test.config
Bpeke    @jfsotc03/arduino2-17  build output: Merging /home/jenkins/workspace/zephyr-master-tcf-v0.11-branch/target.git/kernel/configs/kernel.config
Bpeke    @jfsotc03/arduino2-17  build output: Merging prj.conf
Bpeke    @jfsotc03/arduino2-17  build output: #
Bpeke    @jfsotc03/arduino2-17  build output: # configuration written to .config
Bpeke    @jfsotc03/arduino2-17  build output: #
Bpeke    @jfsotc03/arduino2-17  build passed: 'make -j -C /home/jenkins/workspace/zephyr-master-tcf-v0.11-branch/target.git/tests/kernel/context EXTRA_CFLAGS="-Werror -Wno-error=deprecated-declarations" KCPPFLAGS=-DTC_RUNID=ci-170531-1731-379:zpg1 BOARD=arduino_due ARCH=arm  O=/tmp/tcf.runci-170531-1731-379-a_6D3h/y0rg/outdir-y0rg-zpg1-arduino_due' @/home/jenkins/workspace/zephyr-master-tcf-v0.11-branch/tcf.git/tcfl/app_zephyr.py:185
         @local                 build passed  type:arduino2
Diy8r<span>#</span>1  @jfsotc03/arduino2-17  deployed (try 1/4) kernel-arm:/tmp/tcf.runci-170531-1731-379-a_6D3h/y0rg/outdir-y0rg-zpg1-arduino_due/zephyr.bin
Diy8r    @jfsotc03/arduino2-17  deploy (1 tries) passed
         @local                 deploy passed  type:arduino2
E<span>#</span>1      @jfsotc03/arduino2-17  Reset
E<span>#</span>1      @jfsotc03/arduino2-17  eval failed: found expected (for failure) `PROJECT EXECUTION FAILED` in console `jfsotc03/arduino2-17:default` at 0.57s
E<span>#</span>1      @jfsotc03/arduino2-17  console output: ***** BOOTING ZEPHYR OS v1.7.99 - BUILD: Jun  1 2017 00:38:38 *****
E<span>#</span>1      @jfsotc03/arduino2-17  console output: tc_start() - Test kernel CPU and thread routines
E<span>#</span>1      @jfsotc03/arduino2-17  console output: Initializing kernel objects
E<span>#</span>1      @jfsotc03/arduino2-17  console output: Testing k_cpu_idle()
E<span>#</span>1      @jfsotc03/arduino2-17  console output: Testing interrupt locking and unlocking
E<span>#</span>1      @jfsotc03/arduino2-17  console output: ===================================================================
E<span>#</span>1      @jfsotc03/arduino2-17  console output: FAIL - main.
E<span>#</span>1      @jfsotc03/arduino2-17  console output: ===================================================================
E<span>#</span>1      @jfsotc03/arduino2-17  console output: RunID: ci-170531-1731-379:zpg1
E<span>#</span>1      @jfsotc03/arduino2-17  console output: PROJECT EXECUTION FAILED

(Imported from Jira ZEP-2243)

nashif commented 7 years ago

by Inaky Perez-Gonzalez:

Still reproduceable on v1.8-branch as of 170531-1731

nashif commented 7 years ago

by Andrew Boie:

This is a timing sensitive test, I wouldn't be surprised if we have some sort of off-by-one thing going on with the retrieved tick values if we cross a boundary at just the wrong time. I'm pretty sure I have an Arduino Due floating around my cube somewhere and will test.

Inaky Perez-Gonzalez has this been reproduced on any other ARM boards or just Arduino Due?

nashif commented 7 years ago

by Anas Nashif:

I can only see this on the Due and the microbit, not on the frdm-k64f or the NRF boards (which have a different timer).

nashif commented 7 years ago

by Andrew Boie:

I'm having trouble reproducing this specific error on bbc_microbit, Anas Nashif are you sure you didn't see the below error instead?:

***** BOOTING ZEPHYR OS v1.7.99 - BUILD: Jun  7 2017 22:37:48 *****
tc_start() - Test kernel CPU and thread routines
Initializing kernel objects
Testing k_cpu_idle()
===================================================================
FAIL - main.
===================================================================
PROJECT EXECUTION FAILED

I was able to isloate the problem and have added a fix for it to my patch series. However even if I enclose test_kernel_interrupts() in an infinite while loop, it has not failed for me so far on bbc_microbit.

Arduino due is a different problem. It does indeed fail in test_kernel_interrupts. WIth the extra printouts I added in my patch series, it appears that locking interrupts doesn't actually prevent SYSTICK timer interrupts from happening:

***** BOOTING ZEPHYR OS v1.7.99 - BUILD: Jun  7 2017 22:59:38 *****
tc_start() - Test kernel CPU and thread routines
Initializing kernel objects
Testing k_cpu_idle()
Testing interrupt locking and unlocking
FAIL - test_kernel_interrupts@327. tick advanced with interrupts locked
===================================================================
FAIL - main.
===================================================================
PROJECT EXECUTION FAILED

However unlike Inaky Perez-Gonzalez original report, I am seeing this 100% of the time. Investigating further.

nashif commented 7 years ago

by Andrew Boie:

I can confirm that even with interrupts locked with irq_lock(), I am still getting SYSTICK-exception based timer interrupts on my Arduino Due board.

Questions that come to mind:

Why isn't this affecting QEMU? Bugs in the emulator?

Why haven't we noticed this before? Surely getting timer interrupts while interrupts are supposedly locked would cause other tests to fail

Is this specific to Arduino Due, or are other Cortex-M3 platforms with SYSTICK timers affected?

Maureen Helm Kumar Gala if either of you have access to supported Zephyr ARM boards that are Cortex-M3 based and use SYSTICK timer, can you run tests/kernel/context on it and tell me if the "Testing interrupt locking and unlocking" part fails for you?

My suspicion is that something is not right with how the priority threshold is set when locking interrupts, apparently there is a policy to allow certain kinds of interrupts/exceptions to continue to fire when interrupts are locked, which seems unique to ARM.

nashif commented 7 years ago

by Anas Nashif:

I just did a bisect and found the offending commit:

commit ee66ee6c344f1b009cff050b408229450231a853
Author: Piotr Mienkowski <piotr.mienkowski@gmail.com>
Date:   Sun May 7 22:17:54 2017 +0200

    arch: sam3x: Use ASF library

    This patch converts Atmel sam3x MCU series to use register
    header files from Atmel Software Framework (ASF) library.
    By using ASF different Atmel SAM MCU series can use common
    device drivers.

    Signed-off-by: Piotr Mienkowski <piotr.mienkowski@gmail.com>

Piotr Mienkowski Can you please take a look?

nashif commented 7 years ago

by Piotr Mienkowski:

The commit in question introduced ext/hal/atmel/asf/sam/include/sam3x/sam3x8e.h which contains CMSIS compatible SoC configuration from Atmel. This file is indirectly included by include/arch/arm/cortex_m/cmsis.h and consequently drivers/timer/cortex_m_systick.c.

sam3x8e.h states:

#define __NVIC_PRIO_BITS       4

whereas before __NVIC_PRIO_BITS was set to 3. I'm afraid though the sam3x8e.h is correct. According to the SAM3X series datasheet "The NVIC provides up to 16 interrupt priority levels."

nashif commented 7 years ago

by Andrew Boie:

I see the problem but I'm not sure the proper fix. The DTS appears to be wrong!

This is what gets generated_dts_board.h

arduino_due/include/generated/generated_dts_board.h
25-
26-/* interrupt-controller@e000e100 */
27:#define ARM_V7M_NVIC_E000E100_ARM_NUM_IRQ_PRIORITY_BITS      3
28-#define ARM_V7M_NVIC_E000E100_BASE_ADDRESS_0         0xe000e100
29-#define ARM_V7M_NVIC_E000E100_SIZE_0             3072

In the fixup file for Arduino Due we set CONFIG_NUM_IRQ_PRIO_BITS

dts/arm/arduino_due.fixup
6-
7-
8:#define CONFIG_NUM_IRQ_PRIO_BITS  ARM_V7M_NVIC_E000E100_ARM_NUM_IRQ_PRIORITY_BITS
9-#define CONFIG_UART_ATMEL_SAM3_NAME   ATMEL_SAM3X_UART_400E0800_LABEL

So we have a situation where __NVIC_PRIO_BITS is 4, but CONFIG_NUM_IRQ_PRIO_BITS is 3. These values should match!

nashif commented 7 years ago

by Andrew Boie:

https://github.com/zephyrproject-rtos/zephyr/pull/433

nashif commented 7 years ago

by Piotr Mienkowski:

Thanks Andrew Boie for the fix. I believe however that we probably should do more.

I assumed incorrectly that CONFIG_NUM_IRQ_PRIO_BITS is not used any more apart from the include/arch/arm/cortex_m/cmsis.h file but that's not the case. Currently parts of the code base are using CONFIG_NUM_IRQ_PRIO_BITS and parts __NVIC_PRIO_BITS. Shouldn't we be consistent?

To comply with CMSIS we would need to replace all occurrences of CONFIG_NUM_IRQ_PRIO_BITS with __NVIC_PRIO_BITS and only in ./include/arch/arm/cortex_m/cmsis.h leave

#define __NVIC_PRIO_BITS               CONFIG_NUM_IRQ_PRIO_BITS

That would be of course another Jira issue. I'm only asking if we need one?

nashif commented 7 years ago

by Andrew Boie:

Piotr Mienkowski agreed, "Don't Repeat Yourself" is one of my favorite mantras in computer science. This seems worthy of cleanup. I don't think I'll personally have time to work on it but we should have it in the backlog.

nashif commented 7 years ago

by Piotr Mienkowski:

Unless there are other volunteers I can prepare the patch. I have a quick question though: for such a cleanup task shall I create a new Jira issue or a pull request is enough?

nashif commented 7 years ago

by Anas Nashif:

Piotr Mienkowski yes, please create a jira for the sake of traceability.

nashif commented 7 years ago

Related to GH-2097