Open stephanosio opened 4 years ago
cc @andrewboie
Related to #22956
@stephanosio the problem here is that when IRQs are locked, k_busy_wait() hangs forever. This is the bug, the issue is not the test, the test happens to expose it.
I would NACK any patch that changes the test to sweep this issue under the rug. If anything, a specific dedicated test that shows that k_busy_wait() works with IRQs locked needs to be added
Jumping in with a quick triage on those two drivers, just for postericty: yeah, what both of these do is return a static in-memory count from z_timer_cycle_get_32() which is updated only from the timer ISR. So anything that blocks the timer ISR from arriving will prevent this clock from advancing.
The intent of this API is that it should be a synchronous read from some sort of hardware counter, so this scheme (implementing the cycle counter in terms of the timer driver) really can't be correct. Though apparently we can get away with it for a lot of tests (most k_busy_wait() happens in drivers, which are platform-specific anyway).
I don't know the Altera and Xilinx hardware in question. Surely they have a register where we can read the current count, no?
This (unused) register in the Xilinx driver looks promising:
#define XTTCPS_COUNT_VALUE_OFFSET 0x00000018U /**< Current Counter Value */
The Altera driver is on top of a HAL with thicker headers, so I gave up looking. I'm sure it has one too though.
@andyross Altera requires another timer source AFAIK. There is a comment that explains this. This is a virtual FPGA CPU so it's possible to configure this when defining the CPU itself. Nobody's had time or inclinationt to add support yet.
/* Per the Altera Embedded IP Peripherals guide, you cannot
* use a timer instance for both the system clock and timestamps
* at the same time.
*
* Having this function return accumulated_cycle_count + get_snapshot()
* does not work reliably. It's possible for the current countdown
* to reset to the next interval before the timer interrupt is
* delivered (and accumulated cycle count gets updated). The result
* is an unlucky call to this function will appear to jump backward
* in time.
*
* To properly obtain timestamps, the CPU must be configured with
* a second timer peripheral instance that is configured to
* count down from some large initial 64-bit value. This
* is currently unimplemented.
*/
If anything, a specific dedicated test that shows that k_busy_wait() works with IRQs locked needs to be added
@andrewboie That would be helpful. There seem to be quite a few (at least 3 IIRC) timer drivers that do not conform to this.
@andyross For Xilinx (and possibly Altera), we would need to do one of the following:
It seems like I might already have a fix here (I was going to implement tickless support for TTC, but decided to drop it because QEMU didn't like it; #19869): https://github.com/stephanosio/zephyr/commit/9da8fdef6db8d29e658cbafb3e69f9fca6dd6961
I will create a PR for this today (without tickless support).
As far as I can see, the litex_timer
should have the same problem.
@mateusz-holenko @kgugala @pgielda Could you confirm?
@stephanosio It seems so after reading the code, but we'll verify this on HW.
One note relative to the issues being discussed here: devices without access to a proper cycle counter can solve this alternatively with CONFIG_ARCH_HAS_CUSTOM_BUSY_WAIT and an appropriate amount of magic (e.g. a hand-tuned assembly loop). The use of z_timer_get_cycle_32() is just the default implementation.
@stephanosio I confirm that the test fails on the HW:
(venv) grey@grey:~/localantmicro/zephyr/sanity-out/litex_vexriscv/arch.interrupt$ sudo flterm --port /dev/ttyUSB1 --kernel zephyr/zephyr.bin --speed 115200
[sudo] password for grey:
[FLTERM] v2.4-31-g8898c78 Starting...
__ _ __ _ __
/ / (_) /____ | |/_/
/ /__/ / __/ -_)> <
/____/_/\__/\__/_/|_|
(c) Copyright 2012-2019 Enjoy-Digital
BIOS built on Mar 16 2020 09:58:40
BIOS CRC passed (74875417)
Migen git sha1: d11565a
LiteX git sha1: 02bfda5e
--=============== SoC ==================--
CPU: VexRiscv @ 100MHz
ROM: 32KB
SRAM: 32KB
L2: 8KB
MAIN-RAM: 262144KB
--========== Initialization ============--
Initializing SDRAM...
SDRAM now under software control
Read leveling:
m0, b0: |11111111110000000000000000000000| delays: 05+-05
m0, b1: |00000000000000111111111111000000| delays: 20+-06
m0, b2: |00000000000000000000000000000001| delays: 31+-00
m0, b3: |00000000000000000000000000000000| delays: -
m0, b4: |00000000000000000000000000000000| delays: -
m0, b5: |00000000000000000000000000000000| delays: -
m0, b6: |00000000000000000000000000000000| delays: -
m0, b7: |00000000000000000000000000000000| delays: -
best: m0, b1 delays: 20+-06
m1, b0: |11111111110000000000000000000000| delays: 05+-05
m1, b1: |00000000000000111111111111000000| delays: 20+-06
m1, b2: |00000000000000000000000000000001| delays: 31+-00
m1, b3: |00000000000000000000000000000000| delays: -
m1, b4: |00000000000000000000000000000000| delays: -
m1, b5: |00000000000000000000000000000000| delays: -
m1, b6: |00000000000000000000000000000000| delays: -
m1, b7: |00000000000000000000000000000000| delays: -
best: m1, b1 delays: 20+-06
SDRAM now under hardware control
Memtest OK
--============== Boot ==================--
Booting from serial...
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
[FLTERM] Received firmware download request from the device.
[FLTERM] Uploading kernel (26828 bytes)...
[FLTERM] Upload complete (7.7KB/s).
[FLTERM] Booting the device.
[FLTERM] Done.
Executing booted program at 0x40000000
--============= Liftoff! ===============--
*** Booting Zephyr OS build zephyr-v2.2.0-313-g4fd38265224d ***
Running test suite interrupt_feature
===================================================================
starting test - test_isr_dynamic
PASS - test_isr_dynamic
===================================================================
starting test - test_nested_isr
Triggering irq : 90
PASS - test_nested_isr
===================================================================
starting test - test_prevent_interruption
locking interrupts
We'll look into this.
@stephanosio I confirm that the test fails on the HW:
@mateusz-holenko Thanks for checking. I have created an issue to track this: #23622.
Describe the bug The
tests/kernel/interrupt
test (in particular,test_prevent_interruption
) hangs when running on a platform that uses a system timer that requires the timer interrupt to be serviced in order forz_timer_cycle_get_32
to return the correct value (i.e. "IRQ lock intolerant timer").These platforms include:
altera_avalon_timer_hal
)xlnx_psttc_timer
)litex_timer
)To Reproduce Build and run
tests/kernel/interrupt
for any of the aforementioned platforms.Expected behavior
test_prevent_interruption
successfully completes.Screenshots or console output
Environment (please complete the following information):
Analysis This happens because
test_prevent_interruption
callsk_busy_wait
, which usesz_timer_cycle_get_32
for time reference, after locking (disabling) interrupts.Since the
z_timer_cycle_get_32
implementation for the aforementioned platforms returns the value of an internal counter that is updated by the timer ISR, thek_busy_wait
function, which relies on that function, hangs when interrupt is locked.Proposed solution ~Use a wait mechanism that does not rely on the
z_timer_cycle_get_32
function in the test.~Since it is indeed expected of
z_timer_cycle_get_32
(and thereforek_busy_wait
) to work when IRQ is locked, we must fix the system timer drivers for the aforementioned platforms instead.