zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

rtio: Tests fail on qemu_xtensa_dc233c #78004

Closed: teburd closed this issue 1 month ago

teburd commented 1 month ago

Describe the bug: A hardware TLB exception occurs when calling k_sem_take in the blocking completion consume call.

To Reproduce: west build -p -b qemu_xtensa/dc233c/mmu tests/subsys/rtio/rtio_api -t run

Expected behavior: Passing tests

Impact: CI failure on unrelated PRs; failing main CI

Additional context: Disabling the consume semaphore solves the issue (-DCONFIG_RTIO_CONSUME_SEM=n)
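
For anyone wanting to try the workaround locally: a Kconfig override like this is normally passed to CMake after `--` on the `west build` command line. A minimal sketch, assuming the same board and test path as the reproduction command above (the variable and function names are only illustrative, and the function prints the command rather than running it):

```shell
# Assumed values, copied from the reproduction command in this issue.
board="qemu_xtensa/dc233c/mmu"
test_dir="tests/subsys/rtio/rtio_api"

# Print the workaround build command (a dry run, nothing is built here).
# Everything after "--" is forwarded to CMake, which applies the override.
workaround_cmd() {
    printf 'west build -p -b %s %s -t run -- -DCONFIG_RTIO_CONSUME_SEM=n\n' \
        "$board" "$test_dir"
}

workaround_cmd
```

Piping the output to a shell (`workaround_cmd | sh`) from a Zephyr workspace should run the build with the consume semaphore disabled.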

teburd commented 1 month ago

@dcpleung @nashif filing this so I can track my findings more easily

teburd commented 1 month ago

Some notes...

  1. Running the test suite with the debugserver and setting a breakpoint on rtio_cqe_consume_block (where it was throwing the exception) no longer causes the TLB exception.
  2. If I disable the consume semaphore, it no longer causes an exception.
  3. If I wait in rtio_submit() before getting the completion, I no longer get the exception.

I had also tried bisecting but the change it pointed at didn't make any sense.

I believe there's potentially a race here somewhere, specific to xtensa, and hunting it down will be tricky. To avoid blocking CI, I'd propose we either disable qemu_xtensa_dc233c while I'm debugging or apply the workaround provided by #78008.

teburd commented 1 month ago

Should be fixed with #78008; if not, please reopen.

teburd commented 1 month ago

Sadly, this was not fixed with #78008; opened #78254.

teburd commented 1 month ago

Closed with #78254, though it's worth noting that this doesn't fix the TLB exception; it merely disables the test.

kartben commented 1 month ago

@teburd https://github.com/zephyrproject-rtos/zephyr/actions/runs/10831180196/job/30055880758

INFO    - /__w/zephyr/zephyr/twister-out/qemu_xtensa_dc233c_mmu/tests/subsys/rtio/rtio_api/rtio.api.userspace.submit_sem/handler.log
INFO    - 3030 test scenarios (1199 test instances) selected, 320 configurations skipped (0 by static filter, 320 at runtime).
INFO    - 877 of 1199 test configurations passed (99.77%), 2 failed, 0 errored, 320 skipped with 0 warnings in 3191.33 seconds
INFO    - In total 8543 test cases were executed, 2659 skipped on 835 out of total 835 platforms (100.00%)
INFO    - 830 test configurations executed on platforms, 49 test configurations were only built.
INFO    - Saving reports...
INFO    - Writing JSON report /__w/zephyr/zephyr/twister-out/twister.json
INFO    - Writing xunit report /__w/zephyr/zephyr/twister-out/twister.xml...
INFO    - Writing xunit report /__w/zephyr/zephyr/twister-out/twister_report.xml...
INFO    - -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
INFO    - The following issues were found (showing the top 10 items):
INFO    - 1) tests/subsys/rtio/rtio_api/rtio.api.userspace on qemu_xtensa/dc233c/mmu failed (unexpected eof)
INFO    - 2) tests/subsys/rtio/rtio_api/rtio.api.userspace.submit_sem on qemu_xtensa/dc233c/mmu failed (unexpected eof)
INFO    - 
INFO    - To rerun the tests, call twister using the following commandline:
INFO    - west twister -p <PLATFORM> -s <TEST ID>, for example:
INFO    - 
INFO    - west twister -p qemu_xtensa/dc233c/mmu -s tests/subsys/rtio/rtio_api/rtio.api.userspace.submit_sem
INFO    - or with west:
INFO    - west build -p -b qemu_xtensa/dc233c/mmu tests/subsys/rtio/rtio_api -T rtio.api.userspace.submit_sem
INFO    - -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
INFO    - Run completed

looks like the platform isn't really excluded?

kartben commented 1 month ago

shouldn't it be qemu_xtensa_dc233c_mmu instead of qemu_xtensa_dc233c?

teburd commented 1 month ago

> shouldn't it be qemu_xtensa_dc233c_mmu instead of qemu_xtensa_dc233c?

I can’t tell anymore since the board v2 naming change. I don’t know.
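
One data point from the CI log above: the handler log for the failing run lives under `twister-out/qemu_xtensa_dc233c_mmu`, which is the board target `qemu_xtensa/dc233c/mmu` with the slashes replaced by underscores. A minimal sketch of that mapping follows; the helper name is hypothetical, and which of the two forms the test's exclusion entry actually expects is not confirmed in this thread.

```shell
# Hypothetical helper: flatten a slash-separated board target into the
# underscore form seen in the twister-out directory name.
flatten_board_target() {
    printf '%s\n' "$1" | tr '/' '_'
}

flatten_board_target "qemu_xtensa/dc233c/mmu"   # prints qemu_xtensa_dc233c_mmu
```

Note that the flattened name carries the `_mmu` suffix, so it does not equal `qemu_xtensa_dc233c`.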