zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.43k stars 6.4k forks source link

intel_adsp_cavs25: tests/boards/intel_adsp/ssp/boards.intel_adsp.ssp fails. #61789

Open gbarkadiusz opened 1 year ago

gbarkadiusz commented 1 year ago

Describe the bug The tests/boards/intel_adsp/ssp/boards.intel_adsp.ssp fails.


2023-08-23 13:41:00,066 - twister - DEBUG - run status: intel_adsp_cavs25/tests/boards/intel_adsp/ssp/boards.intel_adsp.ssp failed
2023-08-23 13:41:00,072 - twister - INFO - 1/1 intel_adsp_cavs25         tests/boards/intel_adsp/ssp/boards.intel_adsp.ssp   FAILED Timeout (device 60.326s)

To Reproduce scripts/twister -W -p intel_adsp_cavs25 --device-testing --hardware-map <path_to>/cavs/cavs.map -s tests/boards/intel_adsp/ssp/boards.intel_adsp.sspf -v

Logs and console output

Handler.log contains.

INFO:cavs-client:REMOTE HOST: <IP_HOST> PORT: <PORT>
INFO:cavs-client:Monitor process
INFO:cavs-client:Sent:     start_log
INFO:cavs-client:Receive: start_log
INFO:cavs-client:Start to monitor log output...
--
I: intel_adsp_gpdma_power_on: dma dma@7d000 initialized
I: intel_adsp_gpdma_power_on: dma dma@7c000 initialized
*** Booting Zephyr OS build zephyr-v3.4.0-2131-ga3f6142e9961 ***
Running TESTSUITE adsp_ssp
===================================================================
I: dai_ssp_pm_runtime_en_ssp_power en_ssp_power index 0
E: dai_ssp_poll_for_register_delay poll timeout reg 465924 mask 256 val 0 us 125
W: dai_ssp_pm_runtime_en_ssp_power warning: timeout
I: dai_ssp_pm_runtime_en_ssp_power I2SLCTL
I: dai_ssp_update_bits base 77000, reg 8, mask 80, value 80
START - test_adsp_ssp_config_set
I: dai_ssp_log_ssp_data dai index: 0
I: dai_ssp_log_ssp_data plat_data base: 487424
I: dai_ssp_log_ssp_data plat_data irq: 0
I: dai_ssp_log_ssp_data plat_data fifo playback offset: 487440
I: dai_ssp_log_ssp_data plat_data fifo playback handshake: 2
I: dai_ssp_log_ssp_data plat_data fifo capture offset: 487440
I: dai_ssp_log_ssp_data plat_data fifo capture handshake: 3
I: dai_ssp_set_config_tplg
I: dai_ssp_set_config_tplg going for loopback!
I: dai_ssp_set_config_tplg sscr0 = 0xc1d0003f, sscr1 = 0xd0400004, ssto = 0x00000000, sspsp = 0x2200000
I: dai_ssp_set_config_tplg sscr2 = 0x00004002, sspsp2 = 0x00000000, sscr3 = 0x07070000, ssioc = 0x00000020
I: dai_ssp_set_config_tplg ssrsa = 0x00000003, sstsa = 0x00000003
PASS - test_adsp_ssp_config_set in 0.002 seconds
===================================================================
START - test_adsp_ssp_transfer
I: dai_ssp_get_properties dai_index 0
I: dai_ssp_get_properties fifo 487440
I: dai_ssp_get_properties handshake 2
I: dai_ssp_get_properties init delay 0
dma block 0 block_size 48, source addr 9e025dd8, dest addr 77010
set next block pointer to 0x9e02c6f0
dma block 1 block_size 48, source addr 9e025dd8, dest addr 77010
Configuring the dma tx transfer on channel 0
I: dai_ssp_get_properties dai_index 0
I: dai_ssp_get_properties fifo 487440
I: dai_ssp_get_properties handshake 3
I: dai_ssp_get_properties init delay 0
dma block 0 block_size 48, source addr 77010, dest addr 9e02c000
set next block pointer to 0x9e02c694
dma block 1 block_size 48, source addr 77010, dest addr 9e02c0c0
Configuring the dma rx transfer on channel 1
Starting the transfer on channels 0 and 1 and waiting completion
I: dai_ssp_pre_start
I: dai_ssp_mn_set_mclk mclk_rate 24576000, mclk_source_clock 0
I: dai_ssp_set_mclk_divider mclk_id 0 mdivr_val 1
I: dai_ssp_find_mn for freq 24576000 bclk 3072000
I: dai_ssp_mn_set_bclk bclk_rate 3072000, *out_scr_div 8, m 1, n 1
I: dai_ssp_bclk_prepare_enable sscr0 = 0xc1d0077f
I: dai_ssp_update_bits base 77000, reg 4, mask 300000, value 300000
I: dai_ssp_update_bits base 77000, reg 0, mask 80, value 80
I: dai_ssp_early_start SSE set for SSP0
I: dai_ssp_pre_start
I: dai_ssp_update_bits base 77000, reg 4, mask 300000, value 300000
I: dai_ssp_update_bits base 77000, reg 0, mask 80, value 80
I: dai_ssp_early_start SSE set for SSP0
I: dai_ssp_start
I: dai_ssp_update_bits base 77000, reg 34, mask 100, value 100
I: dai_ssp_start
I: dai_ssp_update_bits base 77000, reg 30, mask 100, value 100
rx giving xfer_sem
E:  ** FATAL EXCEPTION
E:  ** CPU 0 EXCCAUSE 9 (load/store alignment)
E:  **  PC 0xbe01b742 VADDR 0x9e027b81
E:  **  PS 0x60a20
E:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:10 CALLINC:2)
E:  **  A0 0xbe01b774  SP 0xbe0289b0  A2 0x9e027b78  A3 0x9e02c180
E:  **  A4 0x7e014458  A5 0xbe028ac0  A6 0x60025  A7 0xbe02a670
E:  **  A8 (nil)  A9 0x9e027b7d A10 0x9e027b78 A11 0x60a25
E:  ** A12 (nil) A13 0x1 A14 (nil) A15 (nil)
E:  ** LBEG (nil) LEND (nil) LCOUNT (nil)
E:  ** SAR 0x1f
Backtrace:0xbe01b73f:0xbe0289b0 0xbe01b771:0xbe0289d0 0xbe01b8d4:0xbe0289f0 0xbe01ad18:0xbe028a10 0xbe011fa1:0xbe028a30 0xbe0192d9:0xbe028a50 0xbe0155d5:0xbe028a80 0xbe0147dd:0xbe028aa0 0x7e01446b:0xbe028ac0  |<-CORRUPTED
E: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
E: Current thread: 0x9e02c220 (idle 00)
E: Halting system

Environment (please complete the following information):

nashif commented 10 months ago

@kv2019i any idea what is going on here?

github-actions[bot] commented 8 months ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

gbarkadiusz commented 8 months ago

This is still reproduced.

github-actions[bot] commented 6 months ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

pkunieck commented 5 months ago

We still got reproduction

kv2019i commented 4 months ago

@nashif I can reproduce this locally. This seems to be a test case problem. The fault happens in sys_clock_announce() and I cna't really spot anything wrong in the driver. The test case will need a fix though.

kv2019i commented 4 months ago

I can make the test pass if I remove the sem_take and sem_give and add a simple sleep, so there's something wrong with the semaphore use in this test:

pattern not found in rx buffer
Sleep before cleanup
 PASS - test_adsp_ssp_transfer in 2.009 seconds
===================================================================
TESTSUITE adsp_ssp succeeded
kv2019i commented 4 months ago

I can also make the test pass with semaphores in place by adding a delay (see draft PR #71452 ). This doesn't make much sense, need to continue debug.

github-actions[bot] commented 2 months ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

SzymonRichert commented 2 months ago

@kv2019i any progress with this one ?

nashif commented 1 month ago

@ceolin after applying this workaround in https://github.com/zephyrproject-rtos/zephyr/pull/71452 power management kicks in and locks the system, i.e. no further tests are possible, but we need to see why this delay is needed in the first place.

ceolin commented 1 month ago

I am going to check. Maybe secondary cores are not online and this sempahore is causing the system to deep sleep.

SzymonRichert commented 2 weeks ago

@ceolin any update regarding this one ?

ceolin commented 1 week ago

That is weird, cav25 has only one power state and it should be triggered by this sleep because of its residency time.

ceolin commented 1 week ago

https://github.com/zephyrproject-rtos/zephyr/pull/77581 -> This fix the problem of the system sleeps indefinitely.

ceolin commented 1 week ago

@kv2019i @nashif there is something really weird happening in this test and there is a chance the problem is in the kernel. What the sleep does is allowing the dma transfer to happen before the thread trying to acquire the semaphore. The callback is called and does k_sem_give() before the test does k_sem_take() so there is not contention needed.

To make things even more weird, if we run it with only one core enabled (CONFIG_MP_MAX_NUM_CPUS=1) the DMA transfer does NOT finish but the test ignores it and pass.