Open iuliana-prodan opened 2 months ago
From the crash and after analyzing the disassembly I've realized the error is here:
acc = AE_SLAI64S(acc, 1); /* Convert to Q17.47 */
92455be4: 297582ac4402b11345f23f { ae_s32.l.i aed3, a2, -4; l32i a4, a5, 12; nop; ae_slai64s aed1, aed1, 1 }
By comparing the two disassembles from RI2023 (which is ok) and RJ2024 (which is NOK):
we can see that for RJ2024, we have a load instruction on slot1, from the four operations in parallel.
By looking in the HiFi4 User guide:
this seems to be correct.
Also, another difference is the addi
and l32i
instructions - these are in both cases but in different sets of 4 operations.
At first, I checked that all the variables are aligned based on the instructions restrictions and they seem to be (if we look at the VADDR or A2, A3, etc from crash dump).
I also wrote a simple function that does a store and load (trying to reproduce the order of instructions from where the crash happens), with the exact addresses:
diff --git a/src/math/iir_df1_hifi3.c b/src/math/iir_df1_hifi3.c
index d1aa05f55..40edcbee7 100644
--- a/src/math/iir_df1_hifi3.c
+++ b/src/math/iir_df1_hifi3.c
@@ -46,6 +46,22 @@
/* 32 bit data, 32 bit coefficients and 32 bit state variables */
+uint32_t first_iteration(void)
+{
+ uint32_t *addr_r = (uint32_t *)0x92c0c220;
+ uint32_t value;
+
+ ae_int32 *addr_w = (ae_int32 *)0x92c0c200;
+ ae_int32x2 tmp = AE_ZERO32();
+
+ AE_S32_L_I (tmp, addr_w, -4);
+
+ value = *addr_r;
+
+ return value;
+}
+int cnt = 0x1;
+
int32_t iir_df1(struct iir_state_df1 *iir, int32_t x)
{
ae_int64 acc;
@@ -67,6 +83,11 @@ int32_t iir_df1(struct iir_state_df1 *iir, int32_t x)
int j;
int nseries = iir->biquads_in_series;
+ if (cnt == 0x1) {
+ first_iteration();
+ }
+ cnt++;
+
/* Bypass is set with number of biquads set to zero. */
if (!iir->biquads)
return x;
With this, the disassembly looks similar to the crash - meaning is doing a store (ae_s32.l.i
), next a load (l32i.n
), but it passes:
92446e24 <first_iteration>:
first_iteration():
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:50
{
92446e24: 004136 entry a1, 32
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:55
ae_int32x2 tmp = AE_ZERO32();
92446e27: fc48c400672e { l32r a2, 92429fc4 (92c0c200 <heapmem+0x7200>); ae_movi aed0, 0 }
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:57
AE_S32_L_I (tmp, addr_w, -4);
92446e2d: e002f4 ae_s32.l.i aed0, a2, -4
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:59
value = *addr_r;
92446e30: 8228 l32i.n a2, a2, 32
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:61
return value;
92446e32: f01d retw.n
@singalsu @andyross do you have any suggestions, comments, ideas? What else can I try? Thanks!
@singalsu @andyross do you have any suggestions, comments, ideas? What else can I try? Thanks!
@iuliana-prodan I would raise a ticket with Cadence here as code works with old compiler and not new compiler. This could mean we need a #ifdef or pragma or indeed if our code is wrong here.
Yep, good debugging work @iuliana-prodan ! Help from Cadence would be really useful with this.
@iuliana-prodan short term you could put an ifdef that does a less optimal flow here and enable the iffdef via Kconfig as a compiler Work Around ?
@iuliana-prodan short term you could put an ifdef that does a less optimal flow here and enable the iffdef via Kconfig as a compiler Work Around ?
I've raised a ticket to cadence "Case# 46820991: Sound Open Firmware code not working with latest XCC toolchain RJ2024.3" but no useful answer.
TBH I have no idea how to less optimize this code.
The problem could be reading from the input buffers here: https://github.com/thesofproject/sof/blob/main/src/math/iir_df1_hifi3.c#L90-L95. All the intrinsics require aligned addresses.
Same here https://github.com/thesofproject/sof/blob/main/src/math/iir_df1_hifi3.c#L106-L110.
@singalsu looks like we have some gaps in the APIs for aligning MIN(copy_size). @iuliana-prodan these APIs are meant to adapt to HiFi flavor, but this sounds like a bug if you see unaligned access.
Can you try something like
diff --git a/src/math/iir_df1_hifi3.c b/src/math/iir_df1_hifi3.c
index 39be8e3a6..ba97605be 100644
--- a/src/math/iir_df1_hifi3.c
+++ b/src/math/iir_df1_hifi3.c
@@ -70,8 +70,8 @@ int32_t iir_df1(struct iir_state_df1 *iir, int32_t x)
return x;
/* Coefficients order in coef[] is {a2, a1, b2, b1, b0, shift, gain} */
- coefp = (ae_int32x2 *)&iir->coef[0];
- delayp = (ae_int32x2 *)&iir->delay[0];
+ coefp = __builtin_assume_aligned((ae_int32x2 *)&iir->coef[0], THE_HIFI_VERSION_DATASIZE);
+ delayp = __builtin_assume_aligned((ae_int32x2 *)&iir->delay[0], THE_HIFI_VERSION_DATASIZE);
for (j = 0; j < iir->biquads; j += nseries) {
/* the first for loop is for parallel EQs, and they have the same input */
in = x;
'''
@singalsu looks like we have some gaps in the APIs for aligning MIN(copy_size). @iuliana-prodan these APIs are meant to adapt to HiFi flavor, but this sounds like a bug if you see unaligned access.
Can you try something like
diff --git a/src/math/iir_df1_hifi3.c b/src/math/iir_df1_hifi3.c index 39be8e3a6..ba97605be 100644 --- a/src/math/iir_df1_hifi3.c +++ b/src/math/iir_df1_hifi3.c @@ -70,8 +70,8 @@ int32_t iir_df1(struct iir_state_df1 *iir, int32_t x) return x; /* Coefficients order in coef[] is {a2, a1, b2, b1, b0, shift, gain} */ - coefp = (ae_int32x2 *)&iir->coef[0]; - delayp = (ae_int32x2 *)&iir->delay[0]; + coefp = __builtin_assume_aligned((ae_int32x2 *)&iir->coef[0], THE_HIFI_VERSION_DATASIZE); + delayp = __builtin_assume_aligned((ae_int32x2 *)&iir->delay[0], THE_HIFI_VERSION_DATASIZE); for (j = 0; j < iir->biquads; j += nseries) { /* the first for loop is for parallel EQs, and they have the same input */ in = x; '''
@lgirdwood sorry for the late reply, I was OOO.
I've tried your suggestion and it still doesn't work. I get the same error, at the exact address:
[00:00:02.076,000] <err> os: ** FATAL EXCEPTION
[00:00:02.076,000] <err> os: ** CPU 0 EXCCAUSE 3 (load/store error)
[00:00:02.076,000] <err> os: ** PC 0x92455be4 VADDR 0x92c0c1fc
[00:00:02.076,000] <err> os: ** PS 0x60720
[00:00:02.076,000] <err> os: ** (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:7 CALLINC:2)
[00:00:02.076,000] <err> os: ** A0 0x924467d4 SP 0x92461b30 A2 0x92c0c200 A3 0x92c0d274
[00:00:02.076,000] <err> os: ** A4 (nil) A5 0x92c0c1f0 A6 0x92c0c080 A7 0x2
[00:00:02.076,000] <err> os: ** A8 (nil) A9 0x1 A10 (nil) A11 0x92c0c020
[00:00:02.076,000] <err> os: ** A12 0x9244668c A13 0x92c0c040 A14 (nil) A15 0x92c0c080
[00:00:02.076,000] <err> os: ** LBEG 0x92455ba0 LEND 0x92455c06 LCOUNT (nil)
[00:00:02.076,000] <err> os: ** SAR 0x1d
[00:00:02.076,000] <err> os: ** THREADPTR 0x2
[00:00:02.076,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:02.076,000] <err> os: Current thread: 0x92c06f10 (unknown)
[00:00:02.175,000] <err> zephyr: Halting system
The generated disassembly code is also the same.
The generated disassembly code is also the same.
ok - I would expect this compiler directive to be honored, definitely worth getting input from cadence on their compiler.
Describe the bug Compiled SOF successfully with XCC toolchain RJ2024.3 from Cadence. When trying to run an IIR test I get a
load/store error
on DSP:In Linux I get an
Input/output error
:To Reproduce Use an IIR topology and run the following commands:
Reproduction Rate With latest SOF - main branch, with XCC RJ2024.3, the error is met 100%. This was reproduced on all i.MX platforms: i.MX8MP, i.MX8QM, i.MX8QXP.
With XCC RI2023.11 everything works fine.
Expected behavior No error should appear.
Impact Not working as expected. Showstopper for IIR, if updating to latest XCC toolchain.
Environment 1) Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
Screenshots or console output
I've attached here the disassembly for OK XCC 2023 SOF and NOT OK XCC 2024 SOF. zephyr_2023_OK.dst.txt zephyr_2024_NOK.dst.txt