thesofproject / sof

Sound Open Firmware
Other
560 stars 318 forks source link

[BUG] [IIR] IIR not working with SOF compiled with XCC RJ2024.3 #9350

Open iuliana-prodan opened 2 months ago

iuliana-prodan commented 2 months ago

Describe the bug Compiled SOF successfully with XCC toolchain RJ2024.3 from Cadence. When trying to run an IIR test I get a load/store error on DSP:

*** Booting Zephyr OS build v3.7.0-rc2-451-g740d7f735e23 ***
[00:00:00.000,000] <inf> main: SOF on imx8mp_evk
[00:00:00.000,000] <inf> main: SOF initialized
[00:00:06.788,000] <err> os:  ** FATAL EXCEPTION
[00:00:06.788,000] <err> os:  ** CPU 0 EXCCAUSE 3 (load/store error)
[00:00:06.788,000] <err> os:  **  PC 0x92455be4 VADDR 0x92c0c1fc
[00:00:06.788,000] <err> os:  **  PS 0x60720
[00:00:06.788,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:7 CALLINC:2)
[00:00:06.788,000] <err> os:  **  A0 0x924467d4  SP 0x92461b30  A2 0x92c0c200  A3 0x92c0d274
[00:00:06.788,000] <err> os:  **  A4 (nil)  A5 0x92c0c1f0  A6 0x92c0c080  A7 0x2
[00:00:06.788,000] <err> os:  **  A8 (nil)  A9 0x1 A10 (nil) A11 0x92c0c020
[00:00:06.788,000] <err> os:  ** A12 0x9244668c A13 0x92c0c040 A14 (nil) A15 0x92c0c080
[00:00:06.788,000] <err> os:  ** LBEG 0x92455ba0 LEND 0x92455c06 LCOUNT (nil)
[00:00:06.788,000] <err> os:  ** SAR 0x1d
[00:00:06.788,000] <err> os:  **  THREADPTR 0x2
[00:00:06.788,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:06.788,000] <err> os: Current thread: 0x92c06f10 (unknown)
[00:00:06.886,000] <err> zephyr: Halting system

In Linux I get an Input/output error:

root@imx8mpevk:~# aplay -Dhw:0,0 -f S16_LE -c 2 -r 48000 10s_sample.wav &
[1] 784
root@imx8mpevk:~# sleep 2
Playing WAVE '10s_sample.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo

root@imx8mpevk:~# /unit_tests/sof/tools/ctl/sof-ctl -Dhw:0 -n 46 -s /unit_tests/sof/tools/ctl/eq_iir_bandpass.txt
Control size is 1024.
Applying configuration "/unit_tests/sof/tools/ctl/eq_iir_bandpass.txt" into device hw:0 control numid=46.
4607827,0,116,50331648,0,0,0,0,116,2,1,0,0,0,0,0,0,2,2,0,0,0,0,3316150158,2048164275,513807534,3267352229,513807534,0,16384,3867454526,1191025347,38870735,77741469,38870735,4294967292,24197
Success.
hdr: magic 0x00464f53
hdr: type 0
hdr: size 116 bytes
hdr: abi 3:0:0
4607827,0,116,50331648,0,0,0,0,116,2,1,0,0,0,0,0,0,2,2,0,0,0,0,3316150158,2048164275,513807534,3267352229,513807534,0,16384,3867454526,1191025347,38870735,77741469,38870735,4294967292,24197
root@imx8mpevk:~# 
root@imx8mpevk:~# aplay: pcm_write:2178: write error: Input/output error

[1]+  Done(1)                 aplay -Dhw:0,0 -f S16_LE -c 2 -r 48000 10s_sample.wav
root@imx8mpevk:~#

To Reproduce Use an IIR topology and run the following commands:

$ aplay -Dhw:0,0 -f S16_LE -c 2 -r 48000 10s_sample.wav &
$ sleep 2
$ sof-ctl -Dhw:0 -n 46 -s eq_iir_loudness.txt

Reproduction Rate With latest SOF - main branch, with XCC RJ2024.3, the error is met 100%. This was reproduced on all i.MX platforms: i.MX8MP, i.MX8QM, i.MX8QXP.

With XCC RI2023.11 everything works fine.

Expected behavior No error should appear.

Impact Not working as expected. Showstopper for IIR, if updating to latest XCC toolchain.

Environment 1) Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).

Screenshots or console output

[00:00:02.074,000] <err> os:  ** FATAL EXCEPTION
[00:00:02.074,000] <err> os:  ** CPU 0 EXCCAUSE 3 (load/store error)
[00:00:02.074,000] <err> os:  **  PC 0x92455be4 VADDR 0x92c0c1fc
[00:00:02.074,000] <err> os:  **  PS 0x60720
[00:00:02.074,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:7 CALLINC:2)
[00:00:02.074,000] <err> os:  **  A0 0x924467d4  SP 0x92461b30  A2 0x92c0c200  A3 0x92c0d274
[00:00:02.074,000] <err> os:  **  A4 (nil)  A5 0x92c0c1f0  A6 0x92c0c080  A7 0x2
[00:00:02.074,000] <err> os:  **  A8 (nil)  A9 0x1 A10 (nil) A11 0x92c0c020
[00:00:02.074,000] <err> os:  ** A12 0x9244668c A13 0x92c0c040 A14 (nil) A15 0x92c0c080
[00:00:02.074,000] <err> os:  ** LBEG 0x92455ba0 LEND 0x92455c06 LCOUNT (nil)
[00:00:02.074,000] <err> os:  ** SAR 0x1d
[00:00:02.074,000] <err> os:  **  THREADPTR 0x2
[00:00:02.074,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:02.074,000] <err> os: Current thread: 0x92c06f10 (unknown)
[00:00:02.173,000] <err> zephyr: Halting system

I've attached here the disassembly for OK XCC 2023 SOF and NOT OK XCC 2024 SOF. zephyr_2023_OK.dst.txt zephyr_2024_NOK.dst.txt

iuliana-prodan commented 2 months ago

From the crash and after analyzing the disassembly I've realized the error is here:

            acc = AE_SLAI64S(acc, 1); /* Convert to Q17.47 */
92455be4:   297582ac4402b11345f23f  { ae_s32.l.i    aed3, a2, -4; l32i  a4, a5, 12; nop; ae_slai64s aed1, aed1, 1 }

By comparing the two disassembles from RI2023 (which is ok) and RJ2024 (which is NOK): image we can see that for RJ2024, we have a load instruction on slot1, from the four operations in parallel. By looking in the HiFi4 User guide: image this seems to be correct. Also, another difference is the addi and l32i instructions - these are in both cases but in different sets of 4 operations.

At first, I checked that all the variables are aligned based on the instructions restrictions and they seem to be (if we look at the VADDR or A2, A3, etc from crash dump).

I also wrote a simple function that does a store and load (trying to reproduce the order of instructions from where the crash happens), with the exact addresses:

diff --git a/src/math/iir_df1_hifi3.c b/src/math/iir_df1_hifi3.c
index d1aa05f55..40edcbee7 100644
--- a/src/math/iir_df1_hifi3.c
+++ b/src/math/iir_df1_hifi3.c
@@ -46,6 +46,22 @@

 /* 32 bit data, 32 bit coefficients and 32 bit state variables */

+uint32_t first_iteration(void)
+{
+       uint32_t *addr_r = (uint32_t *)0x92c0c220;
+       uint32_t value;
+
+       ae_int32 *addr_w = (ae_int32 *)0x92c0c200;
+       ae_int32x2 tmp = AE_ZERO32();
+
+       AE_S32_L_I (tmp, addr_w, -4);
+
+       value = *addr_r;
+
+       return value;
+}
+int cnt = 0x1;
+
 int32_t iir_df1(struct iir_state_df1 *iir, int32_t x)
 {
        ae_int64 acc;
@@ -67,6 +83,11 @@ int32_t iir_df1(struct iir_state_df1 *iir, int32_t x)
        int j;
        int nseries = iir->biquads_in_series;

+       if (cnt == 0x1) {
+               first_iteration();
+       }
+       cnt++;
+
        /* Bypass is set with number of biquads set to zero. */
        if (!iir->biquads)
                return x;

With this, the disassembly looks similar to the crash - meaning is doing a store (ae_s32.l.i), next a load (l32i.n), but it passes:

92446e24 <first_iteration>:
first_iteration():
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:50
{
92446e24:   004136                          entry   a1, 32
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:55
    ae_int32x2 tmp = AE_ZERO32();
92446e27:   fc48c400672e                { l32r  a2, 92429fc4 (92c0c200 <heapmem+0x7200>); ae_movi   aed0, 0 }
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:57
    AE_S32_L_I (tmp, addr_w, -4);
92446e2d:   e002f4                          ae_s32.l.i  aed0, a2, -4
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:59
    value = *addr_r;
92446e30:   8228                            l32i.n  a2, a2, 32
/opt/samba/nxa06898/xcc/sof/src/math/iir_df1_hifi3.c:61
    return value;
92446e32:   f01d                            retw.n
iuliana-prodan commented 2 months ago

@singalsu @andyross do you have any suggestions, comments, ideas? What else can I try? Thanks!

lgirdwood commented 2 months ago

@singalsu @andyross do you have any suggestions, comments, ideas? What else can I try? Thanks!

@iuliana-prodan I would raise a ticket with Cadence here as code works with old compiler and not new compiler. This could mean we need a #ifdef or pragma or indeed if our code is wrong here.

singalsu commented 2 months ago

Yep, good debugging work @iuliana-prodan ! Help from Cadence would be really useful with this.

lgirdwood commented 2 months ago

@iuliana-prodan short term you could put an ifdef that does a less optimal flow here and enable the iffdef via Kconfig as a compiler Work Around ?

iuliana-prodan commented 2 months ago

@iuliana-prodan short term you could put an ifdef that does a less optimal flow here and enable the iffdef via Kconfig as a compiler Work Around ?

I've raised a ticket to cadence "Case# 46820991: Sound Open Firmware code not working with latest XCC toolchain RJ2024.3" but no useful answer.

TBH I have no idea how to less optimize this code.

The problem could be reading from the input buffers here: https://github.com/thesofproject/sof/blob/main/src/math/iir_df1_hifi3.c#L90-L95. All the intrinsics require aligned addresses.

Same here https://github.com/thesofproject/sof/blob/main/src/math/iir_df1_hifi3.c#L106-L110.

lgirdwood commented 2 months ago

@singalsu looks like we have some gaps in the APIs for aligning MIN(copy_size). @iuliana-prodan these APIs are meant to adapt to HiFi flavor, but this sounds like a bug if you see unaligned access.

Can you try something like


diff --git a/src/math/iir_df1_hifi3.c b/src/math/iir_df1_hifi3.c
index 39be8e3a6..ba97605be 100644
--- a/src/math/iir_df1_hifi3.c
+++ b/src/math/iir_df1_hifi3.c
@@ -70,8 +70,8 @@ int32_t iir_df1(struct iir_state_df1 *iir, int32_t x)
                return x;

        /* Coefficients order in coef[] is {a2, a1, b2, b1, b0, shift, gain} */
-       coefp = (ae_int32x2 *)&iir->coef[0];
-       delayp = (ae_int32x2 *)&iir->delay[0];
+       coefp = __builtin_assume_aligned((ae_int32x2 *)&iir->coef[0], THE_HIFI_VERSION_DATASIZE);
+       delayp = __builtin_assume_aligned((ae_int32x2 *)&iir->delay[0], THE_HIFI_VERSION_DATASIZE);
        for (j = 0; j < iir->biquads; j += nseries) {
                /* the first for loop is for parallel EQs, and they have the same input */
                in = x;
'''
iuliana-prodan commented 2 months ago

@singalsu looks like we have some gaps in the APIs for aligning MIN(copy_size). @iuliana-prodan these APIs are meant to adapt to HiFi flavor, but this sounds like a bug if you see unaligned access.

Can you try something like

diff --git a/src/math/iir_df1_hifi3.c b/src/math/iir_df1_hifi3.c
index 39be8e3a6..ba97605be 100644
--- a/src/math/iir_df1_hifi3.c
+++ b/src/math/iir_df1_hifi3.c
@@ -70,8 +70,8 @@ int32_t iir_df1(struct iir_state_df1 *iir, int32_t x)
                return x;

        /* Coefficients order in coef[] is {a2, a1, b2, b1, b0, shift, gain} */
-       coefp = (ae_int32x2 *)&iir->coef[0];
-       delayp = (ae_int32x2 *)&iir->delay[0];
+       coefp = __builtin_assume_aligned((ae_int32x2 *)&iir->coef[0], THE_HIFI_VERSION_DATASIZE);
+       delayp = __builtin_assume_aligned((ae_int32x2 *)&iir->delay[0], THE_HIFI_VERSION_DATASIZE);
        for (j = 0; j < iir->biquads; j += nseries) {
                /* the first for loop is for parallel EQs, and they have the same input */
                in = x;
'''

@lgirdwood sorry for the late reply, I was OOO.

I've tried your suggestion and it still doesn't work. I get the same error, at the exact address:

[00:00:02.076,000] <err> os:  ** FATAL EXCEPTION
[00:00:02.076,000] <err> os:  ** CPU 0 EXCCAUSE 3 (load/store error)
[00:00:02.076,000] <err> os:  **  PC 0x92455be4 VADDR 0x92c0c1fc
[00:00:02.076,000] <err> os:  **  PS 0x60720
[00:00:02.076,000] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:7 CALLINC:2)
[00:00:02.076,000] <err> os:  **  A0 0x924467d4  SP 0x92461b30  A2 0x92c0c200  A3 0x92c0d274
[00:00:02.076,000] <err> os:  **  A4 (nil)  A5 0x92c0c1f0  A6 0x92c0c080  A7 0x2
[00:00:02.076,000] <err> os:  **  A8 (nil)  A9 0x1 A10 (nil) A11 0x92c0c020
[00:00:02.076,000] <err> os:  ** A12 0x9244668c A13 0x92c0c040 A14 (nil) A15 0x92c0c080
[00:00:02.076,000] <err> os:  ** LBEG 0x92455ba0 LEND 0x92455c06 LCOUNT (nil)
[00:00:02.076,000] <err> os:  ** SAR 0x1d
[00:00:02.076,000] <err> os:  **  THREADPTR 0x2
[00:00:02.076,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:02.076,000] <err> os: Current thread: 0x92c06f10 (unknown)
[00:00:02.175,000] <err> zephyr: Halting system

The generated disassembly code is also the same.

lgirdwood commented 2 months ago

The generated disassembly code is also the same.

ok - I would expect this compiler directive to be honored, definitely worth getting input from cadence on their compiler.