thesofproject / sof

Sound Open Firmware
Other
561 stars 318 forks source link

[BUG] DSP writes to low addresses like 0x10 don't cause exceptions #6093

Open lyakh opened 2 years ago

lyakh commented 2 years ago

Describe the bug A line of code clearly has to dereference a pointer, where the above line prints it as NULL, and no exception is generated, when built with xcc.

To Reproduce Use https://github.com/thesofproject/sof/commit/95b128e4f614acf844c1fd9388232a44f8255ae6 and build an XTOS based firmware for TGL, then run it on an SDW platform. Observe the following log:

[      223577.473958] (           6.666667) c0 ipc                  src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x30130000
[      223598.463542] (          20.989584) c0 pipe         1.3   ......./pipeline-graph.c:307  INFO pipeline complete, clock freq 400000000Hz
[      223701.380208] (         102.916664) c0 ipc                  src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x60010000
[      223714.296875] (          12.916667) c0 pipe         30.53 ....../pipeline-params.c:246  INFO pipe params dir 0 frame_fmt 0 buffer_fmt 0 rate 48000
[      223723.619792] (           9.322917) c0 pipe         30.53 ....../pipeline-params.c:250  INFO pipe params stream_tag 1 channels 2 sample_valid_bytes 2 sample_container_bytes 2
[      223844.973958] (         121.354164) c0 dai          1.2         src/ipc/ipc3/dai.c:109  INFO dai_data_config() dai type = 4 index = 2 dd 0x9e11e400
[      223855.052083] (          10.078125) c0 dai          1.2         src/ipc/ipc3/dai.c:142  INFO DMA buffer (nil)
[      223881.015625] (          25.963541) c0 dai          1.2     src/audio/dai-legacy.c:362  INFO dai_playback_params() dest_dev = 22 stream_id = 7 src_width = 4 dest_width = 4
[      223891.197917] (          10.182292) c0 dai          1.2     src/audio/dai-legacy.c:368  INFO dai_playback_params() fifo 0x7141c
[      223906.848958] (          15.651042) c0 pipe         30.53 ....../pipeline-params.c:343  INFO pipe prepare
[      223982.656250] (          75.807289) c0 dai          1.2     src/audio/dai-legacy.c:655  INFO dai_prepare()
[      223992.734375] (          10.078125) c0 dai          1.2     src/audio/dai-legacy.c:621  INFO dai_config_prepare(), channel = 7
[      224001.875000] (           9.140625) c0 dw-dma                 src/drivers/dw/dma.c:194  INFO dw_dma_channel_get(): dma 0 request channel 7
[      224011.979167] (          10.104167) c0 dai          1.2     src/audio/dai-legacy.c:640  INFO dai_config_prepare(): new configured dma channel index 0

specifically see this line: [ 223855.052083] ( 10.078125) c0 dai 1.2 src/ipc/ipc3/dai.c:142 INFO DMA buffer (nil)

which is produced by this code: https://github.com/thesofproject/sof/blob/95b128e4f614acf844c1fd9388232a44f8255ae6/src/ipc/ipc3/dai.c#L142-L143 where the next line after the print dereferences the NULL pointer, and no exception happens. A full log is attached.

Reproduction Rate 100%

Expected behavior A DSP exception should be triggered.

Impact The compiler seems to have a severe bug

Environment 1) Name of the topology file

sof-coherent-null.txt

lyakh commented 2 years ago

I modified the code a bit:

        case SOF_DAI_INTEL_ALH:
                /* SDW HW FIFO always requires 32bit MSB aligned sample data for                                                                                                                                                            
                 * all formats, such as 8/16/24/32 bits.                                                                                                                                                                                    
                 */
                dev->ipc_config.frame_fmt = SOF_IPC_FRAME_S32_LE;
                t1 = platform_timer_get(NULL);
                dd->dma_buffer->stream.frame_fmt = dev->ipc_config.frame_fmt;
                t2 = platform_timer_get(NULL);
                comp_info(dev, "DMA buffer %p t-diff %u", dd->dma_buffer, (unsigned)(t2 - t1));

to make the location between the two calls to platform_timer_get() obvious. Here's the disassembly:

be030fd4:       1062d2                  s32i    a13, a2, 64
be030fd7:       1bebe5                  call8   be04ce94 <platform_timer_get>
be030fda:       1123e2                  l32i    a14, a3, 68
be030fdd:       207aa0                  or      a7, a10, a10
be030fe0:       1022d2                  l32i    a13, a2, 64
be030fe3:       0a0c                    movi.n  a10, 0
be030fe5:       176ed2                  s32i    a13, a14, 92
be030fe8:       1beaa5                  call8   be04ce94 <platform_timer_get>
be030feb:       04dd                    mov.n   a13, a4
be030fed:       ffaec1                  l32r    a12, be030ea8 (20002528 <log_entry$20773_360_14>)
be030ff0:       3e0c                    movi.n  a14, 3
be030ff2:       c2f8                    l32i.n  a15, a2, 48
be030ff4:       c08a70                  sub     a8, a10, a7
be030ff7:       b2b8                    l32i.n  a11, a2, 44
be030ff9:       2a0c                    movi.n  a10, 2
be030ffb:       01b9                    s32i.n  a11, a1, 0
be030ffd:       11a9                    s32i.n  a10, a1, 4
be030fff:       0b0c                    movi.n  a11, 0
be031001:       06ad                    mov.n   a10, a6
be031003:       112392                  l32i    a9, a3, 68
be031006:       2199                    s32i.n  a9, a1, 8
be031008:       3189                    s32i.n  a8, a1, 12
be03100a:       2454a5                  call8   be055554 <_log_sofdict>
marc-hb commented 2 years ago

Was this discovered while debugging recent regressions #6075 and/or #6091? Just trying to connect some dots, sorry if unrelated.

lyakh commented 2 years ago

Was this discovered while debugging recent regressions #6075 and/or #6091? Just trying to connect some dots, sorry if unrelated.

@marc-hb #6075 , yes

lyakh commented 2 years ago

Further, the same problem also appears on TGL, where the code was modified to add that NULL-dereference, also when built with crosstool gcc and with the Zephyr toolchain. Apparently that code should run during initial DAI configuration and during potential run-time DAI re-configuration. The ipc_dai_data_config() function is only called from dai_params() in dai-legacy.c and dai-zephyr.c, and in both cases dd->dma_buffer is first allocated there - during initialisation, or re-used - during re-configuration.

lyakh commented 2 years ago

Update: just checked, that doing the same - adding a low address (e.g. 0x10) dereference in main() behaves in one of two ways: (1) causes an exception when reading from it - as expected or (2) causes no problem when writing to it.

lgirdwood commented 2 years ago

@lyakh what happens when 1) your pointer is volatile ? OR 2) You write back the cache after the write ?

lyakh commented 2 years ago

Update Nr.2: on BYT neither writes to nor reads from low addresses cause exceptions. Update Nr.3: actually using 0 instead of a low address has the same effect

lyakh commented 2 years ago

@lyakh what happens when

1. your pointer is volatile ?
   OR

2. You write back the cache after the write ?

@lgirdwood sorry, I checked assembly. I'm rather sure those actual addresses (0x10, 0, etc.) are used, which of course are outside of any RAM addresses - cached or uncached.

lyakh commented 2 years ago

@mmaka1 any idea about this?

lgirdwood commented 2 years ago

@lyakh what happens when

1. your pointer is volatile ?
   OR

2. You write back the cache after the write ?

@lgirdwood sorry, I checked assembly. I'm rather sure those actual addresses (0x10, 0, etc.) are used, which of course are outside of any RAM addresses - cached or uncached.

Can you check anyway, this could be misconfiguration of the cache_attr for the HW (see the linker script for value and pls check it's value is aligned with HW).

lyakh commented 2 years ago

Can you check anyway, this could be misconfiguration of the cache_attr for the HW (see the linker script for value and pls check it's value is aligned with HW).

@lgirdwood I see what you mean... IIRC @andyross and @ceolin checked and double-checked those flags for Zephyr, do you think such an effect could be possible via their misconfiguration?

andyross commented 2 years ago

Unfortunately the cAVS/MTL platforms put registers in that low 512M block, so we can't use the region protection bits to deny access. That region is marked rw/uncached. Whether or not an address faults (some do, for sure) is entirely up to the bus hardware.

Once there's hardware available with an MMU, there will be more options. Also on some of the devices (I want to say cAVS 1.5 can do this but 1.8+ can't?) we have the "RPO with Translation" feature, which allows the top 3 bits of memory to remap each 512M region. That would allow us (with a lot of work to the devicetree definitions!) to move the registers somewhere else and mark the null region invalid. But it would only work on the older hardware, IIRC.

andyross commented 2 years ago

Just checked: indeed, the core-isa.h files in the sof tree have XCHAL_HAVE_XLT_CACHEATTR==1 for BYT and APL, but all other platforms (AMD/NXP/MediaTek included) are 0. So that's not really much of an option.

andyross commented 2 years ago

Can you check anyway, this could be misconfiguration of the cache_attr for the HW

FWIW: the cacheattr register actually doesn't exist on platforms with Region Protection Option, the HAL code to set it actually interprets the register value at runtime. We have a smaller, plausibly-too-optimized-for-its-own-good version here: https://github.com/zephyrproject-rtos/zephyr/blob/main/include/zephyr/arch/xtensa/cache.h#L163

ceolin commented 2 years ago

Just for curiosity, can you check what is the access mode for this address ? Not familiar with this part of SoF code, but I think glb_addr_attr should give you it.

ceolin commented 2 years ago

Unfortunately the cAVS/MTL platforms put registers in that low 512M block, so we can't use the region protection bits to deny access. That region is marked rw/uncached. Whether or not an address faults (some do, for sure) is entirely up to the bus hardware.

Once there's hardware available with an MMU, there will be more options. Also on some of the devices (I want to say cAVS 1.5 can do this but 1.8+ can't?) we have the "RPO with Translation" feature, which allows the top 3 bits of memory to remap each 512M region. That would allow us (with a lot of work to the devicetree definitions!) to move the registers somewhere else and mark the null region invalid. But it would only work on the older hardware, IIRC.

We are using identity map without translation, aren't we ?

andyross commented 2 years ago

Right. The code is aware of the "with translation" distinction (because it has to set up the top three bits differently), but for compatibility we don't try to do any mapping.

ceolin commented 2 years ago

I may be missing something, but only baytrail and apollolake have support for translation.

src/platform/apollolake/include/arch/xtensa/config/core-isa.h:580:#define XCHAL_HAVE_XLT_CACHEATTR  1   /* region prot. w/translation */
src/platform/baytrail/include/arch/xtensa/config/core-isa.h:478:#define XCHAL_HAVE_XLT_CACHEATTR    1   /* region prot. w/translation */
src/platform/cannonlake/include/arch/xtensa/config/core-isa.h:580:#define XCHAL_HAVE_XLT_CACHEATTR  0   /* region prot. w/translation */
src/platform/haswell/include/arch/xtensa/config/core-isa-bdw.h:588:#define XCHAL_HAVE_XLT_CACHEATTR 0   /* region prot. w/translation */
src/platform/haswell/include/arch/xtensa/config/core-isa-hsw.h:588:#define XCHAL_HAVE_XLT_CACHEATTR 0   /* region prot. w/translation */
src/platform/icelake/include/arch/xtensa/config/core-isa.h:580:#define XCHAL_HAVE_XLT_CACHEATTR 0   /* region prot. w/translation */
src/platform/imx8/include/arch/xtensa/config/core-isa.h:614:#define XCHAL_HAVE_XLT_CACHEATTR    0   /* region prot. w/translation */
src/platform/imx8m/include/arch/xtensa/config/core-isa.h:612:#define XCHAL_HAVE_XLT_CACHEATTR   0   /* region prot. w/translation */
src/platform/imx8ulp/include/arch/xtensa/config/core-isa.h:645:#define XCHAL_HAVE_XLT_CACHEATTR 0   /* region prot. w/translation */
src/platform/meteorlake/include/arch/xtensa/config/core-isa.h:659:#define XCHAL_HAVE_XLT_CACHEATTR  0   /* region prot. w/translation */
src/platform/mt8186/include/arch/xtensa/config/core-isa.h:760:#define XCHAL_HAVE_XLT_CACHEATTR  0   /* region prot. w/translation */
src/platform/mt8195/include/arch/xtensa/config/core-isa.h:669:#define XCHAL_HAVE_XLT_CACHEATTR  0   /* region prot. w/translation */
src/platform/suecreek/include/arch/xtensa/config/core-isa.h:580:#define XCHAL_HAVE_XLT_CACHEATTR    0   /* region prot. w/translation */
src/platform/tigerlake/include/arch/xtensa/config/core-isa.h:603:#define XCHAL_HAVE_XLT_CACHEATTR   0   /* region prot. w/translation */
andyross commented 2 years ago

Right. Thus "not much of an option". :)

lyakh commented 2 years ago

that all sounds wonderful... But reading from those addresses does generate an exception, so the DSP "knows" that they are invalid? I looked at some of the registers and the lowest addresses I see begin soon after 0x1000 so about the low 4k of addresses seem to be invalid? So it's the bus hardware bug, that it raises exceptions on reading from there but not on writing to that area?

andyross commented 2 years ago

That matches my experience, though I never tried to probe the exact behavior. For sure I've gotten some reasonable-looking exceptions on null pointer dereferences. It makes perfect sense that the hardware designers would put a poison region at the bottom of memory. Frustrating if it only works for reads, though.

ceolin commented 2 years ago

that all sounds wonderful... But reading from those addresses does generate an exception, so the DSP "knows" that they are invalid? I looked at some of the registers and the lowest addresses I see begin soon after 0x1000 so about the low 4k of addresses seem to be invalid? So it's the bus hardware bug, that it raises exceptions on reading from there but not on writing to that area?

Did you were able to get the access mode for these address that are behaving like this ? There are access mode that raises exceptions in fetch but not in load and store operations.