Open marc-hb opened 3 months ago
@ceolin @kv2019i will want to be in on this too I suspect.
FWIW an awful lot of that stuff doesn't need to be managed manually at all. The .text/.literal/.data offsets all come out cleanly from the linker, the stack is just a memory array, we automatically compute general heap boundaries with symbols marking the extents. Things like the "boot loader stack" should really be unified with the existing core 0 interrupt stack for efficiency. The memory windows can be sized as general buffers with an __aligned tag, etc... There's a ton of manual management here that is just historical and needless.
And the remaining areas that can't be automatic will end up corresponding to "addresses the CSME, boot ROM or host loader code knows about external to the firmware". And those probably belong in device tree or an evolved loader protocol.
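As a rough illustration of the memory-window point above, a buffer can get its address and alignment from the toolchain instead of a hardcoded constant. This is only a sketch: the name `mem_window`, its size and the 4096-byte alignment are hypothetical, and standard C11 `_Alignas` stands in here for the `__aligned` tag mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values, for illustration only. */
#define MEM_WINDOW_SIZE  8192u
#define MEM_WINDOW_ALIGN 4096u

/* Fail the build if the size is not a whole number of aligned pages. */
static_assert(MEM_WINDOW_SIZE % MEM_WINDOW_ALIGN == 0,
	      "window size must be a multiple of its alignment");

/* The linker picks the address; only size and alignment are stated here. */
static _Alignas(MEM_WINDOW_ALIGN) uint8_t mem_window[MEM_WINDOW_SIZE];

/* Callers ask for the base instead of depending on a fixed address. */
uint8_t *mem_window_base(void)
{
	return mem_window;
}

size_t mem_window_size(void)
{
	return sizeof(mem_window);
}
```

Nothing in this sketch covers addresses that external agents must know in advance; those still need to be communicated some other way.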
Tentative Zephyr upgrade in sof/west.yml
with the temporary fix:
EDIT: sorry for the noise, this comment was meant for the related https://github.com/thesofproject/sof/issues/9308
Assigning this to @lyakh for now, as I believe this would require some firmware loader work too for the addresses.
Describe the bug
This is a follow-up to https://github.com/thesofproject/sof/issues/9308 where the audio DSP firmware on MTL stopped booting across the board after a random combination of Pull Requests that all passed separately. After a long and tedious investigation by @tmleman and @lyakh, it was found that the IMR memory had been corrupted for a long time; we just never noticed it before.
The IMR (Isolated Memory Region) is an area of the main/host memory that stays up when the audio DSP (including its local memory) is off. It speeds up restarting the audio firmware. Starting with the "ACE" generation it is the standard way to restart (there is a `sof_debug=0x80` debug bit/option to boot from scratch instead). Among others, the IMR contains a complete image of the firmware code, which happened to be partially overwritten.
It was partially overwritten because the `adsp_memory.h` file (see https://github.com/zephyrproject-rtos/zephyr/commit/6069f946be1bd502) maps that IMR region with hardcoded constants that became disconnected from reality.
The urgent and temporary fix submitted in #76196 re-hardcodes "better" values but that's obviously not sustainable. This GitHub issue is to discuss and track the longer term solution(s).
To Reproduce
Inspect the code size and notice that it is too big for the IMR area meant for it.
Originally posted by @tmleman in https://github.com/thesofproject/sof/issues/9308#issuecomment-2244736317
Expected behavior
At build time, a linker or elfutils script must 1. either compute some optimal allocation, or 2. at the very least check that areas are big enough for their intended purposes and fail when not big enough.
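A minimal sketch of option 2, assuming a host-side tool has already extracted each area's base, reserved capacity and actual size from the linked image. The `region_check` structure and `first_bad_region()` are hypothetical names invented for this example; they are not part of Zephyr, SOF or elfutils.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One fixed area of the memory map and the size of what must fit in it. */
struct region_check {
	const char *name;
	uint32_t base;     /* start address of the area */
	uint32_t capacity; /* bytes reserved for the area */
	uint32_t used;     /* bytes the build actually produced */
};

/*
 * Returns the index of the first region whose content does not fit, or
 * whose reserved range runs into the next region; returns -1 when the
 * layout is consistent. Regions must be sorted by ascending base.
 */
int first_bad_region(const struct region_check *r, size_t count)
{
	assert(r != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (r[i].used > r[i].capacity)
			return (int)i;
		if (i + 1 < count &&
		    (uint64_t)r[i].base + r[i].capacity > r[i + 1].base)
			return (int)i;
	}
	return -1;
}
```

A build step could call this on the extracted table and stop with a non-zero exit status whenever the result is not -1, so that an oversized image fails the build instead of silently overwriting its neighbour.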
If possible, additional runtime checks/protections cannot hurt.
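One possible shape for such a runtime check, sketched under assumptions: a checksum of the code image is recorded when the image is written, then verified before restarting from it. The function names are hypothetical, the choice of CRC-32 is only an example, and where the expected value would be stored is left open; nothing here describes what the firmware or loader does today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), no lookup table. */
uint32_t image_crc32(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	assert(data != NULL || len == 0);

	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++) {
			if (crc & 1u)
				crc = (crc >> 1) ^ 0xEDB88320u;
			else
				crc >>= 1;
		}
	}
	return ~crc;
}

/* True when the stored image still matches the checksum recorded earlier. */
bool image_intact(const uint8_t *image, size_t len, uint32_t expected)
{
	return image_crc32(image, len) == expected;
}
```

On a mismatch, a loader could fall back to booting from scratch instead of restarting from a corrupted image.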
Impact
As usual with memory corruption: critical.
Failure to boot, security risks, etc.